Phoneme-Based N-gram Language Identification
(2022) For this Python-based project, I built an n-gram language identification model that predicts the language of IPA-transcribed utterances. This involved pre-processing training data for 11 languages (tedious but necessary, and insightful in many ways), programming functionality for automatic text-to-IPA transcription (truly a fun-ction ☺), and analyzing phoneme frequency distributions across languages (an avenue of interest I would like to return to!).
More on this project can be viewed here: [phoneme-based-ngram on GitHub].
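The sketch below shows, in general terms, how phoneme n-gram language identification can work. It trains one n-gram model per language and picks the language whose model gives an utterance the highest log-probability. It also computes a relative phoneme frequency distribution. This is not the project's actual code. It assumes each utterance has already been transcribed into a list of IPA phoneme symbols. The bigram order, the add-k smoothing, the names `NgramModel`, `identify`, and `phoneme_distribution`, and the two made-up toy corpora are all illustrative choices.

```python
import math
from collections import Counter

BOS, EOS = "<s>", "</s>"  # utterance boundary markers


def ngrams(phonemes, n):
    """Yield n-grams over a phoneme list padded with boundary markers."""
    padded = [BOS] * (n - 1) + list(phonemes) + [EOS]
    for i in range(len(padded) - n + 1):
        yield tuple(padded[i:i + n])


class NgramModel:
    """Phoneme n-gram model with add-k smoothing (an assumed choice)."""

    def __init__(self, n=2, k=1.0):
        self.n, self.k = n, k
        self.ngram_counts = Counter()
        self.context_counts = Counter()
        self.vocab = {EOS}  # every symbol that can be predicted

    def train(self, utterances):
        for phonemes in utterances:
            self.vocab.update(phonemes)
            for gram in ngrams(phonemes, self.n):
                self.ngram_counts[gram] += 1
                self.context_counts[gram[:-1]] += 1

    def log_prob(self, phonemes):
        v = len(self.vocab) + 1  # +1 reserves some mass for unseen phonemes
        total = 0.0
        for gram in ngrams(phonemes, self.n):
            num = self.ngram_counts[gram] + self.k
            den = self.context_counts[gram[:-1]] + self.k * v
            total += math.log(num / den)
        return total


def identify(models, phonemes):
    """Return the language whose model scores the utterance highest."""
    return max(models, key=lambda lang: models[lang].log_prob(phonemes))


def phoneme_distribution(utterances):
    """Relative frequency of each phoneme across a corpus."""
    counts = Counter(p for utt in utterances for p in utt)
    total = sum(counts.values())
    return {p: c / total for p, c in counts.items()}


# Hypothetical toy corpora, not real training data.
training = {
    "lang_a": [["h", "ə", "l", "oʊ"], ["w", "ɜ", "l", "d"]],
    "lang_b": [["o", "l", "a"], ["m", "u", "n", "d", "o"]],
}

models = {}
for lang, utts in training.items():
    model = NgramModel(n=2)
    model.train(utts)
    models[lang] = model

print(identify(models, ["o", "l", "a"]))  # → lang_b
print(phoneme_distribution(training["lang_b"])["o"])  # → 0.25
```

The comparison uses summed log-probabilities instead of multiplying raw probabilities, which keeps long utterances from underflowing. Smoothing is what lets a model still score phoneme sequences it never saw in training. A real system would also need to tokenize IPA strings into phonemes, since symbols such as "oʊ" span several characters. This sketch leaves that step out.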