n-grams

This program uses diphone (bigram) and triphone (trigram) models to represent Mainstream American English (MAE) phonotactics.

Output:

- Generates random sequences of phonemes based on the training data (the CMU Pronouncing Dictionary).
- When given a test file of made-up "words" as an argument, the model scores each "word" by its similarity to MAE phonotactics.

Use:

The program takes either two or three command line arguments:

- training_file
  - CMU phonetic dictionary
- N (either 2 or 3)
  - sets n-gram type (bigram or trigram)
- test_file (optional)
  - corpus for which to calculate perplexity
    - X.txt or Y.txt
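
The argument handling described above could look roughly like the sketch below. It is not taken from ngram.py: the function name `parse_args` and the file name `cmudict.txt` are hypothetical, chosen only for illustration.

```python
def parse_args(argv):
    """Return (training_file, n, test_file); test_file is None when omitted.

    `argv` is the argument list without the script name, e.g. sys.argv[1:].
    """
    if len(argv) not in (2, 3):
        raise ValueError("expected: training_file N [test_file]")
    training_file = argv[0]
    n = int(argv[1])
    if n not in (2, 3):
        raise ValueError("N must be 2 (bigram) or 3 (trigram)")
    test_file = argv[2] if len(argv) == 3 else None
    return training_file, n, test_file


# "cmudict.txt" is a placeholder name for the training file.
print(parse_args(["cmudict.txt", "2"]))           # generation mode
print(parse_args(["cmudict.txt", "3", "X.txt"]))  # scoring mode
```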

Notes:

If the program is given only two command line arguments, it prints 25 "words" consisting of random phoneme sequences based on either a diphone or triphone model.
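
As a rough illustration of the generation mode, the sketch below samples "words" from a diphone (bigram) model. It is a minimal version under stated assumptions, not the code in ngram.py: the training data is a tiny in-memory toy list rather than the CMU dictionary, and the names `train_bigrams`, `generate_word`, and the `max_len` cap are made up for this example.

```python
import random
from collections import Counter, defaultdict

BOUNDARY = "#"  # word-boundary symbol


def train_bigrams(transcriptions):
    """Count phoneme bigrams, padding each word with a boundary on both sides."""
    counts = defaultdict(Counter)
    for phonemes in transcriptions:
        padded = [BOUNDARY] + list(phonemes) + [BOUNDARY]
        for prev, cur in zip(padded, padded[1:]):
            counts[prev][cur] += 1
    return counts


def generate_word(counts, rng, max_len=20):
    """Sample phonemes in proportion to their counts until a boundary is drawn."""
    word, prev = [], BOUNDARY
    while len(word) < max_len:  # max_len is a safety cap (assumption)
        candidates = list(counts[prev])
        weights = [counts[prev][c] for c in candidates]
        cur = rng.choices(candidates, weights=weights, k=1)[0]
        if cur == BOUNDARY:
            break
        word.append(cur)
        prev = cur
    return word


# Toy stand-in for the training dictionary (assumption).
toy = [["K", "AE", "T"], ["K", "AA", "T"], ["T", "AE", "K"], ["AE", "T"]]
counts = train_bigrams(toy)
rng = random.Random(0)
words = [generate_word(counts, rng) for _ in range(25)]
for w in words:
    print(" ".join(w))
```

A triphone version would condition on the previous two symbols instead of one; the sampling loop is otherwise the same.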

When given three arguments, the program processes the test file based on a smoothed di- or triphone model. The probability of each test "word" is calculated and printed. The perplexity of the test corpus is calculated based on the log probabilities of its constituent words.
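
The scoring mode can be sketched as follows. The text above does not say which smoothing method the program uses, so this example assumes add-one (Laplace) smoothing; it also takes N in the perplexity formula to be the number of scored transitions, which is one reasonable choice and not necessarily the program's. All function names and the toy data are hypothetical.

```python
import math
from collections import Counter

BOUNDARY = "#"


def bigram_counts(transcriptions):
    """Return (pair counts, context counts) over boundary-padded words."""
    pair_counts, context_counts = Counter(), Counter()
    for phonemes in transcriptions:
        padded = [BOUNDARY] + list(phonemes) + [BOUNDARY]
        for prev, cur in zip(padded, padded[1:]):
            pair_counts[(prev, cur)] += 1
            context_counts[prev] += 1
    return pair_counts, context_counts


def word_log_prob(phonemes, pair_counts, context_counts, vocab_size):
    """Natural-log probability of one word under add-one smoothing (assumption).

    Also returns the number of transitions scored.
    """
    padded = [BOUNDARY] + list(phonemes) + [BOUNDARY]
    log_p = 0.0
    for prev, cur in zip(padded, padded[1:]):
        p = (pair_counts[(prev, cur)] + 1) / (context_counts[prev] + vocab_size)
        log_p += math.log(p)
    return log_p, len(padded) - 1


def perplexity(test_words, pair_counts, context_counts, vocab_size):
    """exp of the negative mean log probability per scored transition."""
    total_log_p, total_n = 0.0, 0
    for phonemes in test_words:
        log_p, n = word_log_prob(phonemes, pair_counts, context_counts, vocab_size)
        total_log_p += log_p
        total_n += n
    return math.exp(-total_log_p / total_n)


toy = [["K", "AE", "T"], ["K", "AA", "T"], ["T", "AE", "K"], ["AE", "T"]]
pairs, contexts = bigram_counts(toy)
vocab_size = 5  # K, AE, T, AA plus the boundary symbol
test_words = [["K", "AE", "T"], ["AA", "AA", "AA"]]
for w in test_words:
    print(" ".join(w), word_log_prob(w, pairs, contexts, vocab_size)[0])
print("perplexity:", perplexity(test_words, pairs, contexts, vocab_size))
```

Working in log space avoids underflow when many small probabilities are multiplied together.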

ngram-TTS.py accepts only the two-argument form. Instead of just printing each randomly generated phoneme sequence ("word"), it reads it aloud using the pyttsx3 library.
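
A minimal sketch of reading sequences aloud with pyttsx3 is shown below. It relies only on `pyttsx3.init()`, `engine.say()`, and `engine.runAndWait()`. How ngram-TTS.py converts phoneme symbols into text the speech engine can pronounce is not described here, so this example simply passes the space-joined symbols; the function names `speak_sequences` and `main` are hypothetical.

```python
def speak_sequences(sequences, engine):
    """Queue each phoneme sequence as one utterance, then speak them all."""
    for phonemes in sequences:
        engine.say(" ".join(phonemes))
    engine.runAndWait()  # blocks until the queued utterances have been spoken


def main():
    # pyttsx3 is a third-party package; it is imported here so that the
    # sketch can be loaded without it installed.
    import pyttsx3

    engine = pyttsx3.init()
    # Placeholder sequences (assumption); the real program generates these.
    speak_sequences([["K", "AE", "T"], ["T", "AE", "K"]], engine)
```

Calling `main()` requires pyttsx3 and a working speech backend on the machine.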

Known Issues:

- Triphone counts are inaccurate because the boundary symbols (# #) are misapplied.
- N in the perplexity calculation is defined as len(phonemes), but it should be len(phonemes)-1.
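
For reference, one common padding convention for n-gram counting is sketched below: n-1 boundary symbols before the word and a single boundary symbol after it. This is an assumption about what correct padding could look like, not a description of the fix needed in ngram.py; the function names are hypothetical.

```python
from collections import Counter

BOUNDARY = "#"


def ngrams(phonemes, n):
    """List the n-grams of one word, padded with n-1 leading boundaries and one trailing."""
    padded = [BOUNDARY] * (n - 1) + list(phonemes) + [BOUNDARY]
    return [tuple(padded[i:i + n]) for i in range(len(padded) - n + 1)]


def count_ngrams(transcriptions, n):
    """Accumulate n-gram counts over all transcriptions."""
    counts = Counter()
    for phonemes in transcriptions:
        counts.update(ngrams(phonemes, n))
    return counts


print(ngrams(["K", "AE", "T"], 3))
# [('#', '#', 'K'), ('#', 'K', 'AE'), ('K', 'AE', 'T'), ('AE', 'T', '#')]
```

With this convention a word of k phonemes yields k+1 n-grams for any n, so bigram and trigram models score the same number of events per word.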
