fayrose/EgyptianTranslation

Machine Translation for Middle Egyptian-English

This repo preprocesses several corpora of Middle Egyptian transliterations into a common format, then trains machine translation models with OpenNMT using supervised and semi-supervised learning. Results are evaluated with token accuracy, perplexity, cross-entropy, and BLEU score.
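As a rough illustration of how three of these metrics relate, the sketch below computes token accuracy, cross-entropy, and perplexity from scratch. It is not code from this repo or from OpenNMT; the function names and the toy inputs are assumptions made for the example, and it assumes the model's probability for each reference token is already available.

```python
import math

def token_accuracy(predicted, reference):
    """Fraction of reference positions where the predicted token matches."""
    if not reference:
        raise ValueError("reference must not be empty")
    matches = sum(p == r for p, r in zip(predicted, reference))
    return matches / len(reference)

def cross_entropy(token_probs):
    """Mean negative log-probability (in nats) assigned to the reference tokens."""
    if not token_probs:
        raise ValueError("token_probs must not be empty")
    return -sum(math.log(p) for p in token_probs) / len(token_probs)

def perplexity(token_probs):
    """Perplexity is the exponential of the cross-entropy."""
    return math.exp(cross_entropy(token_probs))

# Hypothetical toy data, not taken from the corpus.
reference = "the king ascends to heaven".split()
predicted = "the king goes to heaven".split()
probs = [0.5, 0.25, 0.5, 0.25]  # assumed model probabilities for four reference tokens

print(token_accuracy(predicted, reference))
print(cross_entropy(probs))
print(perplexity(probs))
```

Because perplexity is just the exponentiated cross-entropy, the two always move together; token accuracy and BLEU measure the output text itself and can diverge from them.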

Supervised case:
Corpus size: 12,938 aligned sentences
Current max BLEU score = 42.22

Semi-supervised case:
Corpus size: 50,457 monolingual sentences + 12,938 aligned sentences
Current max BLEU score = 41.78

In progress:

  • Parse the Pyramid Texts from PDF to add ~5k more aligned sentences
  • Preprocess newly added aligned sentences
  • Update the machine translation notebook with the new BLEU score once the corpus is expanded
  • Semi-supervised machine translation pipeline

