Machine Translation for Historical Research: A case study of Aramaic-Ancient Hebrew Translations

Tracking #: 2763-3977

This paper is currently under review
Shmuel Liebeskind
Chaya Liebeskind
Dan Bouhnik

Responsible editor: 
Special Issue Cultural Heritage 2021

Submission type: 
Full Paper
In this article, we investigate Machine Translation (MT) in a cultural heritage domain for two primary purposes: evaluating the quality of ancient translations and preserving Aramaic (an endangered language) by making it translatable into another spoken language. First, we detail the construction of a publicly available Biblical parallel Aramaic-Hebrew corpus based on two ancient (early 2nd to late 4th century) Hebrew–Aramaic translations: Targum Onkelus and Targum Jonathan. Then, using the Statistical Machine Translation (SMT) approach, which significantly outperforms Neural Machine Translation (NMT) in our use case, we validate the expected high quality of the translations. The trained model fails to translate Aramaic texts of other dialects. However, when we train the same SMT model on another Aramaic-Hebrew corpus of a different dialect (the Zohar, 13th century), a very high translation score is achieved. We also examine an additional important cultural heritage source of Aramaic texts, the Babylonian Talmud (early 3rd to late 5th century). Since we do not have a parallel corpus of the Talmud, we use the model trained on the Bible corpus for translation. We analyze the results and suggest promising directions for future research.