Hacking Tigrinya
To help Eritrean refugees overcome the language barrier, we used machine learning to train a translation machine that automatically converts sentences between Tigrinya and English.
What do you do if a language does not appear in Google Translate? Then you build a translation machine yourself! Together with Travis Foundation, our non-profit venture Hack The Planet developed a machine learning model for translating Tigrinya into English and vice versa.
Helping refugees with an online translation machine
Every year, thousands of Eritreans flee the many armed conflicts in East Africa. When they try to rebuild their lives in a new country, they run into a language barrier that existing translation machines do not resolve. Tigrinya, the language of people from Eritrea, is spoken by some eight million people worldwide, but it has not been digitized. For commercial tech companies such as Google and Microsoft, the language is too small to include in their translation services. Travis Foundation, a Dutch NGO, therefore decided to digitize Tigrinya itself and create an online translation machine for Eritrean refugees.
The first steps
At first, Travis Foundation asked Eritreans to translate sentences into English, but that was too slow, and translating sentence by sentence does not by itself produce a translation machine. That's why our venture Hack The Planet stepped in. By applying machine learning to a large amount of language data, we created a model that can automatically predict a good text-to-text translation. This approach is also referred to as neural machine translation.
Training the machine learning model
We set out to train a model that could convert new sentences from Tigrinya into English and vice versa. As a basis for this we needed a large language corpus of sentence pairs: the same sentences in both the source and the target language. You can go a long way with the Bible. Although the Bible is not a very large corpus and contains a lot of unusual language, Bible texts in different languages share the same structure, so each passage in one language can be matched to its counterpart in the other. This is also the case for Tigrinya and English. By cutting these texts up into matching pieces and applying machine learning to them, we were able to train our model. In this technical article on our Engineering Blog, Q'ers Jaap and Leonard explain how successful that was.
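To give an idea of what "cutting up" two Bible texts into sentence pairs can look like, here is a minimal Python sketch. It is not the code used in the project. It assumes a hypothetical input format in which every line holds a verse reference, a tab, and the verse text, and it assumes that verses are the unit of alignment. The function names and the placeholder Tigrinya strings are made up for illustration.

```python
def parse_verses(lines):
    """Map a verse reference such as 'GEN 1:1' to its text.

    Assumed (hypothetical) line format: '<reference>\t<verse text>'.
    Lines without a tab are skipped.
    """
    verses = {}
    for line in lines:
        line = line.strip()
        if "\t" not in line:
            continue
        ref, text = line.split("\t", 1)
        verses[ref.strip()] = text.strip()
    return verses


def build_sentence_pairs(source_lines, target_lines):
    """Pair up verses that occur in both texts, in source order."""
    source = parse_verses(source_lines)
    target = parse_verses(target_lines)
    return [
        (source[ref], target[ref])
        for ref in source
        if ref in target and source[ref] and target[ref]
    ]


# Placeholder data: the Tigrinya strings stand in for real verse text.
tigrinya_lines = [
    "GEN 1:1\t<Tigrinya text of GEN 1:1>",
    "GEN 1:2\t<Tigrinya text of GEN 1:2>",
]
english_lines = [
    "GEN 1:1\tIn the beginning God created the heaven and the earth.",
    "GEN 1:3\tAnd God said, Let there be light: and there was light.",
]

pairs = build_sentence_pairs(tigrinya_lines, english_lines)
for tigrinya, english in pairs:
    print(tigrinya, "|", english)
```

Only verses that appear in both texts end up as a pair, so a verse missing from one translation is dropped rather than misaligned. The resulting list of pairs is the kind of parallel corpus a neural machine translation model is trained on.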
Result and next steps
Unfortunately, we have not been able to further train our machine learning model to generate better translations. Funding for the Travis Foundation came to an end and the development of the translation machine came to a halt for the time being. The project has been handed over to the non-profit organization Translators without Borders, but the goal has remained the same: to help Eritrean refugees with their integration by reducing the language gap.