'Masakhane', an open source project that aims to enable machine translation of over 2,000 African languages


World Bank Photo Collection

The languages spoken on the African continent include not only widely known ones such as English, French, and Arabic dialects, but also languages that African peoples have used since ancient times. By one estimate, more than 2,000 languages are spoken on the continent, and this linguistic fragmentation can be an obstacle to communication and commerce. In 2019, African AI researchers and engineers launched an open source project called “Masakhane” and began “a magnificent attempt to translate African languages using machine translation”.


The Masakhane project wants machine translation and AI to transform Africa | VentureBeat

Masakhane was founded by South African AI researchers Jade Abbott and Laura Martinus, and the project collaborates with AI researchers and data scientists across Africa. When the two met at a machine learning and natural language processing (NLP) conference in 2019, they discussed a project to translate African languages using machine learning models, and Masakhane was born. The project's name, “Masakhane”, means “we build together” in Zulu.

The languages Masakhane aims to cover include not only the many native languages of Africa, but also Nigerian Pidgin English and the Arabic dialects spoken in North and Central Africa. Unlike European languages, these languages lack established benchmarks and large datasets. At the time of writing, Masakhane is therefore starting by working with linguists and with groups such as Translators without Borders to build language datasets.

The founders believe that once African languages can be machine-translated, the work can be extended to other open source projects that benefit Africans. A map accompanying the article shows in green the developers participating in Masakhane. At the time of writing, about 60 developers across Africa are taking part, concentrated in South Africa, Kenya and Nigeria. Each participant collects data in their native language and trains a model on it.


Kathleen Siminyu, an engineer from the Luhya people, speaks the Luhya language when talking with other members of her community. In Kenya, English is often used in public settings such as schools, but in everyday life each ethnic group uses its own language, and Siminyu felt that this created a communication gap. Siminyu, an AI developer, therefore decided to join Masakhane.

Siminyu believes that using machine learning to translate African languages will drive the adoption of AI in Africa and encourage Africans to use AI in their daily lives. She argues that continent-wide projects such as Masakhane are important for connecting communities of African developers and researchers and for achieving sustainable, long-term collaboration.

“Language differences are a barrier, and eliminating the language barrier will allow many Africans to engage in the digital economy and ultimately the AI economy. I feel that those of us who participate in Masakhane have a responsibility to bring people who are currently left out into the AI society,” said Siminyu.



According to a report released by GitHub in 2019, African countries such as Kenya and Nigeria have seen a significant increase in contributors to open source projects. Africa's technology and developer ecosystem is also attracting Silicon Valley companies, with Twitter CEO Jack Dorsey and GitHub CEO Nat Friedman visiting Africa one after another.

Masakhane participants say that the developer community in Africa is expanding rapidly and that the benefits of machine translation for African languages are significant. “We can solve the problem. We have experts, we have knowledge and intelligence,” says a Masakhane developer from Nigeria who speaks Yoruba, adding that the project could become a foothold for Africans to contribute to the world. Multiple studies show that teaching people in their native language leads to more effective learning, so translating English-language literature into African languages through Masakhane may help educate more people.

Espoir Murhabazi, a developer from the Democratic Republic of the Congo who is in charge of the Lingala language, pointed out that Lingala differs from many other languages in that a single word consists of a stem plus multiple elements that shape its meaning. Each language Masakhane aims to translate poses technical challenges of this kind, such as structural differences between languages. Still, Murhabazi expects machine translation systems to contribute to people's entertainment as well. “At nightclubs and bars we went to when we lived in Kenya, not everyone who danced understood the meaning of the song,” said Murhabazi. He suggested that machine translation might let people understand and enjoy the meaning of the songs.


Beyond providing opportunities for African people, developers cite another benefit of participating in Masakhane: “The success of AI projects by Africans may lead to a relaxation of the restrictions on AI researchers in Africa.”

At the time of writing, most major AI conferences are held in Europe, Asia, and North America, so African researchers have limited opportunities to interact with other researchers. In addition, government agencies and other bodies have in some cases refused to admit African AI researchers, even those educated in Western countries.

NeurIPS, an international AI conference, will be held in Vancouver, Canada, in December 2019, but it has been reported that researchers from Africa and Asia may be denied visas by the Canadian government. To counter this prejudice against African developers, it is important that African AI projects succeed.

Abbott also pointed out that the African developers participating in Masakhane have exchanged a wide range of knowledge, stimulating one another and advancing each other's work. “Meeting a community that works on low-resource languages is a major boost for our research,” Abbott said.


in Software, Posted by log1h_ik