Facebook develops AI technology that dramatically improves the quality of machine translation


by MILKOVÍ

In 2016, Google Translate dramatically improved its translation quality by introducing a system called Neural Machine Translation (NMT). However, existing systems have the weakness of requiring "training data" created by humans, and Facebook has now developed AI technology that does not require such data. This is expected to dramatically improve translation for minor languages that have not been translated well so far.

Phrase-Based & Neural Unsupervised Machine Translation
(PDF file) https://arxiv.org/pdf/1804.07755.pdf

Unsupervised machine translation: A novel approach to provide fast, accurate translations for more languages - Facebook Code
https://code.fb.com/ai-research/unsupervised-machine-translation-a-novel-approach-to-provide-fast-accurate-translations-for-more-languages/

Facebook's AI Just Set A New Record In Translation And Why It Matters
https://www.forbes.com/sites/williamfalcon/2018/09/01/facebook-ai-just-set-a-new-record-in-translation-and-why-it-matters/#4616ca493124



In 2015, the Montreal Institute for Learning Algorithms (MILA), a Canadian research institution, developed AI technology for machine translation. MILA's Neural Machine Translation (NMT), which is also used in Google Translate, does not translate a sentence phrase by phrase. Instead it translates the whole sentence at once, so it can take into account word meanings that change with context. Neural machine translation dramatically improved the quality of Google Translate.

However, neural machine translation requires pairs of sentences in two languages in order to learn to translate. Translating between English and Spanish, for example, requires pairs such as "I like to eat" (English) and "me gusta comer" (Spanish). Where such pairs are scarce, as between English and Urdu, translation does not work well. To improve translation accuracy, researchers have therefore focused on developing systems that do not require such pairs.


by Simson Petrol

In August 2018, researchers at Facebook AI Research (FAIR) announced that they had dramatically improved translation between languages with few sentence pairs, such as Urdu and English.

In BLEU (Bilingual Evaluation Understudy), one of the automatic evaluation metrics for machine translation output, an improvement of 1 BLEU point is regarded as a "remarkable achievement", whereas Facebook's new technology improved scores by more than 10 BLEU points.
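To give a rough sense of what a BLEU point measures, the following is a minimal sketch of a sentence-level BLEU score, assuming whitespace tokenization, a single reference translation, and no smoothing. The function names `ngram_counts` and `simple_bleu` are made up for this illustration; published BLEU figures are computed over whole test sets with standard tools, so this is not the evaluation used by Facebook.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count every n-gram (as a tuple) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def simple_bleu(candidate, reference, max_n=4):
    """Simplified sentence-level BLEU on a 0-100 scale (illustrative only)."""
    cand = candidate.split()
    ref = reference.split()
    if not cand:
        return 0.0

    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngram_counts(cand, n)
        ref_counts = ngram_counts(ref, n)
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by its count in the reference.
        clipped = sum(min(count, ref_counts[gram])
                      for gram, count in cand_counts.items())
        if total == 0 or clipped == 0:
            # Without smoothing, one zero precision makes the score zero.
            return 0.0
        log_precisions.append(math.log(clipped / total))

    # Brevity penalty: punish candidates shorter than the reference.
    if len(cand) >= len(ref):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - len(ref) / len(cand))

    # Geometric mean of the n-gram precisions, scaled to BLEU points.
    return 100.0 * brevity_penalty * math.exp(sum(log_precisions) / max_n)


if __name__ == "__main__":
    reference = "the cat sat on the mat"
    print(simple_bleu("the cat sat on the mat", reference))
    print(simple_bleu("the cat sat on the mat today", reference))
```

A candidate identical to the reference scores 100, and each extra or wrong word lowers the n-gram precisions and therefore the score.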

Training a machine learning system normally requires labeled data prepared in advance. Until now this training data has had to be created manually by humans, which takes enormous labor. Facebook's new technology does not require such training data. By analogy, it is like judging that a picture shows a cat without any training data labeled "cat". The technology is expected to make it possible to translate documents written in languages of the past, or to translate less widely used languages such as Swahili in real time.

The core of Facebook's new technology is a combination of the following three elements. All three were developed in earlier research.

1: Byte pair encoding - In Facebook's technology, the word "hello" is not given to the system as a whole word. Instead it is divided into four parts, "he", "l", "l" and "o", which are given to the system. This makes it possible to learn a translation for "he" even if the system has never seen the word "he" on its own. Dividing words into shorter units effectively eliminates unknown words.

2: Language model - A language model is a formal model of parts of speech, the syntactic structure of sentences, and the relations between words and between documents. It lets the system judge which sentences are more natural, so that it can correct "how is you" to "how are you".

3: Back-translation (reverse translation) - When translating from English to Spanish, the system also performs the reverse translation from Spanish to English. This increases the amount of data available and makes it possible to optimize the neural translation model.
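The first element in the list above, splitting words into subword units with byte pair encoding (BPE), can be sketched roughly as follows. This is a minimal illustration and not Facebook's code: the function names `merge_pair`, `learn_merges` and `segment` and the toy word list are assumptions made for this example, and real BPE implementations also handle word frequencies from a large corpus and end-of-word markers, which are omitted here.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace each adjacent occurrence of `pair` with one merged symbol."""
    merged = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            merged.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            merged.append(symbols[i])
            i += 1
    return merged


def learn_merges(words, num_merges):
    """Learn up to `num_merges` merges from a list of training words."""
    # Start with every word split into single characters.
    vocab = Counter(tuple(word) for word in words)
    merges = []
    for _ in range(num_merges):
        # Count how often each adjacent symbol pair occurs.
        pair_counts = Counter()
        for symbols, freq in vocab.items():
            for left, right in zip(symbols, symbols[1:]):
                pair_counts[(left, right)] += freq
        if not pair_counts:
            break
        # Merge the most frequent pair everywhere in the vocabulary.
        best = max(pair_counts, key=pair_counts.get)
        merges.append(best)
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[tuple(merge_pair(list(symbols), best))] += freq
        vocab = new_vocab
    return merges


def segment(word, merges):
    """Split a (possibly unseen) word using the learned merges, in order."""
    symbols = list(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return symbols


if __name__ == "__main__":
    toy_words = ["he", "he", "she", "the"]  # hypothetical training data
    merges = learn_merges(toy_words, num_merges=1)
    print(merges)                    # [('h', 'e')]
    print(segment("hello", merges))  # ['he', 'l', 'l', 'o']
```

Although "hello" never appears in the toy training words, it is still broken into known units, which is the sense in which shorter units remove unknown words.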

The Facebook system applies the above three elements to both a neural network-based approach (NMT) and a phrase-based approach (PBSMT). NMT and PBSMT each improved translation quality on their own, and using both together produced very good results.

In addition, Facebook has released the code free of charge, so anyone can build the system.

GitHub - facebookresearch / UnsupervisedMT: Phrase-Based & Neural Unsupervised Machine Translation

in Software, Web Service, Posted by darkhorse_log