It has been reported that when ChatGPT handles "minor languages" that are rarely used on the Internet, the quality of its responses drops considerably because of translation errors, fabricated words, illogical answers, and similar problems.



ChatGPT, OpenAI's conversational AI, has been reported to answer with high accuracy, achieving passing scores on Google's coding exam, a law school exam, and the medical licensing exam. On the other hand, people communicating with ChatGPT in a language other than English often feel that their intentions are not being conveyed properly. In particular, for some languages that are rarely seen online, accuracy has been reported to drop significantly, with ChatGPT failing logic tests and being unable to find basic information.

ChatGPT fails in languages like Tamil and Bengali - Rest of World
https://restofworld.org/2023/chatgpt-problems-global-language-testing/



ChatGPT works well in major languages such as English and Spanish, and can respond reasonably well in Japanese, but it has a hard time producing text of the same quality in languages that are minor online, such as Bengali, Swahili, Urdu, and Thai. In fact, when the technology media outlet Rest of World tested ChatGPT's ability to respond in these minor languages, it confirmed problems that go far beyond translation errors, including fabricated non-existent words, illogical answers, and responses that were completely nonsensical.

For example, Tigrinya, one of Ethiopia's official languages with more than 7 million speakers, shares a similar script with Amharic, the more dominant language in Ethiopia, but the two are distinct languages with significant differences. However, Rest of World reports that ChatGPT confuses the two languages and produces sentences that are difficult to read for native speakers of either. Also, when asked in English to "give examples of African countries," ChatGPT listed 10 African countries, but when asked in Tigrinya, it cited "Canada," "Jordan," and other countries outside the African continent. Some responses even named countries that do not exist at all.



When Rest of World asked experts about this issue, AI researchers said, "Proper nouns such as names, places, and institutions are an absolute weakness of ChatGPT. This is a problem common across languages." Among AI researchers, these languages are referred to as "low resource": even though they are spoken by many people around the world, little text in them exists online, so a language model tailored to the language cannot be trained sufficiently. As a result, under these low-resource, undertrained conditions, ChatGPT has often been seen to produce unintelligible responses.

On the same problem, researchers at the University of Oregon conducted a study in which they gave ChatGPT multiple writing tasks in 37 different languages and compared the quality of the responses. The study found that ChatGPT performs poorly in languages with relatively few resources, and notes, "The amount of training data is clearly a factor, but beyond that, we found that ChatGPT had more difficulty with languages that are structurally different from English."

Haitian Creole, which is spoken in countries such as Haiti in the West Indies, is a French-based language, but it has its own grammatical rules, and many of its words sound similar to French words while differing in meaning and spelling. However, ChatGPT tends not to grasp these characteristics: it confuses Haitian Creole with French, uses incorrect spellings, and uses words that exist only in French. Laura Wagner, who is in charge of Haitian Creole at Respond Crisis Translation, an interpretation and translation service for immigrants and refugees, reviewed the Haitian Creole sentences generated by ChatGPT and said, "There are syntactic errors and incorrect French expressions. Most importantly, ChatGPT's Haitian Creole texts are full of non-existent words."

Notable results have also been reported for literary texts such as poetry. Spoken as one of the official languages in southern India, Sri Lanka, Singapore, and other countries, Tamil has over 78 million speakers and is said to have a rich literary history. Tamil has a style of rhythmic poetry called "Venpa." When ChatGPT was asked in English to "write a poem using Venpa," it produced a poem with a well-formed rhythm, but when asked in Tamil, the language Venpa comes from, it failed: the structure was incorrect and the output included phrases that were not sentences. Sankar, an AI developer and creator of the Tamil version of Wordle, responded to a review request from Rest of World and said, "If I were to grade this poem as a Tamil teacher, I would probably give ChatGPT a zero."



Thien Nguyen, an assistant professor at the University of Oregon, said, "What is fundamentally lacking for low-resource languages is the ability to carry out more semantic and complex reasoning. Along with question answering, we know that the model has difficulty summarizing text, identifying proper nouns, and making common-sense deductions." Sourojit Ghosh, a doctoral candidate in human-centered design and engineering at the University of Washington, said, "These problems are rooted in scraping data from the English-dominated Internet, so OpenAI cannot be held solely responsible. Nevertheless, the minimum requirement is that OpenAI deliver on its promise to close these data disparities, provide access for multilingual users, and ensure that ChatGPT can successfully perform language translation tasks."

in Software, Posted by log1e_dh