'Open Japanese LLM Leaderboard' released to evaluate the performance of Japanese large language models



The 'Open Japanese LLM Leaderboard,' which evaluates and analyzes the performance of Japanese large language models (LLMs) on more than 16 natural language processing (NLP) tasks, has been released. It was built by 'LLM-jp,' a cross-organizational project for research and development on Japanese LLMs whose participants include the National Institute of Informatics.

Open Japanese LLM Leaderboard - a Hugging Face Space by llm-jp
https://huggingface.co/spaces/llm-jp/open-japanese-llm-leaderboard

Introducing the Open Leaderboard for Japanese LLMs!
https://huggingface.co/blog/leaderboard-japanese

Open Japanese LLM Leaderboard Released - LLM Study Group
https://llm-jp.github.io/llm/2024/11/20/open-japanese-llm-leaderboard.html

While many LLMs are available for English, it has been difficult to know how well they perform in other languages. The 'Open Japanese LLM Leaderboard' measures model performance with 'llm-jp-eval,' an automatic evaluation tool for Japanese LLMs.

llm-jp-eval: An automated evaluation tool for large-scale Japanese language models
(PDF file) https://www.anlp.jp/proceedings/annual_meeting/2024/pdf_dir/A8-2.pdf



The supported evaluation datasets are as follows:

・Natural Language Inference (NLI): Jamp, JaNLI, JNLI, JSeM, JSICK
・Question Answering (QA): JEMHopQA, NIILC
・Reading Comprehension (RC): JSQuAD
・Multiple Choice question answering (MC): JCommonsenseQA
・Entity Linking (EL): chABSA
・Fundamental Analysis (FA): Wikipedia Annotated Corpus
・Mathematical Reasoning (MR): MAWPS
・Semantic Textual Similarity (STS): JSTS
・Machine Translation (MT): ALT, WikiCorpus
・Human Examination (HE): MMLU, JMMLU
・Code generation (CG): MBPP
・Summarization (SUM): XL-Sum
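
For readers who want to handle this list programmatically, the categories and datasets above can be held in a simple mapping. The following is only an illustrative sketch: the names TASK_DATASETS and category_of are hypothetical and are not taken from llm-jp-eval or the leaderboard's own code.

```python
# Hypothetical mapping from task-category abbreviation to the datasets
# listed in the article. This is NOT llm-jp-eval's configuration format.
TASK_DATASETS = {
    "NLI": ["Jamp", "JaNLI", "JNLI", "JSeM", "JSICK"],
    "QA": ["JEMHopQA", "NIILC"],
    "RC": ["JSQuAD"],
    "MC": ["JCommonsenseQA"],
    "EL": ["chABSA"],
    "FA": ["Wikipedia Annotated Corpus"],
    "MR": ["MAWPS"],
    "STS": ["JSTS"],
    "MT": ["ALT", "WikiCorpus"],
    "HE": ["MMLU", "JMMLU"],
    "CG": ["MBPP"],
    "SUM": ["XL-Sum"],
}


def category_of(dataset: str) -> str:
    """Return the category abbreviation for a dataset name (case-insensitive)."""
    wanted = dataset.lower()
    for category, datasets in TASK_DATASETS.items():
        if wanted in (name.lower() for name in datasets):
            return category
    raise KeyError(dataset)


if __name__ == "__main__":
    total = sum(len(names) for names in TASK_DATASETS.values())
    print(len(TASK_DATASETS), "categories,", total, "datasets")
    print("JSQuAD belongs to:", category_of("JSQuAD"))
```

A plain dictionary is enough here because the list is small and fixed; a lookup by dataset name simply scans each category.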

Japanese writing is highly complex, mixing four scripts: Hiragana, Katakana, Kanji, and Roman letters. In addition, words are not separated by spaces, which makes tokenization difficult.
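
The point about scripts and spacing can be seen with a few lines of Python. This is a minimal sketch using only the standard library: the sample sentence and the helper script_of are made up for illustration, and the script detection relies on Unicode character names, which is a rough heuristic rather than a real Japanese tokenizer.

```python
import unicodedata


def script_of(ch: str) -> str:
    """Roughly classify one character by its Unicode name (heuristic only)."""
    name = unicodedata.name(ch, "")
    if name.startswith("HIRAGANA"):
        return "Hiragana"
    if name.startswith("KATAKANA"):
        return "Katakana"
    if name.startswith("CJK UNIFIED IDEOGRAPH"):
        return "Kanji"
    if name.startswith("LATIN"):
        return "Roman"
    return "Other"


# Illustrative sentences (not from any evaluation dataset).
english = "I read AI news"
japanese = "私はAIのニュースを読む"

if __name__ == "__main__":
    # Whitespace splitting separates English words but leaves the
    # Japanese sentence as one unsegmented chunk.
    print(english.split())
    print(japanese.split())

    # A single short sentence already mixes all four scripts.
    for ch in japanese:
        print(ch, script_of(ch))
```

Splitting on whitespace yields separate words for the English sentence but a single chunk for the Japanese one, which is why Japanese text generally needs a dedicated segmentation or subword tokenization step.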

Japanese LLMs that take these characteristics of Japanese natural language processing into account have been developed, but until now there has been no centralized, open system for comparing them.

To increase transparency in research and encourage open-source model development, Hugging Face and LLM-jp collaborated to create an open leaderboard for Japanese LLMs.

The initiative is expected to serve as a platform for evaluating and improving Japanese LLMs through collaboration between researchers in Japan and overseas.

in AI, Web Service, Posted by logc_nt