China's Qwen model tops 'Open LLM Leaderboard v2' testing Hugging Face AI models



Hugging Face has released version 2 of the Open LLM Leaderboard, which ranks open source language models from around the world. At the time of release, the top-ranked model was Qwen2-72B-Instruct, developed by Alibaba.

Open-LLM performances are plateauing, let's make the leaderboard steep again - a Hugging Face Space by open-llm-leaderboard
https://huggingface.co/spaces/open-llm-leaderboard/blog

Chinese AI models storm Hugging Face's LLM chatbot benchmark leaderboard — Alibaba runs the board as major US competitors have worsened | Tom's Hardware
https://www.tomshardware.com/tech-industry/artificial-intelligence/chinese-llms-storm-hugging-faces-chatbot-benchmark-leaderboard-alibaba-runs-the-board-as-major-us-competitors-have-worsened

To rank them, the language models were assessed on four tasks: knowledge, reasoning over long contexts, complex mathematical ability, and how well they follow human instructions.

The evaluation used six benchmarks: 'MMLU-Pro', a multiple-choice knowledge benchmark; 'GPQA', which measures highly specialized knowledge; 'MuSR', which includes problems such as solving murder mysteries; 'MATH', a test of mathematical ability; 'IFEval', which tests the ability to follow instructions; and 'BBH' (BIG-Bench Hard), a set of challenging tasks whose results are said to correlate well with human preference.

Over 7,500 models were evaluated, and 'Qwen2-72B-Instruct' came out on top. Hugging Face said, 'Qwen2-72B-Instruct is head and shoulders above the others.' In fact, Qwen2-72B-Instruct was the only model whose average score reached 40 points.
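To make the "average score" concrete, here is a minimal sketch of how a single number can be derived from six benchmark results. It assumes a plain unweighted mean; the leaderboard itself applies its own score normalization, which is not reproduced here. The function name `average_score` and the values in `example_scores` are hypothetical placeholders for illustration, not real leaderboard results.

```python
from statistics import mean

# The six benchmarks named in the article.
BENCHMARKS = ("IFEval", "BBH", "MATH", "GPQA", "MuSR", "MMLU-Pro")


def average_score(scores: dict[str, float]) -> float:
    """Return the unweighted mean over the six benchmarks (an assumption)."""
    missing = [name for name in BENCHMARKS if name not in scores]
    if missing:
        raise ValueError(f"missing benchmark scores: {missing}")
    return mean(scores[name] for name in BENCHMARKS)


# Placeholder values for a hypothetical model, NOT real leaderboard results.
example_scores = {
    "IFEval": 60.0,
    "BBH": 40.0,
    "MATH": 20.0,
    "GPQA": 10.0,
    "MuSR": 15.0,
    "MMLU-Pro": 35.0,
}

print(average_score(example_scores))
```

Under this simple scheme, a model needs strong results across all six benchmarks to reach a high average, since one weak benchmark pulls the mean down.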

The results can be found at the link below.

Open LLM Leaderboard 2 - a Hugging Face Space by open-llm-leaderboard
https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard



This ranking is subject to change. At the time of writing, the breakdown from 1st to 10th place is as follows:

1st place: Qwen/Qwen2-72B-Instruct
2nd place: meta-llama/Meta-Llama-3-70B-Instruct
3rd place: Qwen/Qwen2-72B
4th place: mistralai/Mixtral-8x22B-Instruct-v0.1
5th place: HuggingFaceH4/zephyr-orpo-141b-A35b-v0.1
6th place: microsoft/Phi-3-medium-4k-instruct
7th place: 01-ai/Yi-1.5-34B-Chat
8th place: CohereForAI/c4ai-command-r-plus
9th place: abacusai/Smaug-72B-v0.1
10th place: Qwen/Qwen1.5-110B

As shown above, Qwen models occupy three of the top 10 places, a strong showing for Alibaba. In addition, 'Smaug-72B', which ranked 9th this time, was at the top of Open LLM Leaderboard version 1 as of February 2024. Smaug-72B is a fine-tuned model derived from 'Qwen-72B', an earlier generation of the same family as 'Qwen2-72B', which ranked 3rd this time.
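The "three of the top 10" figure can be checked directly from the listing above. The following is a small sketch that groups the ten model IDs by the organization prefix before the slash; the helper name `count_by_org` is hypothetical, and the IDs are copied from the ranking in this article.

```python
from collections import Counter

# Top 10 model IDs as listed in this article, in rank order.
TOP_10 = [
    "Qwen/Qwen2-72B-Instruct",
    "meta-llama/Meta-Llama-3-70B-Instruct",
    "Qwen/Qwen2-72B",
    "mistralai/Mixtral-8x22B-Instruct-v0.1",
    "HuggingFaceH4/zephyr-orpo-141b-A35b-v0.1",
    "microsoft/Phi-3-medium-4k-instruct",
    "01-ai/Yi-1.5-34B-Chat",
    "CohereForAI/c4ai-command-r-plus",
    "abacusai/Smaug-72B-v0.1",
    "Qwen/Qwen1.5-110B",
]


def count_by_org(model_ids: list[str]) -> Counter:
    """Count models per organization, taking the text before the first '/'."""
    return Counter(model_id.split("/", 1)[0] for model_id in model_ids)


org_counts = count_by_org(TOP_10)
print(org_counts["Qwen"])  # 3
```

Note that this counts only models published under the Qwen organization; Smaug-72B, although derived from a Qwen model, is listed under abacusai and is not included in the three.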

Abacus AI's open source LLM 'Smaug-72B' topped Hugging Face's Open LLM Leaderboard and outperformed GPT-3.5 in several benchmarks - GIGAZINE



Posted by log1p_kr