Alibaba Announces Qwen3-Max-Thinking, an Inference AI Model with Performance Equivalent to GPT-5.2

A new flagship inference model, ' Qwen3-Max-Thinking, ' has been added to the 'Qwen' series of open source large-scale language models developed by Chinese IT giant Alibaba.
Pushing Qwen3-Max-Thinking Beyond its Limits

According to the Qwen team, 'Qwen3-Max-Thinking' expands model parameters and leverages massive computational resources for reinforcement learning, resulting in significant performance improvements in multiple areas, including fact-based knowledge, complex reasoning, instruction following, consistency with human preferences, and agent functionality.
The table below shows the benchmark scores for five models: GPT-5.2-Thinking, Claude-Opus-4.5, Gemini 3 Pro, DeepSeek V3.2, and Qwen3-Max-Thinking. Qwen3-Max-Thinking achieved top scores in the Chinese language assessment test C-Eval , the mathematical reasoning benchmark HMMT 25 (November 2025 edition), the HLE (Humanity's Last Test) , which covers a wide range of subjects, and Arena Hard v2 , and also achieved scores comparable to the four models in other tests.
| Benchmark Test | GPT-5.2 -Thinking | Claude-Opus -4.5 | Gemini 3 Pro | DeepSeek V3.2 | Qwen3-Max -Thinking | |
|---|---|---|---|---|---|---|
| knowledge | MMLU-Pro | 87.4 | 89.5 | 89.8 | 85.0 | 85.7 |
| MMLU-Redux | 95.0 | 95.6 | 95.9 | 94.5 | 92.8 | |
| C-Eval | 90.5 | 92.2 | 93.4 | 92.9 | 93.7 | |
| STEM (Science, Technology, Engineering, Mathematics) | GPQA | 92.4 | 87.0 | 91.9 | 82.4 | 87.4 |
| HLE | 35.5 | 30.8 | 37.5 | 25.1 | 30.2 | |
| inference | LiveCodeBench v6 | 87.7 | 84.8 | 90.7 | 80.8 | 85.9 |
| HMMT Feb 25 | 99.4 | - | 97.5 | 92.5 | 98.0 | |
| HMMT Nov 25 | - | - | 93.3 | 90.2 | 94.7 | |
| IMOAnswerBench | 86.3 | 84.0 | 83.3 | 78.3 | 83.9 | |
| Agentic Coding | SWE Verified | 80.0 | 80.9 | 76.2 | 73.1 | 75.3 |
| Agent Search | HLE (with tools) | 45.5 | 43.2 | 45.8 | 40.8 | 49.8 |
| Follow-up and consistency | IFBench | 75.4 | 58.0 | 70.4 | 60.7 | 70.9 |
| MultiChallenge | 57.9 | 54.2 | 64.2 | 47.3 | 63.3 | |
| Arena-Hard v2 | 80.6 | 76.7 | 81.7 | 66.5 | 90.2 | |
| Tool execution | Tau² Bench | 80.9 | 85.7 | 85.4 | 80.3 | 82.1 |
| BFCL-V4 | 63.1 | 77.5 | 72.5 | 61.2 | 67.7 | |
| Vita Bench | 38.2 | 56.3 | 51.6 | 44.1 | 40.9 | |
| Deep Planning | 44.6 | 33.9 | 23.3 | 21.6 | 28.7 | |
| Long Context | AA-LCR | 72.7 | 74.0 | 70.7 | 65.0 | 68.7 |
Qwen3-Max-Thinking is now available on ' Qwen Chat ,' and if you have an Alibaba Cloud account and activate the Alibaba Cloud Model Studio service, you can create an API key for 'Qwen3-Max-Thinking' (qwen3-max-2026-01-23).
Related Posts:
in AI, Posted by logc_nt







