Jan 27, 2026 10:57:00

Alibaba Announces Qwen3-Max-Thinking, an Inference AI Model with Performance Equivalent to GPT-5.2

A new flagship inference model, ' Qwen3-Max-Thinking, ' has been added to the 'Qwen' series of open source large-scale language models developed by Chinese IT giant Alibaba.

Pushing Qwen3-Max-Thinking Beyond its Limits

https://qwen.ai/blog?id=qwen3-max-thinking

According to the Qwen team, 'Qwen3-Max-Thinking' expands model parameters and leverages massive computational resources for reinforcement learning, resulting in significant performance improvements in multiple areas, including fact-based knowledge, complex reasoning, instruction following, consistency with human preferences, and agent functionality.

The table below shows the benchmark scores for five models: GPT-5.2-Thinking, Claude-Opus-4.5, Gemini 3 Pro, DeepSeek V3.2, and Qwen3-Max-Thinking. Qwen3-Max-Thinking achieved top scores in the Chinese language assessment test C-Eval , the mathematical reasoning benchmark HMMT 25 (November 2025 edition), the HLE (Humanity's Last Test) , which covers a wide range of subjects, and Arena Hard v2 , and also achieved scores comparable to the four models in other tests.

	Benchmark Test	GPT-5.2 -Thinking	Claude-Opus -4.5	Gemini 3 Pro	DeepSeek V3.2	Qwen3-Max -Thinking
knowledge	MMLU-Pro	87.4	89.5	89.8	85.0	85.7
	MMLU-Redux	95.0	95.6	95.9	94.5	92.8
	C-Eval	90.5	92.2	93.4	92.9	93.7
STEM (Science, Technology, Engineering, Mathematics)	GPQA	92.4	87.0	91.9	82.4	87.4
STEM (Science, Technology, Engineering, Mathematics)	HLE	35.5	30.8	37.5	25.1	30.2
inference	LiveCodeBench v6	87.7	84.8	90.7	80.8	85.9
	HMMT Feb 25	99.4	-	97.5	92.5	98.0
	HMMT Nov 25	-	-	93.3	90.2	94.7
	IMOAnswerBench	86.3	84.0	83.3	78.3	83.9
Agentic Coding	SWE Verified	80.0	80.9	76.2	73.1	75.3
Agent Search	HLE (with tools)	45.5	43.2	45.8	40.8	49.8
Follow-up and consistency	IFBench	75.4	58.0	70.4	60.7	70.9
	MultiChallenge	57.9	54.2	64.2	47.3	63.3
	Arena-Hard v2	80.6	76.7	81.7	66.5	90.2
Tool execution	Tau² Bench	80.9	85.7	85.4	80.3	82.1
	BFCL-V4	63.1	77.5	72.5	61.2	67.7
	Vita Bench	38.2	56.3	51.6	44.1	40.9
	Deep Planning	44.6	33.9	23.3	21.6	28.7
Long Context	AA-LCR	72.7	74.0	70.7	65.0	68.7

Qwen3-Max-Thinking is now available on ' Qwen Chat ,' and if you have an Alibaba Cloud account and activate the Alibaba Cloud Model Studio service, you can create an API key for 'Qwen3-Max-Thinking' (qwen3-max-2026-01-23).

Related Posts:

Jan 27, 2026 10:57:00 in AI, Posted by logc_nt