Qwen3 235B A22B | BenchmarkList

Metadata

Developer: Qwen
Availability: Open source

Aliases: qwen-qwen3-235b-a22b, qwen-qwen3-235b-a22b-04-28, qwen/qwen3-235b-a22b, qwen/qwen3-235b-a22b-04-28, qwen3-235b-a22b, qwen3-235b-a22b-04-28

Benchmark	Category	Rank	Score	Date Sampled
ADBench	Agentic	8	68	2026-05-06
EnterpriseOps-Gym	Agentic	22	15.8%	2026-05-05
MultiChallenge	Agentic	27	41.22	2026-05-06
Tau2-Bench Telecom	Agentic	262	27.2%	2026-05-11
Tau2-Bench Telecom	Agentic	288	24%	2026-05-11
Terminal-Bench Hard	Agentic	257	6.1%	2026-05-11
Terminal-Bench Hard	Agentic	258	6.1%	2026-05-11
Stick To Your Role!	Alignment	12	0.72	2026-05-06
IOI	Coding	54	0%	2026-05-26
LiveCodeBench	Coding	13	65.90	2026-05-06
LiveCodeBench	Coding	60	70.62%	2026-05-28
MultiPL-E	Coding	11	0.6594	2026-05-27
SciCode	Coding	100	39.9%	2026-05-11
SciCode	Coding	250	29.9%	2026-05-11
TuRTLe Code Completion (Icarus Verilog)	Coding	11	67.54	2026-05-06
TuRTLe Code Completion (Verilator)	Coding	11	66.80	2026-05-06
TuRTLe Line Completion	Coding	1	41.94	2026-05-06
TuRTLe Spec-to-RTL (Icarus Verilog)	Coding	8	69.16	2026-05-06
TuRTLe Spec-to-RTL (Verilator)	Coding	8	69.17	2026-05-06
NeoEvalPlusN	Creative	135	10	2026-05-06
NeoEvalPlusN	Creative	145	9.25	2026-05-06
EduGuardBench	Education	12	0.67	2026-05-27
AI Energy Score	Efficiency	101	5	2026-05-06
AI Energy Score	Efficiency	141	4	2026-05-06
kluster.ai LLM Hallucination Detection Leaderboard	Factuality	11	95.88	2026-05-06
kluster.ai LLM Hallucination Detection Leaderboard	Factuality	12	95.83	2026-05-06
Vectara HHEM Hallucination Leaderboard	Factuality	45	90.70	2026-05-06
Fin-RATE	Finance	4	24.39%	2026-05-28
TaxEval v2	Finance	62	70.646%	2026-05-28
MageBench Season 1	Game	18	1594 rating / 11 games	2026-05-28
BenchLM	General Knowledge	68	47	2026-05-06
BenchLM	General Knowledge	87	33	2026-05-06
MMLU-Redux	General Knowledge	28	0.87	2026-05-06
Arena-Hard	Generalization	9	58.4%	2026-05-27
WeirdML	Generalization	14	41.04	2026-05-06
HealthBench Hard	Healthcare	8	0.5	2026-05-27
MedQA	Healthcare	45	90.617%	2026-04-16
Artificial Analysis Intelligence Index	Intelligence	244	19.79	2026-05-11
Artificial Analysis Intelligence Index	Intelligence	286	16.96	2026-05-11
GPQA Diamond	Intelligence	66	70.202%	2026-05-28
Humanity's Last Exam	Intelligence	129	11.7%	2026-05-11
Humanity's Last Exam	Intelligence	340	4.7%	2026-05-11
MMLU Pro	Intelligence	54	81.246%	2026-05-28
MMLU-Pro	Intelligence	65	82.8%	2026-05-11
MMLU-Pro	Intelligence	164	76.2%	2026-05-11
LAMBADA	Language	5	71.10	2026-05-06
PIQA	Language	15	79.90	2026-05-06
LegalBench	Legal	58	80.179%	2026-05-28
LEXam	Legal	18	47.25% open / 48.19% MCQ	2026-05-28
ConStory-Bench	Long Context	23	CED 1.447	2026-05-28
Fiction.LiveBench	Long Context	5	68.80	2026-05-06
Fiction.LiveBench	Long Context	15	44.40	2026-05-06
AIME	Math	40	83.958%	2026-04-16
AIME 2025	Math	61	82%	2026-05-11
AIME 2025	Math	202	23.7%	2026-05-11
IneqMath	Math	26	6	2026-05-06
MATH 500	Math	8	94.6%	2026-01-09
MGSM	Math	17	92.473%	2026-01-09
FrontierMath 2025-02-28 Private	Mathematics	12	8.48	2026-05-06
FrontierMath Tier 4 2025-07-01 Private	Mathematics	13	0	2026-05-06
OTIS Mock AIME 2024-2025	Mathematics	5	86.67	2026-05-06
BRIDGE Medical Leaderboard	Medical	24	48.71	2026-05-27
BRIDGE Medical Leaderboard	Medical	116	39.21	2026-05-27
BRIDGE Medical Leaderboard	Medical	135	38	2026-05-27
LiveMedBench	Medical	35	0.0505	2026-05-27
MEDIC Benchmark	Medical	33	66.02 (average normalized score, public table)	2026-05-27
Medical Chronology LLM Benchmark	Medical	11	0.88	2026-05-06
LanguageBench	Multilingual	30	0.13	2026-05-06
Design Arena	Multimodal	105	1060	2026-05-06
BBH	Reasoning	14	55	2026-05-06
GPQA Diamond	Reasoning	203	70%	2026-05-11
GPQA Diamond	Reasoning	270	61.3%	2026-05-11
Humanity's Last Exam (Text Only)	Reasoning	22	11.75	2026-05-06
MultiNRC	Reasoning	31	17.63	2026-05-06
SimpleBench	Reasoning	13	31	2026-05-06
LiveSecBench	Safety	9	69.23	2026-05-27
CritPt	Science	345	0%	2026-05-11
CritPt	Science	346	0%	2026-05-11
SciPredict	Science	10	16.63	2026-05-06
SWE-bench Pro	Software Engineering	7	21.41	2026-05-06
Structured Output Benchmark	Structured Output	9	85.70	2026-05-06
LiveSQLBench	Text to SQL	17	26.90	2026-05-06
Lech Mazur Writing	Writing	8	8.49	2026-05-06
Lech Mazur Writing	Writing	9	8.30	2026-05-06
