Qwen3 235B A22B

Qwen / Qwen

84 scores
67 benchmarks
$0.455 / $1.82 per 1M tokens (cost in/out)

Metadata

Qwen | Open source

Aliases: qwen-qwen3-235b-a22b, qwen-qwen3-235b-a22b-04-28, qwen/qwen3-235b-a22b, qwen/qwen3-235b-a22b-04-28, qwen3-235b-a22b, qwen3-235b-a22b-04-28

Benchmark Results

Benchmark | Category | Rank | Score | Sampled
ADBench | Agentic | 8 | 68 | 2026-05-06
EnterpriseOps-Gym | Agentic | 22 | 15.8% | 2026-05-05
MultiChallenge | Agentic | 27 | 41.22 | 2026-05-06
Tau2-Bench Telecom | Agentic | 262 | 27.2% | 2026-05-11
Tau2-Bench Telecom | Agentic | 288 | 24% | 2026-05-11
Terminal-Bench Hard | Agentic | 257 | 6.1% | 2026-05-11
Terminal-Bench Hard | Agentic | 258 | 6.1% | 2026-05-11
Stick To Your Role! | Alignment | 12 | 0.72 | 2026-05-06
IOI | Coding | 54 | 0% | 2026-05-26
LiveCodeBench | Coding | 13 | 65.90 | 2026-05-06
LiveCodeBench | Coding | 60 | 70.62% | 2026-05-28
MultiPL-E | Coding | 11 | 0.6594 | 2026-05-27
SciCode | Coding | 100 | 39.9% | 2026-05-11
SciCode | Coding | 250 | 29.9% | 2026-05-11
TuRTLe Code Completion (Icarus Verilog) | Coding | 11 | 67.54 | 2026-05-06
TuRTLe Code Completion (Verilator) | Coding | 11 | 66.80 | 2026-05-06
TuRTLe Line Completion | Coding | 1 | 41.94 | 2026-05-06
TuRTLe Spec-to-RTL (Icarus Verilog) | Coding | 8 | 69.16 | 2026-05-06
TuRTLe Spec-to-RTL (Verilator) | Coding | 8 | 69.17 | 2026-05-06
NeoEvalPlusN | Creative | 135 | 10 | 2026-05-06
NeoEvalPlusN | Creative | 145 | 9.25 | 2026-05-06
EduGuardBench | Education | 12 | 0.67 | 2026-05-27
AI Energy Score | Efficiency | 101 | 5 | 2026-05-06
AI Energy Score | Efficiency | 141 | 4 | 2026-05-06
kluster.ai LLM Hallucination Detection Leaderboard | Factuality | 11 | 95.88 | 2026-05-06
kluster.ai LLM Hallucination Detection Leaderboard | Factuality | 12 | 95.83 | 2026-05-06
Vectara HHEM Hallucination Leaderboard | Factuality | 45 | 90.70 | 2026-05-06
Fin-RATE | Finance | 4 | 24.39% | 2026-05-28
TaxEval v2 | Finance | 62 | 70.646% | 2026-05-28
MageBench Season 1 | Game | 18 | 1594 rating / 11 games | 2026-05-28
BenchLM | General Knowledge | 68 | 47 | 2026-05-06
BenchLM | General Knowledge | 87 | 33 | 2026-05-06
MMLU-Redux | General Knowledge | 28 | 0.87 | 2026-05-06
Arena-Hard | Generalization | 9 | 58.4% | 2026-05-27
WeirdML | Generalization | 14 | 41.04 | 2026-05-06
HealthBench Hard | Healthcare | 8 | 0.5 | 2026-05-27
MedQA | Healthcare | 45 | 90.617% | 2026-04-16
Artificial Analysis Intelligence Index | Intelligence | 244 | 19.79 | 2026-05-11
Artificial Analysis Intelligence Index | Intelligence | 286 | 16.96 | 2026-05-11
GPQA Diamond | Intelligence | 66 | 70.202% | 2026-05-28
Humanity's Last Exam | Intelligence | 129 | 11.7% | 2026-05-11
Humanity's Last Exam | Intelligence | 340 | 4.7% | 2026-05-11
MMLU Pro | Intelligence | 54 | 81.246% | 2026-05-28
MMLU-Pro | Intelligence | 65 | 82.8% | 2026-05-11
MMLU-Pro | Intelligence | 164 | 76.2% | 2026-05-11
LAMBADA | Language | 5 | 71.10 | 2026-05-06
PIQA | Language | 15 | 79.90 | 2026-05-06
LegalBench | Legal | 58 | 80.179% | 2026-05-28
LEXam | Legal | 18 | 47.25% open / 48.19% MCQ | 2026-05-28
ConStory-Bench | Long Context | 23 | CED 1.447 | 2026-05-28
Fiction.LiveBench | Long Context | 5 | 68.80 | 2026-05-06
Fiction.LiveBench | Long Context | 15 | 44.40 | 2026-05-06
AIME | Math | 40 | 83.958% | 2026-04-16
AIME 2025 | Math | 61 | 82% | 2026-05-11
AIME 2025 | Math | 202 | 23.7% | 2026-05-11
IneqMath | Math | 26 | 6 | 2026-05-06
MATH 500 | Math | 8 | 94.6% | 2026-01-09
MGSM | Math | 17 | 92.473% | 2026-01-09
FrontierMath 2025-02-28 Private | Mathematics | 12 | 8.48 | 2026-05-06
FrontierMath Tier 4 2025-07-01 Private | Mathematics | 13 | 0 | 2026-05-06
OTIS Mock AIME 2024-2025 | Mathematics | 5 | 86.67 | 2026-05-06
BRIDGE Medical Leaderboard | Medical | 24 | 48.71 | 2026-05-27
BRIDGE Medical Leaderboard | Medical | 116 | 39.21 | 2026-05-27
BRIDGE Medical Leaderboard | Medical | 135 | 38 | 2026-05-27
LiveMedBench | Medical | 35 | 0.0505 | 2026-05-27
MEDIC Benchmark | Medical | 33 | 66.02 average normalized public table score | 2026-05-27
Medical Chronology LLM Benchmark | Medical | 11 | 0.88 | 2026-05-06
LanguageBench | Multilingual | 30 | 0.13 | 2026-05-06
Design Arena | Multimodal | 105 | 1060 | 2026-05-06
BBH | Reasoning | 14 | 55 | 2026-05-06
GPQA Diamond | Reasoning | 203 | 70% | 2026-05-11
GPQA Diamond | Reasoning | 270 | 61.3% | 2026-05-11
Humanity's Last Exam (Text Only) | Reasoning | 22 | 11.75 | 2026-05-06
MultiNRC | Reasoning | 31 | 17.63 | 2026-05-06
SimpleBench | Reasoning | 13 | 31 | 2026-05-06
LiveSecBench | Safety | 9 | 69.23 | 2026-05-27
CritPt | Science | 345 | 0% | 2026-05-11
CritPt | Science | 346 | 0% | 2026-05-11
SciPredict | Science | 10 | 16.63 | 2026-05-06
SWE-bench Pro | Software Engineering | 7 | 21.41 | 2026-05-06
Structured Output Benchmark | Structured Output | 9 | 85.70 | 2026-05-06
LiveSQLBench | Text to SQL | 17 | 26.90 | 2026-05-06
Lech Mazur Writing | Writing | 8 | 8.49 | 2026-05-06
Lech Mazur Writing | Writing | 9 | 8.30 | 2026-05-06