Qwen3 235B A22B
Qwen / Qwen
84 scores
67 benchmarks
Cost: $0.455 input / $1.82 output per 1M tokens
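
As a rough illustration of how the listed input/output prices translate into a per-request cost, here is a minimal sketch. It assumes flat per-token pricing with no caching discounts, tiers, or minimum charges, which the listing does not confirm; `estimate_cost` and the constant names are hypothetical helpers, not part of any provider API.

```python
# Prices taken from the listing, in USD per 1M tokens.
INPUT_PRICE_PER_M = 0.455
OUTPUT_PRICE_PER_M = 1.82


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request (hypothetical helper).

    Assumes flat per-token pricing; real billing may differ.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


if __name__ == "__main__":
    # Example: a 12,000-token prompt with a 3,000-token completion.
    print(f"${estimate_cost(12_000, 3_000):.6f}")
```

Because the output price is four times the input price, a completion costs as much as a prompt four times its length under this assumption.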
Metadata
Qwen Open source
Aliases: qwen-qwen3-235b-a22b, qwen-qwen3-235b-a22b-04-28, qwen/qwen3-235b-a22b, qwen/qwen3-235b-a22b-04-28, qwen3-235b-a22b, qwen3-235b-a22b-04-28
| Benchmark | Category | Rank | Score | Sampled |
|---|---|---|---|---|
| ADBench | Agentic | 8 | 68 | 2026-05-06 |
| EnterpriseOps-Gym | Agentic | 22 | 15.8% | 2026-05-05 |
| MultiChallenge | Agentic | 27 | 41.22 | 2026-05-06 |
| Tau2-Bench Telecom | Agentic | 262 | 27.2% | 2026-05-11 |
| Tau2-Bench Telecom | Agentic | 288 | 24% | 2026-05-11 |
| Terminal-Bench Hard | Agentic | 257 | 6.1% | 2026-05-11 |
| Terminal-Bench Hard | Agentic | 258 | 6.1% | 2026-05-11 |
| Stick To Your Role! | Alignment | 12 | 0.72 | 2026-05-06 |
| IOI | Coding | 54 | 0% | 2026-05-26 |
| LiveCodeBench | Coding | 13 | 65.90 | 2026-05-06 |
| LiveCodeBench | Coding | 60 | 70.62% | 2026-05-28 |
| MultiPL-E | Coding | 11 | 0.6594 | 2026-05-27 |
| SciCode | Coding | 100 | 39.9% | 2026-05-11 |
| SciCode | Coding | 250 | 29.9% | 2026-05-11 |
| TuRTLe Code Completion (Icarus Verilog) | Coding | 11 | 67.54 | 2026-05-06 |
| TuRTLe Code Completion (Verilator) | Coding | 11 | 66.80 | 2026-05-06 |
| TuRTLe Line Completion | Coding | 1 | 41.94 | 2026-05-06 |
| TuRTLe Spec-to-RTL (Icarus Verilog) | Coding | 8 | 69.16 | 2026-05-06 |
| TuRTLe Spec-to-RTL (Verilator) | Coding | 8 | 69.17 | 2026-05-06 |
| NeoEvalPlusN | Creative | 135 | 10 | 2026-05-06 |
| NeoEvalPlusN | Creative | 145 | 9.25 | 2026-05-06 |
| EduGuardBench | Education | 12 | 0.67 | 2026-05-27 |
| AI Energy Score | Efficiency | 101 | 5 | 2026-05-06 |
| AI Energy Score | Efficiency | 141 | 4 | 2026-05-06 |
| kluster.ai LLM Hallucination Detection Leaderboard | Factuality | 11 | 95.88 | 2026-05-06 |
| kluster.ai LLM Hallucination Detection Leaderboard | Factuality | 12 | 95.83 | 2026-05-06 |
| Vectara HHEM Hallucination Leaderboard | Factuality | 45 | 90.70 | 2026-05-06 |
| Fin-RATE | Finance | 4 | 24.39% | 2026-05-28 |
| TaxEval v2 | Finance | 62 | 70.646% | 2026-05-28 |
| MageBench Season 1 | Game | 18 | 1594 rating / 11 games | 2026-05-28 |
| BenchLM | General Knowledge | 68 | 47 | 2026-05-06 |
| BenchLM | General Knowledge | 87 | 33 | 2026-05-06 |
| MMLU-Redux | General Knowledge | 28 | 0.87 | 2026-05-06 |
| Arena-Hard | Generalization | 9 | 58.4% | 2026-05-27 |
| WeirdML | Generalization | 14 | 41.04 | 2026-05-06 |
| HealthBench Hard | Healthcare | 8 | 0.5 | 2026-05-27 |
| MedQA | Healthcare | 45 | 90.617% | 2026-04-16 |
| Artificial Analysis Intelligence Index | Intelligence | 244 | 19.79 | 2026-05-11 |
| Artificial Analysis Intelligence Index | Intelligence | 286 | 16.96 | 2026-05-11 |
| GPQA Diamond | Intelligence | 66 | 70.202% | 2026-05-28 |
| Humanity's Last Exam | Intelligence | 129 | 11.7% | 2026-05-11 |
| Humanity's Last Exam | Intelligence | 340 | 4.7% | 2026-05-11 |
| MMLU Pro | Intelligence | 54 | 81.246% | 2026-05-28 |
| MMLU-Pro | Intelligence | 65 | 82.8% | 2026-05-11 |
| MMLU-Pro | Intelligence | 164 | 76.2% | 2026-05-11 |
| LAMBADA | Language | 5 | 71.10 | 2026-05-06 |
| PIQA | Language | 15 | 79.90 | 2026-05-06 |
| LegalBench | Legal | 58 | 80.179% | 2026-05-28 |
| LEXam | Legal | 18 | 47.25% open / 48.19% MCQ | 2026-05-28 |
| ConStory-Bench | Long Context | 23 | CED 1.447 | 2026-05-28 |
| Fiction.LiveBench | Long Context | 5 | 68.80 | 2026-05-06 |
| Fiction.LiveBench | Long Context | 15 | 44.40 | 2026-05-06 |
| AIME | Math | 40 | 83.958% | 2026-04-16 |
| AIME 2025 | Math | 61 | 82% | 2026-05-11 |
| AIME 2025 | Math | 202 | 23.7% | 2026-05-11 |
| IneqMath | Math | 26 | 6 | 2026-05-06 |
| MATH 500 | Math | 8 | 94.6% | 2026-01-09 |
| MGSM | Math | 17 | 92.473% | 2026-01-09 |
| FrontierMath 2025-02-28 Private | Mathematics | 12 | 8.48 | 2026-05-06 |
| FrontierMath Tier 4 2025-07-01 Private | Mathematics | 13 | 0 | 2026-05-06 |
| OTIS Mock AIME 2024-2025 | Mathematics | 5 | 86.67 | 2026-05-06 |
| BRIDGE Medical Leaderboard | Medical | 24 | 48.71 | 2026-05-27 |
| BRIDGE Medical Leaderboard | Medical | 116 | 39.21 | 2026-05-27 |
| BRIDGE Medical Leaderboard | Medical | 135 | 38 | 2026-05-27 |
| LiveMedBench | Medical | 35 | 0.0505 | 2026-05-27 |
| MEDIC Benchmark | Medical | 33 | 66.02 average normalized public table score | 2026-05-27 |
| Medical Chronology LLM Benchmark | Medical | 11 | 0.88 | 2026-05-06 |
| LanguageBench | Multilingual | 30 | 0.13 | 2026-05-06 |
| Design Arena | Multimodal | 105 | 1060 | 2026-05-06 |
| BBH | Reasoning | 14 | 55 | 2026-05-06 |
| GPQA Diamond | Reasoning | 203 | 70% | 2026-05-11 |
| GPQA Diamond | Reasoning | 270 | 61.3% | 2026-05-11 |
| Humanity's Last Exam (Text Only) | Reasoning | 22 | 11.75 | 2026-05-06 |
| MultiNRC | Reasoning | 31 | 17.63 | 2026-05-06 |
| SimpleBench | Reasoning | 13 | 31 | 2026-05-06 |
| LiveSecBench | Safety | 9 | 69.23 | 2026-05-27 |
| CritPt | Science | 345 | 0% | 2026-05-11 |
| CritPt | Science | 346 | 0% | 2026-05-11 |
| SciPredict | Science | 10 | 16.63 | 2026-05-06 |
| SWE-bench Pro | Software Engineering | 7 | 21.41 | 2026-05-06 |
| Structured Output Benchmark | Structured Output | 9 | 85.70 | 2026-05-06 |
| LiveSQLBench | Text to SQL | 17 | 26.90 | 2026-05-06 |
| Lech Mazur Writing | Writing | 8 | 8.49 | 2026-05-06 |
| Lech Mazur Writing | Writing | 9 | 8.30 | 2026-05-06 |