Qwen2.5 72B Instruct
Qwen / Qwen
62 scores
58 benchmarks
$0.36 / $0.4 per 1M tokens (cost in/out)
Metadata
Qwen · Open source
Aliases: qwen-2.5-72b-instruct, qwen-qwen-2.5-72b-instruct, qwen/qwen-2.5-72b-instruct
| Benchmark | Category | Rank | Score | Sampled |
|---|---|---|---|---|
| Clembench Text v3.0 | Agentic | 19 | 48.07 | 2026-05-06 |
| Galileo Agent Leaderboard | Agentic | 7 | 0.51 | 2026-05-06 |
| Tau2-Bench Telecom | Agentic | 218 | 34.5% | 2026-05-11 |
| Terminal-Bench Hard | Agentic | 279 | 4.5% | 2026-05-11 |
| AgentBench FC | Agents | 18 | 40.80 | 2026-05-06 |
| OpenUGI | Alignment | 916 | 27.43 | 2026-05-06 |
| BigCodeBench | Coding | 24 | 45.80 | 2026-05-06 |
| MultiPL-E | Coding | 7 | 0.751 | 2026-05-27 |
| SciCode | Coding | 302 | 26.7% | 2026-05-11 |
| TuRTLe Code Completion (Icarus Verilog) | Coding | 19 | 50.41 | 2026-05-06 |
| TuRTLe Code Completion (Verilator) | Coding | 17 | 52.29 | 2026-05-06 |
| TuRTLe Line Completion | Coding | 4 | 37.44 | 2026-05-06 |
| TuRTLe Module Completion (NotSoTiny) | Coding | 8 | 14.70 | 2026-05-06 |
| TuRTLe Spec-to-RTL (Icarus Verilog) | Coding | 22 | 49.36 | 2026-05-06 |
| TuRTLe Spec-to-RTL (Verilator) | Coding | 21 | 51.72 | 2026-05-06 |
| NeoEvalPlusN | Creative | 120 | 11.50 | 2026-05-06 |
| GSMA Open Telco Leaderboard | Domain | 39 | 53.97 | 2026-05-06 |
| EduGuardBench | Education | 14 | 0.56 | 2026-05-27 |
| AI Energy Score | Efficiency | 110 | 5 | 2026-05-06 |
| BizFinBench | Finance | 9 | 67.7 | 2026-05-27 |
| FinEval | Finance | 19 | 69.4 | 2026-05-27 |
| INVESTORBENCH | Finance | 1 | 46.153% | 2026-05-27 |
| Open FinLLM Leaderboard | Finance | 5 | 41.361242% | 2026-05-27 |
| AlignBench | General Knowledge | 1 | 0.82 | 2026-05-06 |
| BenchLM | General Knowledge | 64 | 50 | 2026-05-06 |
| MMLU-Redux | General Knowledge | 29 | 0.87 | 2026-05-06 |
| Open LLM Leaderboard v2 | General Knowledge | 6 | 47.98 | 2026-05-06 |
| Arena-Hard | Generalization | 27 | 10.1% | 2026-05-27 |
| NeedleBench | Generalization | 1 | 81.02% | 2026-05-27 |
| HealthBench Hard | Healthcare | 11 | 0.49 | 2026-05-27 |
| HREF | Instruction Following | 1 | 46.21 | 2026-05-06 |
| Artificial Analysis Intelligence Index | Intelligence | 310 | 15.56 | 2026-05-11 |
| Humanity's Last Exam | Intelligence | 398 | 4.2% | 2026-05-11 |
| MMLU-Pro | Intelligence | 206 | 72% | 2026-05-11 |
| MuSR | Intelligence | 1688 | 11.74 | 2026-05-06 |
| SuperGPQA | Intelligence | 11 | 34.33 | 2026-05-06 |
| AraGen v3 | Language | 26 | 48.92 | 2026-05-06 |
| HellaSwag | Language | 7 | 84.80 | 2026-05-06 |
| Open Arabic LLM Leaderboard | Language | 15 | 72.39 | 2026-05-06 |
| Open Portuguese LLM Leaderboard | Language | 36 | 86.30 | 2026-05-06 |
| PIQA | Language | 12 | 82.60 | 2026-05-06 |
| WinoGrande | Language | 7 | 82.30 | 2026-05-06 |
| AIME 2025 | Math | 223 | 14% | 2026-05-11 |
| IneqMath | Math | 40 | 2.50 | 2026-05-06 |
| MATH Level 5 | Math | 13 | 59.82 | 2026-05-06 |
| BRIDGE Medical Leaderboard | Medical | 10 | 50.99 | 2026-05-27 |
| BRIDGE Medical Leaderboard | Medical | 80 | 41.62 | 2026-05-27 |
| BRIDGE Medical Leaderboard | Medical | 123 | 38.86 | 2026-05-27 |
| MEDIC Benchmark | Medical | 31 | 66.82 average normalized public table score | 2026-05-27 |
| BBH | Reasoning | 4 | 79.80 | 2026-05-06 |
| GPQA Diamond | Reasoning | 349 | 49.1% | 2026-05-11 |
| ZebraLogic | Reasoning | 25 | 26.60 | 2026-05-06 |
| Halluverse-M3 | Safety | 10 | 69.81% | 2026-05-28 |
| ThaiSafetyBench | Safety | 4 | 10.99% overall ASR | 2026-05-28 |
| X-Risks Leaderboard | Safety | 6 | 16.60 | 2026-05-06 |
| CritPt | Science | 338 | 0% | 2026-05-11 |
| Defects4J | Software Engineering | 25 | 0.255 | 2026-05-27 |
| RepairBench | Software Engineering | 24 | 0.242 | 2026-05-27 |
| JSONSchemaBench | Structured Output | 3 | 95.5% schema compliance | 2026-05-28 |
| JSONSchemaBench | Structured Output | 14 | 84% schema compliance | 2026-05-28 |
| JSONSchemaBench | Structured Output | 26 | 66.7% schema compliance | 2026-05-28 |
| VNTL Leaderboard | Translation | 12 | 70.79 | 2026-05-06 |