Qwen2.5 72B Instruct

Qwen / Qwen

62 scores
58 benchmarks
Cost (in/out): $0.36 / $0.4 per 1M tokens

Metadata

Qwen (open source)

Aliases: qwen-2.5-72b-instruct, qwen-qwen-2.5-72b-instruct, qwen/qwen-2.5-72b-instruct

Benchmark Results

| Benchmark | Category | Rank | Score | Sampled |
| --- | --- | --- | --- | --- |
| Clembench Text v3.0 | Agentic | 19 | 48.07 | 2026-05-06 |
| Galileo Agent Leaderboard | Agentic | 7 | 0.51 | 2026-05-06 |
| Tau2-Bench Telecom | Agentic | 218 | 34.5% | 2026-05-11 |
| Terminal-Bench Hard | Agentic | 279 | 4.5% | 2026-05-11 |
| AgentBench FC | Agents | 18 | 40.80 | 2026-05-06 |
| OpenUGI | Alignment | 916 | 27.43 | 2026-05-06 |
| BigCodeBench | Coding | 24 | 45.80 | 2026-05-06 |
| MultiPL-E | Coding | 7 | 0.751 | 2026-05-27 |
| SciCode | Coding | 302 | 26.7% | 2026-05-11 |
| TuRTLe Code Completion (Icarus Verilog) | Coding | 19 | 50.41 | 2026-05-06 |
| TuRTLe Code Completion (Verilator) | Coding | 17 | 52.29 | 2026-05-06 |
| TuRTLe Line Completion | Coding | 4 | 37.44 | 2026-05-06 |
| TuRTLe Module Completion (NotSoTiny) | Coding | 8 | 14.70 | 2026-05-06 |
| TuRTLe Spec-to-RTL (Icarus Verilog) | Coding | 22 | 49.36 | 2026-05-06 |
| TuRTLe Spec-to-RTL (Verilator) | Coding | 21 | 51.72 | 2026-05-06 |
| NeoEvalPlusN | Creative | 120 | 11.50 | 2026-05-06 |
| GSMA Open Telco Leaderboard | Domain | 39 | 53.97 | 2026-05-06 |
| EduGuardBench | Education | 14 | 0.56 | 2026-05-27 |
| AI Energy Score | Efficiency | 110 | 5 | 2026-05-06 |
| BizFinBench | Finance | 9 | 67.7 | 2026-05-27 |
| FinEval | Finance | 19 | 69.4 | 2026-05-27 |
| INVESTORBENCH | Finance | 1 | 46.153% | 2026-05-27 |
| Open FinLLM Leaderboard | Finance | 5 | 41.361242% | 2026-05-27 |
| AlignBench | General Knowledge | 1 | 0.82 | 2026-05-06 |
| BenchLM | General Knowledge | 64 | 50 | 2026-05-06 |
| MMLU-Redux | General Knowledge | 29 | 0.87 | 2026-05-06 |
| Open LLM Leaderboard v2 | General Knowledge | 6 | 47.98 | 2026-05-06 |
| Arena-Hard | Generalization | 27 | 10.1% | 2026-05-27 |
| NeedleBench | Generalization | 1 | 81.02% | 2026-05-27 |
| HealthBench Hard | Healthcare | 11 | 0.49 | 2026-05-27 |
| HREF | Instruction Following | 1 | 46.21 | 2026-05-06 |
| Artificial Analysis Intelligence Index | Intelligence | 310 | 15.56 | 2026-05-11 |
| Humanity's Last Exam | Intelligence | 398 | 4.2% | 2026-05-11 |
| MMLU-Pro | Intelligence | 206 | 72% | 2026-05-11 |
| MuSR | Intelligence | 1688 | 11.74 | 2026-05-06 |
| SuperGPQA | Intelligence | 11 | 34.33 | 2026-05-06 |
| AraGen v3 | Language | 26 | 48.92 | 2026-05-06 |
| HellaSwag | Language | 7 | 84.80 | 2026-05-06 |
| Open Arabic LLM Leaderboard | Language | 15 | 72.39 | 2026-05-06 |
| Open Portuguese LLM Leaderboard | Language | 36 | 86.30 | 2026-05-06 |
| PIQA | Language | 12 | 82.60 | 2026-05-06 |
| WinoGrande | Language | 7 | 82.30 | 2026-05-06 |
| AIME 2025 | Math | 223 | 14% | 2026-05-11 |
| IneqMath | Math | 40 | 2.50 | 2026-05-06 |
| MATH Level 5 | Math | 13 | 59.82 | 2026-05-06 |
| BRIDGE Medical Leaderboard | Medical | 10 | 50.99 | 2026-05-27 |
| BRIDGE Medical Leaderboard | Medical | 80 | 41.62 | 2026-05-27 |
| BRIDGE Medical Leaderboard | Medical | 123 | 38.86 | 2026-05-27 |
| MEDIC Benchmark | Medical | 31 | 66.82 (average normalized public table score) | 2026-05-27 |
| BBH | Reasoning | 4 | 79.80 | 2026-05-06 |
| GPQA Diamond | Reasoning | 349 | 49.1% | 2026-05-11 |
| ZebraLogic | Reasoning | 25 | 26.60 | 2026-05-06 |
| Halluverse-M3 | Safety | 10 | 69.81% | 2026-05-28 |
| ThaiSafetyBench | Safety | 4 | 10.99% (overall ASR) | 2026-05-28 |
| X-Risks Leaderboard | Safety | 6 | 16.60 | 2026-05-06 |
| CritPt | Science | 338 | 0% | 2026-05-11 |
| Defects4J | Software Engineering | 25 | 0.255 | 2026-05-27 |
| RepairBench | Software Engineering | 24 | 0.242 | 2026-05-27 |
| JSONSchemaBench | Structured Output | 3 | 95.5% (schema compliance) | 2026-05-28 |
| JSONSchemaBench | Structured Output | 14 | 84% (schema compliance) | 2026-05-28 |
| JSONSchemaBench | Structured Output | 26 | 66.7% (schema compliance) | 2026-05-28 |
| VNTL Leaderboard | Translation | 12 | 70.79 | 2026-05-06 |