Qwen3 8B
54 scores
41 benchmarks
Cost (in/out): $0.05 / $0.4 per 1M tokens
Metadata
Qwen, open source
Aliases: qwen-qwen3-8b, qwen-qwen3-8b-04-28, qwen/qwen3-8b, qwen/qwen3-8b-04-28, qwen3-8b, qwen3-8b-04-28
| Benchmark | Category | Rank | Score | Sampled |
|---|---|---|---|---|
| ADBench | Agentic | 10 | 58 | 2026-05-06 |
| AMA-Bench | Agentic | 13 | 0.41 | 2026-05-06 |
| Berkeley Function-Calling Leaderboard | Agentic | 39 | 42.57% | 2026-05-27 |
| Berkeley Function-Calling Leaderboard | Agentic | 44 | 40.43% | 2026-05-27 |
| Tau2-Bench Telecom | Agentic | 260 | 27.8% | 2026-05-11 |
| Tau2-Bench Telecom | Agentic | 284 | 24.9% | 2026-05-11 |
| Terminal-Bench Hard | Agentic | 319 | 2.3% | 2026-05-11 |
| Terminal-Bench Hard | Agentic | 320 | 2.3% | 2026-05-11 |
| OpenUGI | Alignment | 705 | 32.43 | 2026-05-06 |
| OpenUGI | Alignment | 719 | 32.18 | 2026-05-06 |
| Stick To Your Role! | Alignment | 13 | 0.72 | 2026-05-06 |
| BTZSC | Classification | 3 | 66.49 | 2026-05-06 |
| ABC-Bench | Coding | 11 | 8.3% +/- 1.1 | 2026-05-27 |
| SciCode | Coding | 351 | 22.6% | 2026-05-11 |
| SciCode | Coding | 392 | 16.8% | 2026-05-11 |
| TuRTLe Code Completion (Icarus Verilog) | Coding | 21 | 48.16 | 2026-05-06 |
| TuRTLe Code Completion (Verilator) | Coding | 22 | 48.82 | 2026-05-06 |
| TuRTLe Spec-to-RTL (Icarus Verilog) | Coding | 23 | 45.10 | 2026-05-06 |
| TuRTLe Spec-to-RTL (Verilator) | Coding | 23 | 46.23 | 2026-05-06 |
| RedSage-Bench | Cybersecurity | 9 | 81.85% | 2026-05-28 |
| MMTU | Data | 17 | 0.48 | 2026-05-06 |
| MMTU | Data | 24 | 0.35 | 2026-05-06 |
| GSMA Open Telco Leaderboard | Domain | 68 | 41.07 | 2026-05-06 |
| EduGuardBench | Education | 13 | 0.61 | 2026-05-27 |
| Vectara HHEM Hallucination Leaderboard | Factuality | 9 | 95.20 | 2026-05-06 |
| Fin-RATE | Finance | 15 | 5.48% | 2026-05-28 |
| FinToolBench | Finance | 1 | 0.4234 | 2026-05-27 |
| GeoCode Leaderboard | Geospatial | 21 | 48.35% pass@1 | 2026-05-28 |
| HealthBench Hard | Healthcare | 5 | 0.56 | 2026-05-27 |
| Artificial Analysis Intelligence Index | Intelligence | 363 | 13.18 | 2026-05-11 |
| Artificial Analysis Intelligence Index | Intelligence | 411 | 10.63 | 2026-05-11 |
| FACTS Grounding | Intelligence | 10 | 0.40 | 2026-05-06 |
| Humanity's Last Exam | Intelligence | 401 | 4.2% | 2026-05-11 |
| Humanity's Last Exam | Intelligence | 477 | 2.8% | 2026-05-11 |
| MMLU-Pro | Intelligence | 188 | 74.3% | 2026-05-11 |
| MMLU-Pro | Intelligence | 255 | 64.3% | 2026-05-11 |
| AraGen v3 | Language | 45 | 26.55 | 2026-05-06 |
| La Leaderboard | Language | 52 | 13.53 | 2026-05-06 |
| Open Arabic LLM Leaderboard | Language | 128 | 42.41 | 2026-05-06 |
| Open Portuguese LLM Leaderboard | Language | 83 | 84.48 | 2026-05-06 |
| Ukrainian LLM Leaderboard | Language | 13 | 12.18 | 2026-05-06 |
| J1-ENVS | Legal | 14 | 42.48 | 2026-05-26 |
| AIME 2025 | Math | 199 | 24.3% | 2026-05-11 |
| AIME 2025 | Math | 212 | 19% | 2026-05-11 |
| MEDIC Benchmark | Medical | 74 | 55.41 average normalized public table score | 2026-05-27 |
| Medmarks | Medical | 12 | 0.3634064599826541 | 2026-05-27 |
| Medmarks | Medical | 38 | 0.5019479387680824 | 2026-05-27 |
| FLORES European Languages Leaderboard | Multilingual | 9 | 43.25 | 2026-05-06 |
| INCLUDE-base-44 European Languages | Multilingual | 8 | 0.61 | 2026-05-06 |
| GPQA Diamond | Reasoning | 288 | 58.9% | 2026-05-11 |
| GPQA Diamond | Reasoning | 370 | 45.2% | 2026-05-11 |
| CritPt | Science | 354 | 0% | 2026-05-11 |
| CritPt | Science | 355 | 0% | 2026-05-11 |
| K-MetBench | Weather | 21 | 70.1% accuracy | 2026-05-28 |