Llama 4 Scout
Llama / Meta
57 scores
55 benchmarks
$0.08 / $0.3 per 1M tokens (cost in/out)
Metadata
Llama, Open source
Aliases: llama-4-scout, llama-4-scout-17b-16e-instruct, meta-llama-llama-4-scout, meta-llama-llama-4-scout-17b-16e-instruct, meta-llama/llama-4-scout, meta-llama/llama-4-scout-17b-16e-instruct
| Benchmark | Category | Rank | Score | Sampled |
|---|---|---|---|---|
| ARC-AGI-1 | Agentic | 142 | 0.50 | 2026-05-05 |
| ARC-AGI-2 | Agentic | 134 | 0 | 2026-05-05 |
| Berkeley Function-Calling Leaderboard | Agentic | 72 | 28.13% | 2026-05-27 |
| PinchBench | Agentic | 68 | 0.08 | 2026-05-06 |
| Tau2-Bench Telecom | Agentic | 345 | 15.5% | 2026-05-11 |
| Terminal-Bench Hard | Agentic | 330 | 1.5% | 2026-05-11 |
| UAVBench | Agentic | 16 | 75.10 | 2026-05-06 |
| OpenUGI | Alignment | 779 | 31.02 | 2026-05-06 |
| Stick To Your Role! | Alignment | 18 | 0.62 | 2026-05-06 |
| TextClass Benchmark | Classification | 70 | 1500.45 | 2026-05-06 |
| LiveCodeBench | Coding | 102 | 38.541% | 2026-05-28 |
| SciCode | Coding | 391 | 17% | 2026-05-11 |
| NeoEvalPlusN | Creative | 131 | 10.25 | 2026-05-06 |
| MMTU | Data | 22 | 0.39 | 2026-05-06 |
| VAREX-Bench | Document Understanding | 7 | 94.3% EM | 2026-05-28 |
| SAGE | Education | 39 | 34.834% | 2026-05-28 |
| kluster.ai LLM Hallucination Detection Leaderboard | Factuality | 10 | 96.64 | 2026-05-06 |
| Vectara HHEM Hallucination Leaderboard | Factuality | 37 | 92.30 | 2026-05-06 |
| BizFinBench | Finance | 15 | 61.17 | 2026-05-27 |
| CorpFin v2 | Finance | 88 | 46.776% | 2026-05-28 |
| MortgageTax | Finance | 50 | 57.75% | 2026-05-28 |
| TaxEval v2 | Finance | 108 | 55.192% | 2026-05-28 |
| ALL Bench LLM | General Knowledge | 26 | 26.02 | 2026-05-06 |
| BenchLM | General Knowledge | 106 | 22 | 2026-05-06 |
| HealthBench Hard | Healthcare | 33 | 0.32 | 2026-05-27 |
| MedCode | Healthcare | 59 | 23.311% | 2026-05-28 |
| MedQA | Healthcare | 92 | 50.9% | 2026-04-16 |
| MedScribe | Healthcare | 60 | 50.593% | 2026-05-28 |
| Artificial Analysis Intelligence Index | Intelligence | 357 | 13.52 | 2026-05-11 |
| GPQA Diamond | Intelligence | 99 | 46.97% | 2026-05-28 |
| Humanity's Last Exam | Intelligence | 378 | 4.3% | 2026-05-11 |
| MMLU Pro | Intelligence | 94 | 69.632% | 2026-05-28 |
| MMLU-Pro | Intelligence | 175 | 75.2% | 2026-05-11 |
| MMMU Pro | Intelligence | 65 | 58.752% | 2026-05-28 |
| LegalBench | Legal | 82 | 72.036% | 2026-05-28 |
| Fiction.LiveBench | Long Context | 22 | 27.30 | 2026-05-06 |
| AIME | Math | 80 | 18.958% | 2026-04-16 |
| AIME 2025 | Math | 221 | 14% | 2026-05-11 |
| IneqMath | Math | 48 | 1.50 | 2026-05-06 |
| MATH 500 | Math | 40 | 79.2% | 2026-01-09 |
| MGSM | Math | 56 | 87.964% | 2026-01-09 |
| FrontierMath 2025-02-28 Private | Mathematics | 24 | 0 | 2026-05-06 |
| OTIS Mock AIME 2024-2025 | Mathematics | 30 | 7.78 | 2026-05-06 |
| BRIDGE Medical Leaderboard | Medical | 91 | 40.64 | 2026-05-27 |
| BRIDGE Medical Leaderboard | Medical | 174 | 35.12 | 2026-05-27 |
| BRIDGE Medical Leaderboard | Medical | 234 | 29.38 | 2026-05-27 |
| MEDIC Benchmark | Medical | 40 | 63.89 average normalized public table score | 2026-05-27 |
| ALL Bench Multimodal | Multimodal | 25 | 27.51 | 2026-05-06 |
| ChartQA | Multimodal | 5 | 0.89 | 2026-05-06 |
| Design Arena | Multimodal | 122 | 848 | 2026-05-06 |
| VTB | Multimodal | 19 | 1.58 | 2026-05-06 |
| Artificial Analysis Openness Index | Openness | 181 | 27.78 | 2026-05-11 |
| GPQA Diamond | Reasoning | 290 | 58.7% | 2026-05-11 |
| CritPt | Science | 287 | 0% | 2026-05-11 |
| MaCBench | Science | 3 | 0.63 | 2026-05-06 |
| IDE-Bench | Software Engineering | 13 | 2.5 | 2026-05-27 |
| LiveSQLBench | Text to SQL | 27 | 18.55 | 2026-05-06 |