Llama 4 Maverick
Llama / Meta
58 scores
57 benchmarks
$0.15 / $0.6 per 1M tokens (cost in/out)
Metadata
Llama, Open source
Aliases: llama-4-maverick, llama-4-maverick-17b-128e-instruct, meta-llama-llama-4-maverick, meta-llama-llama-4-maverick-17b-128e-instruct, meta-llama/llama-4-maverick, meta-llama/llama-4-maverick-17b-128e-instruct
| Benchmark | Category | Rank | Score | Sampled |
|---|---|---|---|---|
| ARC-AGI-1 | Agentic | 138 | 4.38 | 2026-05-05 |
| ARC-AGI-2 | Agentic | 133 | 0 | 2026-05-05 |
| AutoBench | Agentic | 32 | 2.27 | 2026-05-06 |
| PinchBench | Agentic | 62 | 0.46 | 2026-05-06 |
| Tau2-Bench Telecom | Agentic | 335 | 17.8% | 2026-05-11 |
| Terminal-Bench Hard | Agentic | 239 | 6.8% | 2026-05-11 |
| OpenUGI | Alignment | 345 | 41.06 | 2026-05-06 |
| TextClass Benchmark | Classification | 77 | 1473.88 | 2026-05-06 |
| ALE-Bench | Coding | 88 | 172.97 | 2026-05-06 |
| BigCodeBench | Coding | 3 | 49.70 | 2026-05-06 |
| SciCode | Coding | 221 | 33.1% | 2026-05-11 |
| RP-Bench | Creative | 16 | 1453.40 | 2026-05-06 |
| RP-Bench | Creative | 30 | 3.92 | 2026-05-06 |
| OrgForge-IT | Cybersecurity | 9 | 0.800 | 2026-05-28 |
| VAREX-Bench | Document Understanding | 4 | 95.6% EM | 2026-05-28 |
| GSMA Open Telco Leaderboard | Domain | 34 | 59.60 | 2026-05-06 |
| TutorBench | Education | 25 | 40.20 | 2026-05-06 |
| FinanceArena | Finance | 5 | 44.6 | 2026-05-27 |
| MageBench Season 1 | Game | 19 | 1590 rating / 11 games | 2026-05-28 |
| ALL Bench LLM | General Knowledge | 19 | 34.56 | 2026-05-06 |
| BenchLM | General Knowledge | 111 | 17 | 2026-05-06 |
| WeirdML | Generalization | 24 | 24.47 | 2026-05-06 |
| HealthBench Hard | Healthcare | 32 | 0.32 | 2026-05-27 |
| HUMAINE | Human Preference | 43 | 3.27 | 2026-05-06 |
| AIIQ Composite IQ | Intelligence | 37 | 87 | 2026-05-12 |
| Artificial Analysis Intelligence Index | Intelligence | 269 | 18.36 | 2026-05-11 |
| Humanity's Last Exam | Intelligence | 326 | 4.8% | 2026-05-11 |
| MathVista | Intelligence | 9 | 73.70 | 2026-05-06 |
| MMLU-Pro | Intelligence | 96 | 80.9% | 2026-05-11 |
| LEXam | Legal | 17 | 47.25% open / 49.10% MCQ | 2026-05-28 |
| Fiction.LiveBench | Long Context | 20 | 36.40 | 2026-05-06 |
| AIME 2025 | Math | 211 | 19.3% | 2026-05-11 |
| IneqMath | Math | 42 | 2.50 | 2026-05-06 |
| FrontierMath 2025-02-28 Private | Mathematics | 20 | 0.69 | 2026-05-06 |
| OTIS Mock AIME 2024-2025 | Mathematics | 24 | 20.56 | 2026-05-06 |
| MEDIC Benchmark | Medical | 49 | 62.27 average normalized public table score | 2026-05-27 |
| LanguageBench | Multilingual | 10 | 0.63 | 2026-05-06 |
| ALL Bench Multimodal | Multimodal | 14 | 35.19 | 2026-05-06 |
| ChartQA | Multimodal | 2 | 0.90 | 2026-05-06 |
| Design Arena | Multimodal | 117 | 938 | 2026-05-06 |
| Visual-Language Understanding | Multimodal | 43 | 38.33 | 2026-05-06 |
| VTB | Multimodal | 19 | 1.41 | 2026-05-06 |
| Artificial Analysis Openness Index | Openness | 180 | 27.78 | 2026-05-11 |
| EnigmaEval | Reasoning | 39 | 0.58 | 2026-05-06 |
| GPQA Diamond | Reasoning | 230 | 67.1% | 2026-05-11 |
| Humanity's Last Exam (Text Only) | Reasoning | 44 | 5.34 | 2026-05-06 |
| MultiNRC | Reasoning | 40 | 8.44 | 2026-05-06 |
| SimpleBench | Reasoning | 16 | 27.70 | 2026-05-06 |
| LiveSecBench | Safety | 37 | 28.18 | 2026-05-27 |
| ChemBench | Science | 5 | 0.65 | 2026-05-06 |
| CritPt | Science | 286 | 0% | 2026-05-11 |
| MaCBench | Science | 1 | 0.70 | 2026-05-06 |
| Defects4J | Software Engineering | 16 | 0.337 | 2026-05-27 |
| IDE-Bench | Software Engineering | 13 | 2.5 | 2026-05-27 |
| RepairBench | Software Engineering | 16 | 0.308 | 2026-05-27 |
| SWE-bench Pro | Software Engineering | 13 | 5.24 | 2026-05-06 |
| LiveSQLBench | Text to SQL | 28 | 18.05 | 2026-05-06 |
| Lech Mazur Writing | Writing | 26 | 6.37 | 2026-05-06 |