Llama 4 Maverick

Llama / Meta

58 scores
57 benchmarks
Cost (input / output): $0.15 / $0.60 per 1M tokens
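
As a rough illustration of what the listed pricing means in practice, the sketch below estimates the cost of a single request. It assumes the two figures are USD per one million input and output tokens respectively and that billing is linear in token count. The function name `estimate_cost_usd` and the token counts in the usage line are hypothetical, chosen only for this example.

```python
# Rates copied from the listing above (assumed to be USD per 1M tokens).
INPUT_USD_PER_MILLION = 0.15
OUTPUT_USD_PER_MILLION = 0.60


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate request cost in USD, assuming simple linear per-token billing."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    input_cost = input_tokens / 1_000_000 * INPUT_USD_PER_MILLION
    output_cost = output_tokens / 1_000_000 * OUTPUT_USD_PER_MILLION
    return input_cost + output_cost


if __name__ == "__main__":
    # Hypothetical request: 20,000 prompt tokens and 2,000 completion tokens.
    print(f"${estimate_cost_usd(20_000, 2_000):.4f}")
```

Actual provider billing may differ (for example through caching discounts or minimum charges), so treat the result as an estimate only.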

Metadata

Llama Open source

Aliases: llama-4-maverick, llama-4-maverick-17b-128e-instruct, meta-llama-llama-4-maverick, meta-llama-llama-4-maverick-17b-128e-instruct, meta-llama/llama-4-maverick, meta-llama/llama-4-maverick-17b-128e-instruct

Benchmark Results

Benchmark | Category | Rank | Score | Sampled
ARC-AGI-1 | Agentic | 138 | 4.38 | 2026-05-05
ARC-AGI-2 | Agentic | 133 | 0 | 2026-05-05
AutoBench | Agentic | 32 | 2.27 | 2026-05-06
PinchBench | Agentic | 62 | 0.46 | 2026-05-06
Tau2-Bench Telecom | Agentic | 335 | 17.8% | 2026-05-11
Terminal-Bench Hard | Agentic | 239 | 6.8% | 2026-05-11
OpenUGI | Alignment | 345 | 41.06 | 2026-05-06
TextClass Benchmark | Classification | 77 | 1473.88 | 2026-05-06
ALE-Bench | Coding | 88 | 172.97 | 2026-05-06
BigCodeBench | Coding | 3 | 49.70 | 2026-05-06
SciCode | Coding | 221 | 33.1% | 2026-05-11
RP-Bench | Creative | 16 | 1453.40 | 2026-05-06
RP-Bench | Creative | 30 | 3.92 | 2026-05-06
OrgForge-IT | Cybersecurity | 9 | 0.800 | 2026-05-28
VAREX-Bench | Document Understanding | 4 | 95.6% EM | 2026-05-28
GSMA Open Telco Leaderboard | Domain | 34 | 59.60 | 2026-05-06
TutorBench | Education | 25 | 40.20 | 2026-05-06
FinanceArena | Finance | 5 | 44.6 | 2026-05-27
MageBench Season 1 | Game | 19 | 1590 rating / 11 games | 2026-05-28
ALL Bench LLM | General Knowledge | 19 | 34.56 | 2026-05-06
BenchLM | General Knowledge | 111 | 17 | 2026-05-06
WeirdML | Generalization | 24 | 24.47 | 2026-05-06
HealthBench Hard | Healthcare | 32 | 0.32 | 2026-05-27
HUMAINE | Human Preference | 43 | 3.27 | 2026-05-06
AIIQ Composite IQ | Intelligence | 37 | 87 | 2026-05-12
Artificial Analysis Intelligence Index | Intelligence | 269 | 18.36 | 2026-05-11
Humanity's Last Exam | Intelligence | 326 | 4.8% | 2026-05-11
MathVista | Intelligence | 9 | 73.70 | 2026-05-06
MMLU-Pro | Intelligence | 96 | 80.9% | 2026-05-11
LEXam | Legal | 17 | 47.25% open / 49.10% MCQ | 2026-05-28
Fiction.LiveBench | Long Context | 20 | 36.40 | 2026-05-06
AIME 2025 | Math | 211 | 19.3% | 2026-05-11
IneqMath | Math | 42 | 2.50 | 2026-05-06
FrontierMath 2025-02-28 Private | Mathematics | 20 | 0.69 | 2026-05-06
OTIS Mock AIME 2024-2025 | Mathematics | 24 | 20.56 | 2026-05-06
MEDIC Benchmark | Medical | 49 | 62.27 average normalized public table score | 2026-05-27
LanguageBench | Multilingual | 10 | 0.63 | 2026-05-06
ALL Bench Multimodal | Multimodal | 14 | 35.19 | 2026-05-06
ChartQA | Multimodal | 2 | 0.90 | 2026-05-06
Design Arena | Multimodal | 117 | 938 | 2026-05-06
Visual-Language Understanding | Multimodal | 43 | 38.33 | 2026-05-06
VTB | Multimodal | 19 | 1.41 | 2026-05-06
Artificial Analysis Openness Index | Openness | 180 | 27.78 | 2026-05-11
EnigmaEval | Reasoning | 39 | 0.58 | 2026-05-06
GPQA Diamond | Reasoning | 230 | 67.1% | 2026-05-11
Humanity's Last Exam (Text Only) | Reasoning | 44 | 5.34 | 2026-05-06
MultiNRC | Reasoning | 40 | 8.44 | 2026-05-06
SimpleBench | Reasoning | 16 | 27.70 | 2026-05-06
LiveSecBench | Safety | 37 | 28.18 | 2026-05-27
ChemBench | Science | 5 | 0.65 | 2026-05-06
CritPt | Science | 286 | 0% | 2026-05-11
MaCBench | Science | 1 | 0.70 | 2026-05-06
Defects4J | Software Engineering | 16 | 0.337 | 2026-05-27
IDE-Bench | Software Engineering | 13 | 2.5 | 2026-05-27
RepairBench | Software Engineering | 16 | 0.308 | 2026-05-27
SWE-bench Pro | Software Engineering | 13 | 5.24 | 2026-05-06
LiveSQLBench | Text to SQL | 28 | 18.05 | 2026-05-06
Lech Mazur Writing | Writing | 26 | 6.37 | 2026-05-06