Llama 4 Scout | BenchmarkList

Metadata

Tags: Llama, Open source

Aliases: llama-4-scout, llama-4-scout-17b-16e-instruct, meta-llama-llama-4-scout, meta-llama-llama-4-scout-17b-16e-instruct, meta-llama/llama-4-scout, meta-llama/llama-4-scout-17b-16e-instruct

Benchmark	Category	Rank	Score	Date sampled
ARC-AGI-1	Agentic	142	0.50	2026-05-05
ARC-AGI-2	Agentic	134	0	2026-05-05
Berkeley Function-Calling Leaderboard	Agentic	72	28.13%	2026-05-27
PinchBench	Agentic	68	0.08	2026-05-06
Tau2-Bench Telecom	Agentic	345	15.5%	2026-05-11
Terminal-Bench Hard	Agentic	330	1.5%	2026-05-11
UAVBench	Agentic	16	75.10	2026-05-06
OpenUGI	Alignment	779	31.02	2026-05-06
Stick To Your Role!	Alignment	18	0.62	2026-05-06
TextClass Benchmark	Classification	70	1500.45	2026-05-06
LiveCodeBench	Coding	102	38.541%	2026-05-28
SciCode	Coding	391	17%	2026-05-11
NeoEvalPlusN	Creative	131	10.25	2026-05-06
MMTU	Data	22	0.39	2026-05-06
VAREX-Bench	Document Understanding	7	94.3% EM	2026-05-28
SAGE	Education	39	34.834%	2026-05-28
kluster.ai LLM Hallucination Detection Leaderboard	Factuality	10	96.64	2026-05-06
Vectara HHEM Hallucination Leaderboard	Factuality	37	92.30	2026-05-06
BizFinBench	Finance	15	61.17	2026-05-27
CorpFin v2	Finance	88	46.776%	2026-05-28
MortgageTax	Finance	50	57.75%	2026-05-28
TaxEval v2	Finance	108	55.192%	2026-05-28
ALL Bench LLM	General Knowledge	26	26.02	2026-05-06
BenchLM	General Knowledge	106	22	2026-05-06
HealthBench Hard	Healthcare	33	0.32	2026-05-27
MedCode	Healthcare	59	23.311%	2026-05-28
MedQA	Healthcare	92	50.9%	2026-04-16
MedScribe	Healthcare	60	50.593%	2026-05-28
Artificial Analysis Intelligence Index	Intelligence	357	13.52	2026-05-11
GPQA Diamond	Intelligence	99	46.97%	2026-05-28
Humanity's Last Exam	Intelligence	378	4.3%	2026-05-11
MMLU Pro	Intelligence	94	69.632%	2026-05-28
MMLU-Pro	Intelligence	175	75.2%	2026-05-11
MMMU Pro	Intelligence	65	58.752%	2026-05-28
LegalBench	Legal	82	72.036%	2026-05-28
Fiction.LiveBench	Long Context	22	27.30	2026-05-06
AIME	Math	80	18.958%	2026-04-16
AIME 2025	Math	221	14%	2026-05-11
IneqMath	Math	48	1.50	2026-05-06
MATH 500	Math	40	79.2%	2026-01-09
MGSM	Math	56	87.964%	2026-01-09
FrontierMath 2025-02-28 Private	Mathematics	24	0	2026-05-06
OTIS Mock AIME 2024-2025	Mathematics	30	7.78	2026-05-06
BRIDGE Medical Leaderboard	Medical	91	40.64	2026-05-27
BRIDGE Medical Leaderboard	Medical	174	35.12	2026-05-27
BRIDGE Medical Leaderboard	Medical	234	29.38	2026-05-27
MEDIC Benchmark	Medical	40	63.89 (average normalized public table score)	2026-05-27
ALL Bench Multimodal	Multimodal	25	27.51	2026-05-06
ChartQA	Multimodal	5	0.89	2026-05-06
Design Arena	Multimodal	122	848	2026-05-06
VTB	Multimodal	19	1.58	2026-05-06
Artificial Analysis Openness Index	Openness	181	27.78	2026-05-11
GPQA Diamond	Reasoning	290	58.7%	2026-05-11
CritPt	Science	287	0%	2026-05-11
MaCBench	Science	3	0.63	2026-05-06
IDE-Bench	Software Engineering	13	2.5	2026-05-27
LiveSQLBench	Text to SQL	27	18.55	2026-05-06
