Llama 3 8B Instruct | BenchmarkList

Metadata

Llama Open source

Aliases: llama-3-8b-instruct, meta-llama-llama-3-8b-instruct, meta-llama/llama-3-8b-instruct

Benchmark	Category	Rank	Score	Sampled
RewardBench	Alignment	145	64.50	2026-05-06
BigCodeBench	Coding	102	31.90	2026-05-06
ENAMEL	Coding	12	0.34	2026-05-06
McEval	Coding	13	36%	2026-05-27
NeoEvalPlusN	Creative	90	13.75	2026-05-06
Open FinLLM Leaderboard	Finance	15	18.93%	2026-05-27
MixEval Chat	General Knowledge	27	45.60	2026-05-06
Open LLM Leaderboard v2	General Knowledge	1868	23.91	2026-05-06
Open LLM Leaderboard v2	General Knowledge	2570	20.61	2026-05-06
CyberSecEval	Generalization	7	45.27%	2026-05-27
L-Eval	Generalization	3	58.71%	2026-05-27
WildBench	Generalization	46	6.66	2026-05-27
MuSR	Intelligence	3367	5.40	2026-05-06
MuSR	Intelligence	4271	1.60	2026-05-06
ANLI	Language	3	57.30	2026-05-06
AraGen v3	Language	62	9.21	2026-05-06
Open Japanese LLM Leaderboard	Language	467	49.62	2026-05-06
Open Japanese LLM Leaderboard	Language	763	21.64	2026-05-06
Open Portuguese LLM Leaderboard	Language	233	82.06	2026-05-06
WinoGrande	Language	6	83.50	2026-05-06
BABILong	Long Context	22	30.67	2026-05-06
MATH Level 5	Math	2511	9.14	2026-05-06
MATH Level 5	Math	2579	8.69	2026-05-06
OTIS Mock AIME 2024-2025	Mathematics	37	4.31	2026-05-06
Open Medical-LLM Leaderboard	Medical	66	68.99	2026-05-06
BenchBench	Meta	75	0.44	2026-05-06
YALL Nous Leaderboard	Reasoning	93	51.24	2026-05-06
ZebraLogic	Reasoning	52	11.90	2026-05-06
AI-Secure LLM Trustworthy Leaderboard	Safety	3	0.81	2026-05-06
ChemBench	Science	37	0.46	2026-05-06
ChemBench	Science	38	0.46	2026-05-06
StructEval	Structured Output	11	51.59%	2026-05-28
VNTL Leaderboard	Translation	72	60.19	2026-05-06
VNTL Leaderboard	Translation	82	55.03	2026-05-06