Claude Sonnet 4.6 | BenchmarkList

Metadata

Claude Closed/API

Aliases: anthropic-claude-4.6-sonnet-20260217, anthropic-claude-sonnet-4.6, anthropic/claude-4.6-sonnet-20260217, anthropic/claude-sonnet-4.6, claude-4.6-sonnet-20260217, claude-sonnet-4.6

Benchmark	Category	Rank	Score	Sampled
ALFWorld	Agentic	3	1.0	2026-05-27
APEX-Agents-AA	Agentic	6	28%	2026-05-11
ARC-AGI-1	Agentic	25	86.50	2026-05-05
ARC-AGI-1	Agentic	29	86	2026-05-05
ARC-AGI-2	Agentic	23	60.42	2026-05-05
ARC-AGI-2	Agentic	24	58.33	2026-05-05
AutoBench	Agentic	4	3.16	2026-05-06
Claw-Eval-Live	Agentic	3	61.9	2026-05-27
EnterpriseOps-Gym	Agentic	2	40.4%	2026-05-05
GDPval-AA	Agentic	1	1633	2026-05-06
Gert Labs Rankings	Agentic	9	0.61	2026-05-11
ITBench-AA	Agentic	6	39.8%	2026-05-28
MCP Atlas	Agentic	7	69.50	2026-05-06
OSWorld	Agentic	10	72.11%	2026-05-27
PinchBench	Agentic	13	0.88	2026-05-06
RealDataAgentBench	Agentic	3	0.86	2026-04-28
RuneBench	Agentic	12	3.20	2026-05-05
Tau2-Bench Telecom	Agentic	108	79.5%	2026-05-11
Tau2-Bench Telecom	Agentic	111	78.9%	2026-05-11
Tau2-Bench Telecom	Agentic	117	75.7%	2026-05-11
Terminal-Bench Hard	Agentic	7	53%	2026-05-11
Terminal-Bench Hard	Agentic	18	46.2%	2026-05-11
Terminal-Bench Hard	Agentic	30	42.4%	2026-05-11
Toolathlon	Agentic	4	41%	2026-05-28
Vending-Bench 2	Agentic	4	7204.14	2026-05-28
OpenUGI	Alignment	54	53.52	2026-05-06
OpenUGI	Alignment	61	52.82	2026-05-06
OpenUGI	Alignment	74	51.87	2026-05-06
OpenUGI	Alignment	252	44.01	2026-05-06
BioPipelineBench Verified	Biology	4	73.5%	2026-05-28
ProteinGym Hard	Biology	4	35.4%	2026-05-28
Protocol Troubleshooting (Anthropic Internal)	Biology	4	42.4%	2026-05-28
scBench	Biology	4	50.4%	2026-05-28
scBench	Biology	9	50.26%	2026-05-27
SpatialBench	Biology	4	48.7%	2026-05-28
SpatialBench	Biology	10	44.23%	2026-05-27
Structural Biology Open-Ended	Biology	4	31.3%	2026-05-28
Organic Chemistry (Anthropic Internal)	Chemistry	4	53.1%	2026-05-28
Arena AI Code	Coding	6	1526	2026-05-06
DeepSWE	Coding	4	31.56	2026-05-26
LiveCodeBench	Coding	35	82.091%	2026-05-28
LMArena WebDev Arena	Coding	6	1526.17	2026-05-06
SciCode	Coding	30	46.9%	2026-05-11
SciCode	Coding	33	46.8%	2026-05-11
SciCode	Coding	48	44.1%	2026-05-11
SWE-bench Verified	Coding	9	77.4%	2026-05-28
Terminal-Bench 2.0	Coding	7	59.551%	2026-05-28
Vibe Code Bench v1.1	Coding	9	51.476%	2026-05-28
CyberGym	Cybersecurity	4	65.2%	2026-05-28
ExploitBench v8-bench	Cybersecurity	7	3.37 points	2026-05-28
ExploitBench v8-bench	Cybersecurity	8	3.17 points	2026-05-28
ExploitBench v8-bench	Cybersecurity	10	3.37 points	2026-05-15
ExploitBench v8-bench	Cybersecurity	11	3.17 points	2026-05-15
Firefox 147 JS Exploitation	Cybersecurity	4	0%	2026-05-28
OrgForge-IT	Cybersecurity	4	0.800	2026-05-28
Arena AI Document	Document AI	5	1500	2026-05-06
GSMA Open Telco Leaderboard	Domain	58	44.78	2026-05-06
SAGE	Education	16	46.582%	2026-05-28
AA-Omniscience	Factuality	5	12.37	2026-05-11
Vectara HHEM Hallucination Leaderboard	Factuality	61	89.40	2026-05-06
CorpFin v2	Finance	16	65.307%	2026-05-28
Finance Agent v1.1	Finance	2	63.331%	2026-05-04
Finance Agent v2	Finance	5	51.035%	2026-05-28
MortgageTax	Finance	16	67.726%	2026-05-28
Rogo Big Finance Bench	Finance	3	59% rubric / 38% final	2026-05-28
TaxBench	Finance	12	11.20% mean pass^5	2026-05-27
TaxEval v2	Finance	2	77.106%	2026-05-28
React Native Evals	Frontend Development	8	80.6227% overall	2026-05-28
InfiniteBM Chess	Game	3	1190.33 Elo / 11 games	2026-05-28
InfiniteBM Coup	Game	2	1549.3 Elo / 34 games	2026-05-28
InfiniteBM Coup	Game	8	519.02 Elo / 6 games	2026-05-28
InfiniteBM Heads-Up No-Limit Hold'em	Game	3	1485.1 Elo / 20 games	2026-05-28
InfiniteBM Heads-Up No-Limit Hold'em	Game	13	1251.34 Elo / 209 games	2026-05-28
InfiniteBM Liar's Dice	Game	14	1267.56 Elo / 6613 games	2026-05-28
InfiniteBM Liar's Dice	Game	23	1170.63 Elo / 41 games	2026-05-28
InfiniteBM Settlers of Catan	Game	2	1805.89 Elo / 24 games	2026-05-28
InfiniteBM Werewolf	Game	6	1137.69 Elo / 22 games	2026-05-28
InfiniteBM Werewolf	Game	11	889.31 Elo / 19 games	2026-05-28
ALL Bench LLM	General Knowledge	20	32.28	2026-05-06
BenchLM	General Knowledge	15	83	2026-05-06
HealthBench Professional	Healthcare	3	41.7%	2026-05-28
MedQA	Healthcare	37	92.058%	2026-04-16
PhysicianBench	Healthcare	5	23.0 +/- 2.6	2026-05-27
Artificial Analysis Intelligence Index	Intelligence	15	51.72	2026-05-11
Artificial Analysis Intelligence Index	Intelligence	46	44.38	2026-05-11
Artificial Analysis Intelligence Index	Intelligence	57	42.6	2026-05-11
GPQA Diamond	Intelligence	23	85.606%	2026-05-28
Humanity's Last Exam	Intelligence	24	30%	2026-05-11
Humanity's Last Exam	Intelligence	109	13.2%	2026-05-11
Humanity's Last Exam	Intelligence	140	10.8%	2026-05-11
MMLU Pro	Intelligence	15	87.341%	2026-05-28
MMMU Pro	Intelligence	15	83.584%	2026-05-28
Vals Index	Intelligence	5	60.296%	2026-05-28
Vals Multimodal Index	Intelligence	5	60.783%	2026-05-28
CaseLaw v2	Legal	14	63.987%	2026-05-04
Harvey Legal Agent Benchmark	Legal	2	5.4%	2026-05-28
LegalBench	Legal	43	82.12%	2026-05-28
AIME	Math	20	92.292%	2026-04-16
ProofBench	Math	7	45%	2026-05-28
Global MMLU	Multilingual	5	86.1%	2026-05-28
ALL Bench Multimodal	Multimodal	16	32.53	2026-05-06
ALL Bench Multimodal	Multimodal	7	17.93	2026-05-06
Blueprint-Bench 2	Multimodal	8	0.570 +/- 0.011	2026-05-28
Design Arena	Multimodal	8	1331	2026-05-06
IDP Leaderboard	Multimodal	8	80.68	2026-05-06
LMArena Vision Arena	Multimodal	12	1277.89	2026-05-06
ARC-AGI v2	Reasoning	5	0.58	2026-05-06
CAIS Text Capabilities Index	Reasoning	11	32.6	2026-05-27
Context Arena	Reasoning	8	70.50	2026-05-06
Context Arena	Reasoning	9	70.38	2026-05-06
Context Arena	Reasoning	10	69.61	2026-05-06
Context Arena	Reasoning	28	46.73	2026-05-06
GPQA Diamond	Reasoning	29	87.5%	2026-05-11
GPQA Diamond	Reasoning	102	79.9%	2026-05-11
GPQA Diamond	Reasoning	103	79.7%	2026-05-11
CAIS Risk Index	Safety	5	38.8	2026-05-27
HarmActionsEval	Safety	3	2.84	2026-05-06
LiveSecBench	Safety	2	85.97	2026-05-27
BioMysteryBench Human-Difficult	Science	4	19.1%	2026-05-28
BioMysteryBench Human-Difficult	Science	4	19.1%	2026-04-29
BioMysteryBench Human-Solvable	Science	4	71.8%	2026-05-28
BioMysteryBench Human-Solvable	Science	4	71.8%	2026-04-29
CritPt	Science	44	3.1%	2026-05-11
CritPt	Science	91	0.9%	2026-05-11
CritPt	Science	92	0.9%	2026-05-11
ProgramBench	Software Engineering	3	0%	2026-05-05
SWE-PRBench	Software Engineering	2	0.152	2026-05-27
Structured Output Benchmark	Structured Output	11	85.40	2026-05-06
CAIS Vision Capabilities Index	Vision	21	47.7	2026-05-27
