Claude Sonnet 4

Claude / Anthropic

55 scores
46 benchmarks
$3 / $15 per 1M tokens (cost in/out)
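
As a rough illustration of what the listed pricing means per request, the sketch below converts token counts to dollars. It assumes the $3 figure applies to input tokens and the $15 figure to output tokens, as the "in/out" label suggests. The function name `request_cost_usd` and the sample token counts are hypothetical, and the calculation ignores any caching or batch discounts.

```python
def request_cost_usd(input_tokens: int, output_tokens: int,
                     in_price_per_m: float = 3.0,
                     out_price_per_m: float = 15.0) -> float:
    """Estimate the cost of one request in USD.

    Prices are per 1M tokens; the defaults are the listed $3 (input)
    and $15 (output) figures. Assumes simple linear pricing.
    """
    total = input_tokens * in_price_per_m + output_tokens * out_price_per_m
    return total / 1_000_000


# Hypothetical request: 10,000 input tokens and 2,000 output tokens.
print(request_cost_usd(10_000, 2_000))
```

At these rates, output tokens cost five times as much as input tokens, so long generations dominate the bill even when the prompt is large.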

Metadata

Claude Closed/API

Aliases: anthropic-claude-4-sonnet-20250522, anthropic-claude-sonnet-4, anthropic/claude-4-sonnet-20250522, anthropic/claude-sonnet-4, claude-4-sonnet-20250522, claude-sonnet-4

Benchmark Results

Benchmark | Category | Rank | Score | Sampled
APEX-Agents | Agentic | 28 | 23 | 2026-05-06
ARC-AGI-1 | Agentic | 77 | 40 | 2026-05-05
ARC-AGI-1 | Agentic | 94 | 29 | 2026-05-05
ARC-AGI-1 | Agentic | 97 | 28 | 2026-05-05
ARC-AGI-1 | Agentic | 104 | 23.83 | 2026-05-05
ARC-AGI-2 | Agentic | 62 | 5.93 | 2026-05-05
ARC-AGI-2 | Agentic | 90 | 2.12 | 2026-05-05
ARC-AGI-2 | Agentic | 109 | 1.27 | 2026-05-05
ARC-AGI-2 | Agentic | 119 | 0.85 | 2026-05-05
CAR-bench | Agentic | 5 | 0.47 | 2026-05-06
Galileo Agent Leaderboard | Agentic | 4 | 0.55 | 2026-05-06
MCPMark | Agentic | 15 | 0.28 | 2026-05-06
PinchBench | Agentic | 38 | 0.80 | 2026-05-06
AgentBench FC | Agents | 9 | 57.40 | 2026-05-06
ArtifactsBench | Coding | 5 | 57.28 | 2026-05-06
IOI | Coding | 34 | 6.5% | 2026-05-26
LiveCodeBench | Coding | 20 | 55.90 | 2026-05-06
LiveCodeBench | Coding | 21 | 47.10 | 2026-05-06
LiveCodeBench | Coding | 78 | 59.673% | 2026-05-28
GSMA Open Telco Leaderboard | Domain | 18 | 64.81 | 2026-05-06
SAGE | Education | 38 | 35% | 2026-05-28
kluster.ai LLM Hallucination Detection Leaderboard | Factuality | 2 | 98.59 | 2026-05-06
CorpFin v2 | Finance | 69 | 54.701% | 2026-05-28
FinanceArena | Finance | 8 | 43.9 | 2026-05-27
FinChain | Finance | 3 | 58.18 ChainEval | 2026-05-28
MortgageTax | Finance | 34 | 62.468% | 2026-05-28
TaxEval v2 | Finance | 69 | 69.624% | 2026-05-28
Xent Games | Game | 9 | 48.45 overall | 2026-05-28
MedCode | Healthcare | 44 | 33.943% | 2026-05-28
MedQA | Healthcare | 46 | 90.35% | 2026-04-16
MedScribe | Healthcare | 45 | 72.411% | 2026-05-28
HUMAINE | Human Preference | 26 | 3.50 | 2026-05-06
GPQA Diamond | Intelligence | 69 | 69.444% | 2026-05-28
MMLU Pro | Intelligence | 67 | 79.432% | 2026-05-28
MMMU Pro | Intelligence | 44 | 72.386% | 2026-05-28
AraGen v3 | Language | 8 | 75.58 | 2026-05-06
HindiGen v1 | Language | 14 | 69.75 | 2026-05-06
LegalBench | Legal | 34 | 82.954% | 2026-05-28
PatentBench | Legal | 3 | 99.10 | 2026-05-26
AIME | Math | 71 | 38.542% | 2026-04-16
IneqMath | Math | 37 | 3 | 2026-05-06
MATH 500 | Math | 26 | 90.323% | 2026-01-09
MGSM | Math | 11 | 93.018% | 2026-01-09
LanguageBench | Multilingual | 5 | 0.67 | 2026-05-06
Design Arena | Multimodal | 63 | 1200 | 2026-05-06
Math-VR | Multimodal | 17 | 28.1 | 2026-05-27
Video SimpleQA | Multimodal | 10 | 35.60 | 2026-05-06
Visual-Language Understanding | Multimodal | 22 | 45.49 | 2026-05-06
Visual-Language Understanding | Multimodal | 34 | 43.21 | 2026-05-06
VTB | Multimodal | 13 | 4.48 | 2026-05-06
CAIS Text Capabilities Index | Reasoning | 26 | 18.1 | 2026-05-27
EnigmaEval | Reasoning | 23 | 3.12 | 2026-05-06
EnigmaEval | Reasoning | 26 | 2.20 | 2026-05-06
Humanity's Last Exam (Text Only) | Reasoning | 44 | 5.42 | 2026-05-06
LiveSQLBench | Text to SQL | 16 | 27.01 | 2026-05-06