Claude Sonnet 4
Claude / Anthropic
55 scores
46 benchmarks
$3 / $15 per 1M tokens (cost in/out)
Metadata
Claude Closed/API
Aliases: anthropic-claude-4-sonnet-20250522, anthropic-claude-sonnet-4, anthropic/claude-4-sonnet-20250522, anthropic/claude-sonnet-4, claude-4-sonnet-20250522, claude-sonnet-4
| Benchmark | Category | Rank | Score | Sampled |
|---|---|---|---|---|
| APEX-Agents | Agentic | 28 | 23 | 2026-05-06 |
| ARC-AGI-1 | Agentic | 77 | 40 | 2026-05-05 |
| ARC-AGI-1 | Agentic | 94 | 29 | 2026-05-05 |
| ARC-AGI-1 | Agentic | 97 | 28 | 2026-05-05 |
| ARC-AGI-1 | Agentic | 104 | 23.83 | 2026-05-05 |
| ARC-AGI-2 | Agentic | 62 | 5.93 | 2026-05-05 |
| ARC-AGI-2 | Agentic | 90 | 2.12 | 2026-05-05 |
| ARC-AGI-2 | Agentic | 109 | 1.27 | 2026-05-05 |
| ARC-AGI-2 | Agentic | 119 | 0.85 | 2026-05-05 |
| CAR-bench | Agentic | 5 | 0.47 | 2026-05-06 |
| Galileo Agent Leaderboard | Agentic | 4 | 0.55 | 2026-05-06 |
| MCPMark | Agentic | 15 | 0.28 | 2026-05-06 |
| PinchBench | Agentic | 38 | 0.80 | 2026-05-06 |
| AgentBench FC | Agents | 9 | 57.40 | 2026-05-06 |
| ArtifactsBench | Coding | 5 | 57.28 | 2026-05-06 |
| IOI | Coding | 34 | 6.5% | 2026-05-26 |
| LiveCodeBench | Coding | 20 | 55.90 | 2026-05-06 |
| LiveCodeBench | Coding | 21 | 47.10 | 2026-05-06 |
| LiveCodeBench | Coding | 78 | 59.673% | 2026-05-28 |
| GSMA Open Telco Leaderboard | Domain | 18 | 64.81 | 2026-05-06 |
| SAGE | Education | 38 | 35% | 2026-05-28 |
| kluster.ai LLM Hallucination Detection Leaderboard | Factuality | 2 | 98.59 | 2026-05-06 |
| CorpFin v2 | Finance | 69 | 54.701% | 2026-05-28 |
| FinanceArena | Finance | 8 | 43.9 | 2026-05-27 |
| FinChain | Finance | 3 | 58.18 ChainEval | 2026-05-28 |
| MortgageTax | Finance | 34 | 62.468% | 2026-05-28 |
| TaxEval v2 | Finance | 69 | 69.624% | 2026-05-28 |
| Xent Games | Game | 9 | 48.45 overall | 2026-05-28 |
| MedCode | Healthcare | 44 | 33.943% | 2026-05-28 |
| MedQA | Healthcare | 46 | 90.35% | 2026-04-16 |
| MedScribe | Healthcare | 45 | 72.411% | 2026-05-28 |
| HUMAINE | Human Preference | 26 | 3.50 | 2026-05-06 |
| GPQA Diamond | Intelligence | 69 | 69.444% | 2026-05-28 |
| MMLU Pro | Intelligence | 67 | 79.432% | 2026-05-28 |
| MMMU Pro | Intelligence | 44 | 72.386% | 2026-05-28 |
| AraGen v3 | Language | 8 | 75.58 | 2026-05-06 |
| HindiGen v1 | Language | 14 | 69.75 | 2026-05-06 |
| LegalBench | Legal | 34 | 82.954% | 2026-05-28 |
| PatentBench | Legal | 3 | 99.10 | 2026-05-26 |
| AIME | Math | 71 | 38.542% | 2026-04-16 |
| IneqMath | Math | 37 | 3 | 2026-05-06 |
| MATH 500 | Math | 26 | 90.323% | 2026-01-09 |
| MGSM | Math | 11 | 93.018% | 2026-01-09 |
| LanguageBench | Multilingual | 5 | 0.67 | 2026-05-06 |
| Design Arena | Multimodal | 63 | 1200 | 2026-05-06 |
| Math-VR | Multimodal | 17 | 28.1 | 2026-05-27 |
| Video SimpleQA | Multimodal | 10 | 35.60 | 2026-05-06 |
| Visual-Language Understanding | Multimodal | 22 | 45.49 | 2026-05-06 |
| Visual-Language Understanding | Multimodal | 34 | 43.21 | 2026-05-06 |
| VTB | Multimodal | 13 | 4.48 | 2026-05-06 |
| CAIS Text Capabilities Index | Reasoning | 26 | 18.1 | 2026-05-27 |
| EnigmaEval | Reasoning | 23 | 3.12 | 2026-05-06 |
| EnigmaEval | Reasoning | 26 | 2.20 | 2026-05-06 |
| Humanity's Last Exam (Text Only) | Reasoning | 44 | 5.42 | 2026-05-06 |
| LiveSQLBench | Text to SQL | 16 | 27.01 | 2026-05-06 |