Claude Sonnet 4.6
Claude / Anthropic
129 scores
95 benchmarks
Cost (in/out): $3 / $15 per 1M tokens
Metadata
Claude Closed/API
Aliases: anthropic-claude-4.6-sonnet-20260217, anthropic-claude-sonnet-4.6, anthropic/claude-4.6-sonnet-20260217, anthropic/claude-sonnet-4.6, claude-4.6-sonnet-20260217, claude-sonnet-4.6
| Benchmark | Category | Rank | Score | Sampled |
|---|---|---|---|---|
| ALFWorld | Agentic | 3 | 1.0 | 2026-05-27 |
| APEX-Agents-AA | Agentic | 6 | 28% | 2026-05-11 |
| ARC-AGI-1 | Agentic | 25 | 86.50 | 2026-05-05 |
| ARC-AGI-1 | Agentic | 29 | 86 | 2026-05-05 |
| ARC-AGI-2 | Agentic | 23 | 60.42 | 2026-05-05 |
| ARC-AGI-2 | Agentic | 24 | 58.33 | 2026-05-05 |
| AutoBench | Agentic | 4 | 3.16 | 2026-05-06 |
| Claw-Eval-Live | Agentic | 3 | 61.9 | 2026-05-27 |
| EnterpriseOps-Gym | Agentic | 2 | 40.4% | 2026-05-05 |
| GDPval-AA | Agentic | 1 | 1633 | 2026-05-06 |
| Gert Labs Rankings | Agentic | 9 | 0.61 | 2026-05-11 |
| ITBench-AA | Agentic | 6 | 39.8% | 2026-05-28 |
| MCP Atlas | Agentic | 7 | 69.50 | 2026-05-06 |
| OSWorld | Agentic | 10 | 72.11% | 2026-05-27 |
| PinchBench | Agentic | 13 | 0.88 | 2026-05-06 |
| RealDataAgentBench | Agentic | 3 | 0.86 | 2026-04-28 |
| RuneBench | Agentic | 12 | 3.20 | 2026-05-05 |
| Tau2-Bench Telecom | Agentic | 108 | 79.5% | 2026-05-11 |
| Tau2-Bench Telecom | Agentic | 111 | 78.9% | 2026-05-11 |
| Tau2-Bench Telecom | Agentic | 117 | 75.7% | 2026-05-11 |
| Terminal-Bench Hard | Agentic | 7 | 53% | 2026-05-11 |
| Terminal-Bench Hard | Agentic | 18 | 46.2% | 2026-05-11 |
| Terminal-Bench Hard | Agentic | 30 | 42.4% | 2026-05-11 |
| Toolathlon | Agentic | 4 | 41% | 2026-05-28 |
| Vending-Bench 2 | Agentic | 4 | 7204.14 | 2026-05-28 |
| OpenUGI | Alignment | 54 | 53.52 | 2026-05-06 |
| OpenUGI | Alignment | 61 | 52.82 | 2026-05-06 |
| OpenUGI | Alignment | 74 | 51.87 | 2026-05-06 |
| OpenUGI | Alignment | 252 | 44.01 | 2026-05-06 |
| BioPipelineBench Verified | Biology | 4 | 73.5% | 2026-05-28 |
| ProteinGym Hard | Biology | 4 | 35.4% | 2026-05-28 |
| Protocol Troubleshooting (Anthropic Internal) | Biology | 4 | 42.4% | 2026-05-28 |
| scBench | Biology | 4 | 50.4% | 2026-05-28 |
| scBench | Biology | 9 | 50.26% | 2026-05-27 |
| SpatialBench | Biology | 4 | 48.7% | 2026-05-28 |
| SpatialBench | Biology | 10 | 44.23% | 2026-05-27 |
| Structural Biology Open-Ended | Biology | 4 | 31.3% | 2026-05-28 |
| Organic Chemistry (Anthropic Internal) | Chemistry | 4 | 53.1% | 2026-05-28 |
| Arena AI Code | Coding | 6 | 1526 | 2026-05-06 |
| DeepSWE | Coding | 4 | 31.56 | 2026-05-26 |
| LiveCodeBench | Coding | 35 | 82.091% | 2026-05-28 |
| LMArena WebDev Arena | Coding | 6 | 1526.17 | 2026-05-06 |
| SciCode | Coding | 30 | 46.9% | 2026-05-11 |
| SciCode | Coding | 33 | 46.8% | 2026-05-11 |
| SciCode | Coding | 48 | 44.1% | 2026-05-11 |
| SWE-bench Verified | Coding | 9 | 77.4% | 2026-05-28 |
| Terminal-Bench 2.0 | Coding | 7 | 59.551% | 2026-05-28 |
| Vibe Code Bench v1.1 | Coding | 9 | 51.476% | 2026-05-28 |
| CyberGym | Cybersecurity | 4 | 65.2% | 2026-05-28 |
| ExploitBench v8-bench | Cybersecurity | 7 | 3.37 points | 2026-05-28 |
| ExploitBench v8-bench | Cybersecurity | 8 | 3.17 points | 2026-05-28 |
| ExploitBench v8-bench | Cybersecurity | 10 | 3.37 points | 2026-05-15 |
| ExploitBench v8-bench | Cybersecurity | 11 | 3.17 points | 2026-05-15 |
| Firefox 147 JS Exploitation | Cybersecurity | 4 | 0% | 2026-05-28 |
| OrgForge-IT | Cybersecurity | 4 | 0.800 | 2026-05-28 |
| Arena AI Document | Document AI | 5 | 1500 | 2026-05-06 |
| GSMA Open Telco Leaderboard | Domain | 58 | 44.78 | 2026-05-06 |
| SAGE | Education | 16 | 46.582% | 2026-05-28 |
| AA-Omniscience | Factuality | 5 | 12.37 | 2026-05-11 |
| Vectara HHEM Hallucination Leaderboard | Factuality | 61 | 89.40 | 2026-05-06 |
| CorpFin v2 | Finance | 16 | 65.307% | 2026-05-28 |
| Finance Agent v1.1 | Finance | 2 | 63.331% | 2026-05-04 |
| Finance Agent v2 | Finance | 5 | 51.035% | 2026-05-28 |
| MortgageTax | Finance | 16 | 67.726% | 2026-05-28 |
| Rogo Big Finance Bench | Finance | 3 | 59% rubric / 38% final | 2026-05-28 |
| TaxBench | Finance | 12 | 11.20% mean pass^5 | 2026-05-27 |
| TaxEval v2 | Finance | 2 | 77.106% | 2026-05-28 |
| React Native Evals | Frontend Development | 8 | 80.6227% overall | 2026-05-28 |
| InfiniteBM Chess | Game | 3 | 1190.33 Elo / 11 games | 2026-05-28 |
| InfiniteBM Coup | Game | 2 | 1549.3 Elo / 34 games | 2026-05-28 |
| InfiniteBM Coup | Game | 8 | 519.02 Elo / 6 games | 2026-05-28 |
| InfiniteBM Heads-Up No-Limit Hold'em | Game | 3 | 1485.1 Elo / 20 games | 2026-05-28 |
| InfiniteBM Heads-Up No-Limit Hold'em | Game | 13 | 1251.34 Elo / 209 games | 2026-05-28 |
| InfiniteBM Liar's Dice | Game | 14 | 1267.56 Elo / 6613 games | 2026-05-28 |
| InfiniteBM Liar's Dice | Game | 23 | 1170.63 Elo / 41 games | 2026-05-28 |
| InfiniteBM Settlers of Catan | Game | 2 | 1805.89 Elo / 24 games | 2026-05-28 |
| InfiniteBM Werewolf | Game | 6 | 1137.69 Elo / 22 games | 2026-05-28 |
| InfiniteBM Werewolf | Game | 11 | 889.31 Elo / 19 games | 2026-05-28 |
| ALL Bench LLM | General Knowledge | 20 | 32.28 | 2026-05-06 |
| BenchLM | General Knowledge | 15 | 83 | 2026-05-06 |
| HealthBench Professional | Healthcare | 3 | 41.7% | 2026-05-28 |
| MedQA | Healthcare | 37 | 92.058% | 2026-04-16 |
| PhysicianBench | Healthcare | 5 | 23.0 +/- 2.6 | 2026-05-27 |
| Artificial Analysis Intelligence Index | Intelligence | 15 | 51.72 | 2026-05-11 |
| Artificial Analysis Intelligence Index | Intelligence | 46 | 44.38 | 2026-05-11 |
| Artificial Analysis Intelligence Index | Intelligence | 57 | 42.6 | 2026-05-11 |
| GPQA Diamond | Intelligence | 23 | 85.606% | 2026-05-28 |
| Humanity's Last Exam | Intelligence | 24 | 30% | 2026-05-11 |
| Humanity's Last Exam | Intelligence | 109 | 13.2% | 2026-05-11 |
| Humanity's Last Exam | Intelligence | 140 | 10.8% | 2026-05-11 |
| MMLU Pro | Intelligence | 15 | 87.341% | 2026-05-28 |
| MMMU Pro | Intelligence | 15 | 83.584% | 2026-05-28 |
| Vals Index | Intelligence | 5 | 60.296% | 2026-05-28 |
| Vals Multimodal Index | Intelligence | 5 | 60.783% | 2026-05-28 |
| CaseLaw v2 | Legal | 14 | 63.987% | 2026-05-04 |
| Harvey Legal Agent Benchmark | Legal | 2 | 5.4% | 2026-05-28 |
| LegalBench | Legal | 43 | 82.12% | 2026-05-28 |
| AIME | Math | 20 | 92.292% | 2026-04-16 |
| ProofBench | Math | 7 | 45% | 2026-05-28 |
| Global MMLU | Multilingual | 5 | 86.1% | 2026-05-28 |
| ALL Bench Multimodal | Multimodal | 16 | 32.53 | 2026-05-06 |
| ALL Bench Multimodal | Multimodal | 7 | 17.93 | 2026-05-06 |
| Blueprint-Bench 2 | Multimodal | 8 | 0.570 +/- 0.011 | 2026-05-28 |
| Design Arena | Multimodal | 8 | 1331 | 2026-05-06 |
| IDP Leaderboard | Multimodal | 8 | 80.68 | 2026-05-06 |
| LMArena Vision Arena | Multimodal | 12 | 1277.89 | 2026-05-06 |
| ARC-AGI v2 | Reasoning | 5 | 0.58 | 2026-05-06 |
| CAIS Text Capabilities Index | Reasoning | 11 | 32.6 | 2026-05-27 |
| Context Arena | Reasoning | 8 | 70.50 | 2026-05-06 |
| Context Arena | Reasoning | 9 | 70.38 | 2026-05-06 |
| Context Arena | Reasoning | 10 | 69.61 | 2026-05-06 |
| Context Arena | Reasoning | 28 | 46.73 | 2026-05-06 |
| GPQA Diamond | Reasoning | 29 | 87.5% | 2026-05-11 |
| GPQA Diamond | Reasoning | 102 | 79.9% | 2026-05-11 |
| GPQA Diamond | Reasoning | 103 | 79.7% | 2026-05-11 |
| CAIS Risk Index | Safety | 5 | 38.8 | 2026-05-27 |
| HarmActionsEval | Safety | 3 | 2.84 | 2026-05-06 |
| LiveSecBench | Safety | 2 | 85.97 | 2026-05-27 |
| BioMysteryBench Human-Difficult | Science | 4 | 19.1% | 2026-05-28 |
| BioMysteryBench Human-Difficult | Science | 4 | 19.1% | 2026-04-29 |
| BioMysteryBench Human-Solvable | Science | 4 | 71.8% | 2026-05-28 |
| BioMysteryBench Human-Solvable | Science | 4 | 71.8% | 2026-04-29 |
| CritPt | Science | 44 | 3.1% | 2026-05-11 |
| CritPt | Science | 91 | 0.9% | 2026-05-11 |
| CritPt | Science | 92 | 0.9% | 2026-05-11 |
| ProgramBench | Software Engineering | 3 | 0% | 2026-05-05 |
| SWE-PRBench | Software Engineering | 2 | 0.152 | 2026-05-27 |
| Structured Output Benchmark | Structured Output | 11 | 85.40 | 2026-05-06 |
| CAIS Vision Capabilities Index | Vision | 21 | 47.7 | 2026-05-27 |