Claude Sonnet 4.5
Claude / Anthropic
111 scores
84 benchmarks
$3 / $15 per 1M tokens (cost in/out)
Metadata
Claude Closed/API
Aliases: anthropic-claude-4.5-sonnet-20250929, anthropic-claude-sonnet-4.5, anthropic/claude-4.5-sonnet-20250929, anthropic/claude-sonnet-4.5, claude-4.5-sonnet-20250929, claude-sonnet-4.5
| Benchmark | Category | Rank | Score | Sampled |
|---|---|---|---|---|
| ARC-AGI-1 | Agentic | 47 | 63.67 | 2026-05-05 |
| ARC-AGI-1 | Agentic | 67 | 48.33 | 2026-05-05 |
| ARC-AGI-1 | Agentic | 69 | 46.50 | 2026-05-05 |
| ARC-AGI-1 | Agentic | 91 | 31 | 2026-05-05 |
| ARC-AGI-1 | Agentic | 102 | 25.50 | 2026-05-05 |
| ARC-AGI-2 | Agentic | 46 | 13.61 | 2026-05-05 |
| ARC-AGI-2 | Agentic | 56 | 6.94 | 2026-05-05 |
| ARC-AGI-2 | Agentic | 57 | 6.94 | 2026-05-05 |
| ARC-AGI-2 | Agentic | 63 | 5.83 | 2026-05-05 |
| ARC-AGI-2 | Agentic | 77 | 3.75 | 2026-05-05 |
| Berkeley Function-Calling Leaderboard | Agentic | 2 | 73.24% | 2026-05-27 |
| Berkeley Function-Calling Leaderboard | Agentic | 89 | 24.9% | 2026-05-27 |
| EnterpriseOps-Gym | Agentic | 7 | 30.5% | 2026-05-05 |
| Gert Labs Rankings | Agentic | 13 | 0.57 | 2026-05-11 |
| LLM-WikiRace | Agentic | 13 | 43.30 | 2026-05-06 |
| MCP Atlas | Agentic | 12 | 59.50 | 2026-05-06 |
| MCPMark | Agentic | 9 | 0.32 | 2026-05-06 |
| MultiChallenge | Agentic | 12 | 55.32 | 2026-05-06 |
| OSWorld | Agentic | 19 | 62.88% | 2026-05-27 |
| OSWorld | Agentic | 25 | 58.08% | 2026-05-27 |
| OSWorld | Agentic | 50 | 42.88% | 2026-05-27 |
| PinchBench | Agentic | 11 | 0.89 | 2026-05-06 |
| Poker Agent | Agentic | 8 | 1055.504% | 2025-12-23 |
| RuneBench | Agentic | 11 | 3.20 | 2026-05-05 |
| UAVBench | Agentic | 28 | 58.40 | 2026-05-06 |
| Vending-Bench 2 | Agentic | 17 | 3838.74 | 2026-05-28 |
| AgentBench FC | Agents | 6 | 58.90 | 2026-05-06 |
| AgentBench FC | Agents | 7 | 58.30 | 2026-05-06 |
| OpenUGI | Alignment | 297 | 42.39 | 2026-05-06 |
| OpenUGI | Alignment | 848 | 29.44 | 2026-05-06 |
| scBench | Biology | 14 | 33.16% | 2026-05-27 |
| SpatialBench | Biology | 12 | 41.51% | 2026-05-27 |
| ABC-Bench | Coding | 1 | 63.2% +/- 1.9 | 2026-05-27 |
| Arena AI Code | Coding | 40 | 1386 | 2026-05-06 |
| ContextBench | Coding | 1 | 53 | 2026-05-06 |
| IOI | Coding | 17 | 18.334% | 2026-05-26 |
| LiveCodeBench | Coding | 56 | 72.996% | 2026-05-28 |
| SWE-bench Verified | Coding | 30 | 70% | 2026-05-28 |
| Terminal-Bench 2.0 | Coding | 29 | 41.573% | 2026-05-28 |
| Vibe Code Bench v1.1 | Coding | 21 | 22.621% | 2026-05-28 |
| VibeCodingBench | Coding | 6 | 88.56 | 2026-05-06 |
| RP-Bench | Creative | 4 | 1541.10 | 2026-05-06 |
| RP-Bench | Creative | 10 | 1497.30 | 2026-05-06 |
| RP-Bench | Creative | 20 | 4.37 | 2026-05-06 |
| SecCodeBench | Cybersecurity | 13 | 56.83% | 2026-05-28 |
| Arena AI Document | Document AI | 12 | 1450 | 2026-05-06 |
| GSMA Open Telco Leaderboard | Domain | 16 | 66.04 | 2026-05-06 |
| IslamicLegalBench | Domain | 3 | 65.63 | 2026-05-06 |
| SAGE | Education | 36 | 36.065% | 2026-05-28 |
| SAGE | Education | 40 | 32.88% | 2026-05-28 |
| TutorBench | Education | 16 | 49 | 2026-05-06 |
| TutorBench | Education | 21 | 45.70 | 2026-05-06 |
| From Perception to Action | Embodied AI | 4 | 13.8% | 2026-05-28 |
| Vectara HHEM Hallucination Leaderboard | Factuality | 75 | 88 | 2026-05-06 |
| CorpFin v2 | Finance | 31 | 61.966% | 2026-05-28 |
| CorpFin v2 | Finance | 43 | 60.8% | 2026-05-28 |
| Finance Agent v1.1 | Finance | 16 | 54.5% | 2026-05-04 |
| FinChain | Finance | 2 | 58.22 ChainEval | 2026-05-28 |
| MortgageTax | Finance | 31 | 63.99% | 2026-05-28 |
| PRBench Finance | Finance | 12 | 43.79 | 2026-05-06 |
| TaxBench | Finance | 15 | 8.03% mean pass^5 | 2026-05-27 |
| TaxEval v2 | Finance | 32 | 73.303% | 2026-05-28 |
| MageBench Season 1 | Game | 20 | 1589 rating / 10 games | 2026-05-28 |
| ALL Bench LLM | General Knowledge | 22 | 30.42 | 2026-05-06 |
| BenchLM | General Knowledge | 37 | 66 | 2026-05-06 |
| MedCode | Healthcare | 20 | 44.134% | 2026-05-28 |
| MedCode | Healthcare | 26 | 40.569% | 2026-05-28 |
| MedQA | Healthcare | 15 | 94.708% | 2026-04-16 |
| MedScribe | Healthcare | 9 | 84.515% | 2026-05-28 |
| MedScribe | Healthcare | 11 | 84.101% | 2026-05-28 |
| PlaceboBench | Healthcare | 3 | 62.3188 | 2026-05-27 |
| HUMAINE | Human Preference | 27 | 3.49 | 2026-05-06 |
| GPQA Diamond | Intelligence | 35 | 81.633% | 2026-05-28 |
| MathVision | Intelligence | 27 | 71.10 | 2026-05-06 |
| MMLU Pro | Intelligence | 14 | 87.357% | 2026-05-28 |
| MMMU Pro | Intelligence | 29 | 79.306% | 2026-05-28 |
| AraGen v3 | Language | 6 | 78.17 | 2026-05-06 |
| Seneca-TRBench | Language | 5 | 88.78 | 2026-05-06 |
| CaseLaw v2 | Legal | 19 | 62.165% | 2026-05-04 |
| LegalBench | Legal | 20 | 84.084% | 2026-05-28 |
| Professional Reasoning Bench - Legal | Legal | 13 | 40.84 | 2026-05-06 |
| ConStory-Bench | Long Context | 4 | CED 0.52 | 2026-05-28 |
| AIME | Math | 30 | 88.19% | 2026-04-16 |
| MGSM | Math | 4 | 94.327% | 2026-01-09 |
| ProofBench | Math | 15 | 19% | 2026-05-28 |
| FrontierMath 2025-02-28 Private | Mathematics | 10 | 13.49 | 2026-05-06 |
| OTIS Mock AIME 2024-2025 | Mathematics | 13 | 77.78 | 2026-05-06 |
| Medmarks | Medical | 6 | 0.49977366098330706 | 2026-05-27 |
| Medmarks | Medical | 4 | 0.6257561642171057 | 2026-05-27 |
| MedSafe-Dx | Medical | 6 | 87.2 | 2026-05-27 |
| ALL Bench Multimodal | Multimodal | 17 | 30.89 | 2026-05-06 |
| Design Arena | Multimodal | 32 | 1242 | 2026-05-06 |
| Design Arena | Multimodal | 33 | 1240 | 2026-05-06 |
| MMMU-Pro | Multimodal | 23 | 68.90 | 2026-05-06 |
| Visual-Language Understanding | Multimodal | 9 | 48.75 | 2026-05-06 |
| Visual-Language Understanding | Multimodal | 26 | 45 | 2026-05-06 |
| VTB | Multimodal | 10 | 6.20 | 2026-05-06 |
| VTB | Multimodal | 11 | 5.60 | 2026-05-06 |
| CAIS Text Capabilities Index | Reasoning | 18 | 25.4 | 2026-05-27 |
| EnigmaEval | Reasoning | 13 | 6.00 | 2026-05-06 |
| EnigmaEval | Reasoning | 20 | 3.38 | 2026-05-06 |
| Humanity's Last Exam (Text Only) | Reasoning | 20 | 14.09 | 2026-05-06 |
| Humanity's Last Exam (Text Only) | Reasoning | 34 | 7.65 | 2026-05-06 |
| MultiNRC | Reasoning | 16 | 35.83 | 2026-05-06 |
| MultiNRC | Reasoning | 21 | 28.15 | 2026-05-06 |
| CAIS Risk Index | Safety | 2 | 34.1 | 2026-05-27 |
| InvisibleBench | Safety | 3 | 0.04 | 2026-05-06 |
| SciPredict | Science | 2 | 22.55 | 2026-05-06 |
| IDE-Bench | Software Engineering | 1 | 87.5 | 2026-05-27 |
| LiveSQLBench | Text to SQL | 12 | 30.46 | 2026-05-06 |
| CAIS Vision Capabilities Index | Vision | 22 | 46.2 | 2026-05-27 |