Claude 3.5 Sonnet
Claude / Anthropic
82 scores
67 benchmarks
Cost (in/out): $3 / $15 per 1M tokens
Metadata
Claude Closed/API
Aliases: claude-3.5-sonnet, claude-3.5-sonnet-new, claude-3-5-sonnet-20241022, anthropic-claude-3-5-sonnet-20241022, anthropic/claude-3-5-sonnet-20241022
| Benchmark | Category | Rank | Score | Sampled |
|---|---|---|---|---|
| AgentIF | Agentic | 8 | 56.6 | 2026-05-27 |
| Clembench Multimodal v1.6.5 | Agentic | 1 | 80.77 | 2026-05-06 |
| WildAgtEval | Agentic | 7 | 55.8% | 2026-05-28 |
| LAB-Bench | Biology | 1 | 0.266667 | 2026-05-27 |
| TextClass Benchmark | Classification | 94 | 1384.79 | 2026-05-06 |
| Aider Refactoring Benchmark | Coding | 1 | 92.10 | 2026-05-06 |
| Aider Refactoring Benchmark | Coding | 4 | 64 | 2026-05-06 |
| BigCodeBench | Coding | 17 | 46.80 | 2026-05-06 |
| BigCodeBench | Coding | 33 | 44.60 | 2026-05-06 |
| LiveCodeBench | Coding | 23 | 36.40 | 2026-05-06 |
| LiveCodeBench | Coding | 88 | 49.628% | 2026-05-28 |
| Long Code Arena | Coding | 2 | 0.84 | 2026-05-06 |
| SciCode | Coding | 163 | 36.6% | 2026-05-11 |
| SciCode | Coding | 232 | 31.6% | 2026-05-11 |
| MMDocBench | Document Understanding | 4 | 69.25% | 2026-05-27 |
| GSMA Open Telco Leaderboard | Domain | 29 | 60.87 | 2026-05-06 |
| RoboBench | Embodied | 9 | 37.82 | 2026-05-27 |
| BizFinBench | Finance | 14 | 65.59 | 2026-05-27 |
| CorpFin v2 | Finance | 73 | 53.613% | 2026-05-28 |
| FinEval | Finance | 12 | 72.9 | 2026-05-27 |
| MortgageTax | Finance | 30 | 64.07% | 2026-05-28 |
| TaxEval v2 | Finance | 66 | 70.156% | 2026-05-28 |
| BenchLM | General Knowledge | 77 | 41 | 2026-05-06 |
| AgentHarm | Generalization | 7 | 13.5% | 2026-05-27 |
| AgentHarm | Generalization | 15 | 26.9% | 2026-05-27 |
| AgentHarm | Generalization | 31 | 68.7% | 2026-05-27 |
| Arena-Hard | Generalization | 19 | 33.0% | 2026-05-27 |
| HELM AIR-Bench | Generalization | 2 | 0.908325 | 2026-05-28 |
| HELM AIR-Bench | Generalization | 11 | 0.858974 | 2026-05-28 |
| HELM Safety | Generalization | 3 | 0.976697 | 2026-05-28 |
| WildBench | Generalization | 7 | 7.7265625 | 2026-05-27 |
| HELM MedQA | Healthcare | 7 | 0.864811 | 2026-05-28 |
| MedQA | Healthcare | 64 | 83.191% | 2026-04-16 |
| Artificial Analysis Intelligence Index | Intelligence | 305 | 15.93 | 2026-05-11 |
| Artificial Analysis Intelligence Index | Intelligence | 345 | 14.17 | 2026-05-11 |
| GPQA Diamond | Intelligence | 84 | 59.344% | 2026-05-28 |
| HELM Lite | Intelligence | 2 | 0.912171 | 2026-05-28 |
| Humanity's Last Exam | Intelligence | 422 | 3.9% | 2026-05-11 |
| Humanity's Last Exam | Intelligence | 438 | 3.7% | 2026-05-11 |
| MathVision | Intelligence | 83 | 37.99 | 2026-05-06 |
| MathVista | Intelligence | 17 | 67.70 | 2026-05-06 |
| MMLU-Pro | Intelligence | 75 | 78.404% | 2026-05-28 |
| MMLU-Pro | Intelligence | 153 | 77.2% | 2026-05-11 |
| MMLU-Pro | Intelligence | 176 | 75.1% | 2026-05-11 |
| MMMU-Pro | Intelligence | 53 | 68.804% | 2026-05-28 |
| SimpleQA | Intelligence | 11 | 28.9% | 2026-05-27 |
| SuperGPQA | Intelligence | 9 | 48.16 | 2026-05-06 |
| HindiGen v1 | Language | 3 | 77.47 | 2026-05-06 |
| AIME | Math | 89 | 10% | 2026-04-16 |
| MATH 500 | Math | 50 | 72.4% | 2026-01-09 |
| MGSM | Math | 15 | 92.582% | 2026-01-09 |
| Omni-MATH | Math | 9 | 26.23 | 2026-05-06 |
| MedHELM | Medical | 4 | 0.6339285714285714 | 2026-05-27 |
| BenchBench | Meta | 4 | 0.96 | 2026-05-06 |
| LanguageBench | Multilingual | 2 | 0.68 | 2026-05-06 |
| ChartQA | Multimodal | 1 | 0.91 | 2026-05-06 |
| MMMU-Pro | Multimodal | 36 | 51.50 | 2026-05-06 |
| Physical AI Bench Understanding | Multimodal | 25 | 46 | 2026-05-06 |
| Video SimpleQA | Multimodal | 13 | 34 | 2026-05-06 |
| Video-MME | Multimodal | 32 | 62.90 | 2026-05-06 |
| Visual-Language Understanding | Multimodal | 41 | 38.72 | 2026-05-06 |
| Visual-Language Understanding | Multimodal | 43 | 38.37 | 2026-05-06 |
| DROP | Reasoning | 2 | 0.87 | 2026-05-06 |
| EnigmaEval | Reasoning | 35 | 0.91 | 2026-05-06 |
| GPQA Diamond | Reasoning | 280 | 59.9% | 2026-05-11 |
| GPQA Diamond | Reasoning | 308 | 56% | 2026-05-11 |
| Humanity's Last Exam (Text Only) | Reasoning | 48 | 4.32 | 2026-05-06 |
| LingOly-TOO | Reasoning | 8 | 0.28 | 2026-05-06 |
| ZebraLogic | Reasoning | 12 | 36.20 | 2026-05-06 |
| ZebraLogic | Reasoning | 13 | 33.40 | 2026-05-06 |
| AgentLeak | Safety | 1 | 55.20 | 2026-05-06 |
| X-Risks Leaderboard | Safety | 8 | 14.45 | 2026-05-06 |
| MaCBench | Science | 2 | 0.67 | 2026-05-06 |
| SciKnowEval | Science | 1 | 1 | 2026-05-27 |
| PaperBench | Self Improvement | 1 | 21.0% | 2025-04-02 |
| Defects4J | Software Engineering | 6 | 0.441 | 2026-05-27 |
| Defects4J | Software Engineering | 9 | 0.415 | 2026-05-27 |
| RepairBench | Software Engineering | 5 | 0.418 | 2026-05-27 |
| RepairBench | Software Engineering | 9 | 0.391 | 2026-05-27 |
| VNTL Leaderboard | Translation | 6 | 72.80 | 2026-05-06 |
| CG-Bench | Video | 2 | 35.6% open-ended acc. / 40.3% MCQ long acc. | 2026-05-28 |