Claude 3.5 Sonnet

Claude / Anthropic

82 scores
67 benchmarks
$3 / $15 per 1M tokens (cost in/out)

Metadata

Claude Closed/API

Aliases: claude-3.5-sonnet, claude-3.5-sonnet-new, claude-3-5-sonnet-20241022, anthropic-claude-3-5-sonnet-20241022, anthropic/claude-3-5-sonnet-20241022

Benchmark Results

Benchmark | Category | Rank | Score | Sampled
AgentIF | Agentic | 8 | 56.6 | 2026-05-27
Clembench Multimodal v1.6.5 | Agentic | 1 | 80.77 | 2026-05-06
WildAgtEval | Agentic | 7 | 55.8% | 2026-05-28
LAB-Bench | Biology | 1 | 0.266667 | 2026-05-27
TextClass Benchmark | Classification | 94 | 1384.79 | 2026-05-06
Aider Refactoring Benchmark | Coding | 1 | 92.10 | 2026-05-06
Aider Refactoring Benchmark | Coding | 4 | 64 | 2026-05-06
BigCodeBench | Coding | 17 | 46.80 | 2026-05-06
BigCodeBench | Coding | 33 | 44.60 | 2026-05-06
LiveCodeBench | Coding | 23 | 36.40 | 2026-05-06
LiveCodeBench | Coding | 88 | 49.628% | 2026-05-28
Long Code Arena | Coding | 2 | 0.84 | 2026-05-06
SciCode | Coding | 163 | 36.6% | 2026-05-11
SciCode | Coding | 232 | 31.6% | 2026-05-11
MMDocBench | Document Understanding | 4 | 69.25% | 2026-05-27
GSMA Open Telco Leaderboard | Domain | 29 | 60.87 | 2026-05-06
RoboBench | Embodied | 9 | 37.82 | 2026-05-27
BizFinBench | Finance | 14 | 65.59 | 2026-05-27
CorpFin v2 | Finance | 73 | 53.613% | 2026-05-28
FinEval | Finance | 12 | 72.9 | 2026-05-27
MortgageTax | Finance | 30 | 64.07% | 2026-05-28
TaxEval v2 | Finance | 66 | 70.156% | 2026-05-28
BenchLM | General Knowledge | 77 | 41 | 2026-05-06
AgentHarm | Generalization | 7 | 13.5% | 2026-05-27
AgentHarm | Generalization | 15 | 26.9% | 2026-05-27
AgentHarm | Generalization | 31 | 68.7% | 2026-05-27
Arena-Hard | Generalization | 19 | 33.0% | 2026-05-27
HELM AIR-Bench | Generalization | 2 | 0.908325 | 2026-05-28
HELM AIR-Bench | Generalization | 11 | 0.858974 | 2026-05-28
HELM Safety | Generalization | 3 | 0.976697 | 2026-05-28
WildBench | Generalization | 7 | 7.7265625 | 2026-05-27
HELM MedQA | Healthcare | 7 | 0.864811 | 2026-05-28
MedQA | Healthcare | 64 | 83.191% | 2026-04-16
Artificial Analysis Intelligence Index | Intelligence | 305 | 15.93 | 2026-05-11
Artificial Analysis Intelligence Index | Intelligence | 345 | 14.17 | 2026-05-11
GPQA Diamond | Intelligence | 84 | 59.344% | 2026-05-28
HELM Lite | Intelligence | 2 | 0.912171 | 2026-05-28
Humanity's Last Exam | Intelligence | 422 | 3.9% | 2026-05-11
Humanity's Last Exam | Intelligence | 438 | 3.7% | 2026-05-11
MathVision | Intelligence | 83 | 37.99 | 2026-05-06
MathVista | Intelligence | 17 | 67.70 | 2026-05-06
MMLU Pro | Intelligence | 75 | 78.404% | 2026-05-28
MMLU-Pro | Intelligence | 153 | 77.2% | 2026-05-11
MMLU-Pro | Intelligence | 176 | 75.1% | 2026-05-11
MMMU Pro | Intelligence | 53 | 68.804% | 2026-05-28
SimpleQA | Intelligence | 11 | 28.9% | 2026-05-27
SuperGPQA | Intelligence | 9 | 48.16 | 2026-05-06
HindiGen v1 | Language | 3 | 77.47 | 2026-05-06
AIME | Math | 89 | 10% | 2026-04-16
MATH 500 | Math | 50 | 72.4% | 2026-01-09
MGSM | Math | 15 | 92.582% | 2026-01-09
Omni-MATH | Math | 9 | 26.23 | 2026-05-06
MedHELM | Medical | 4 | 0.6339285714285714 | 2026-05-27
BenchBench | Meta | 4 | 0.96 | 2026-05-06
LanguageBench | Multilingual | 2 | 0.68 | 2026-05-06
ChartQA | Multimodal | 1 | 0.91 | 2026-05-06
MMMU-Pro | Multimodal | 36 | 51.50 | 2026-05-06
Physical AI Bench Understanding | Multimodal | 25 | 46 | 2026-05-06
Video SimpleQA | Multimodal | 13 | 34 | 2026-05-06
Video-MME | Multimodal | 32 | 62.90 | 2026-05-06
Visual-Language Understanding | Multimodal | 41 | 38.72 | 2026-05-06
Visual-Language Understanding | Multimodal | 43 | 38.37 | 2026-05-06
DROP | Reasoning | 2 | 0.87 | 2026-05-06
DROP | Reasoning | 2 | 0.87 | 2026-05-06
EnigmaEval | Reasoning | 35 | 0.91 | 2026-05-06
GPQA Diamond | Reasoning | 280 | 59.9% | 2026-05-11
GPQA Diamond | Reasoning | 308 | 56% | 2026-05-11
Humanity's Last Exam (Text Only) | Reasoning | 48 | 4.32 | 2026-05-06
LingOly-TOO | Reasoning | 8 | 0.28 | 2026-05-06
ZebraLogic | Reasoning | 12 | 36.20 | 2026-05-06
ZebraLogic | Reasoning | 13 | 33.40 | 2026-05-06
AgentLeak | Safety | 1 | 55.20 | 2026-05-06
X-Risks Leaderboard | Safety | 8 | 14.45 | 2026-05-06
MaCBench | Science | 2 | 0.67 | 2026-05-06
SciKnowEval | Science | 1 | 1 | 2026-05-27
PaperBench | Self Improvement | 1 | 21.0% | 2025-04-02
Defects4J | Software Engineering | 6 | 0.441 | 2026-05-27
Defects4J | Software Engineering | 9 | 0.415 | 2026-05-27
RepairBench | Software Engineering | 5 | 0.418 | 2026-05-27
RepairBench | Software Engineering | 9 | 0.391 | 2026-05-27
VNTL Leaderboard | Translation | 6 | 72.80 | 2026-05-06
CG-Bench | Video | 2 | 35.6% open-ended acc. / 40.3% MCQ long acc. | 2026-05-28