Claude 3.7 Sonnet (thinking)

Claude / Anthropic

17 scores
17 benchmarks
$3 / $15 per 1M tokens (cost in/out)
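
As a rough illustration of what the listed in/out prices mean for a single workload, the sketch below estimates cost from token counts. It assumes the $3 and $15 rates apply linearly per token and that every billed output token (including any reasoning tokens, which this listing does not break out) is charged at the output rate. The function name and the example token counts are hypothetical, chosen only for illustration.

```python
# Listed prices for this model, in USD per 1M tokens (input / output).
INPUT_USD_PER_M = 3.0
OUTPUT_USD_PER_M = 15.0


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of a request, assuming simple linear per-token pricing.

    Hypothetical helper: ignores discounts, caching, or any other pricing
    adjustments that the listing does not mention.
    """
    total = input_tokens * INPUT_USD_PER_M + output_tokens * OUTPUT_USD_PER_M
    return total / 1_000_000


# Example workload (made-up numbers): 200k input tokens, 50k output tokens.
example = estimate_cost_usd(200_000, 50_000)
print(f"${example:.2f}")
```

Because the output rate is five times the input rate, output-heavy workloads dominate the bill under these assumptions.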

Metadata

Claude Closed/API

Aliases: anthropic-claude-3.7-sonnet-thinking, anthropic/claude-3.7-sonnet:thinking, claude-3.7-sonnet:thinking

Benchmark Results

Benchmark                              | Category     | Rank | Score  | Sampled
MultiChallenge                         | Agentic      | 16   | 51.58  | 2026-05-06
Tau2-Bench Telecom                     | Agentic      | 166  | 54.7%  | 2026-05-11
Terminal-Bench Hard                    | Agentic      | 139  | 21.2%  | 2026-05-11
SciCode                                | Coding       | 93   | 40.3%  | 2026-05-11
TutorBench                             | Education    | 21   | 46.45  | 2026-05-06
Artificial Analysis Intelligence Index | Intelligence | 110  | 34.71  | 2026-05-11
Humanity's Last Exam                   | Intelligence | 147  | 10.3%  | 2026-05-11
MMLU-Pro                               | Intelligence | 46   | 83.7%  | 2026-05-11
AIME 2025                              | Math         | 126  | 56.3%  | 2026-05-11
MMSI-Bench                             | Multimodal   | 18   | 30.2%  | 2026-05-28
Visual-Language Understanding          | Multimodal   | 12   | 48.23  | 2026-05-06
EnigmaEval                             | Reasoning    | 17   | 4.23   | 2026-05-06
GPQA Diamond                           | Reasoning    | 131  | 77.2%  | 2026-05-11
Humanity's Last Exam (Text Only)       | Reasoning    | 33   | 7.89   | 2026-05-06
MultiNRC                               | Reasoning    | 21   | 27.77  | 2026-05-06
CritPt                                 | Science      | 90   | 0.9%   | 2026-05-11
LiveSQLBench                           | Text to SQL  | 19   | 26.55  | 2026-05-06