Claude 3.7 Sonnet
Claude / Anthropic
83 scores
68 benchmarks
$3 / $15 per 1M tokens (cost in/out)
Metadata
Claude Closed/API
Aliases: anthropic-claude-3-7-sonnet-20250219, anthropic-claude-3.7-sonnet, anthropic/claude-3-7-sonnet-20250219, anthropic/claude-3.7-sonnet, claude-3-7-sonnet-20250219, claude-3.7-sonnet
| Benchmark | Category | Rank | Score | Sampled |
|---|---|---|---|---|
| ALFWorld | Agentic | 8 | 0.833 | 2026-05-27 |
| MCP-Universe | Agentic | 14 | 24.24 | 2026-05-06 |
| OSWorld | Agentic | 62 | 35.8% | 2026-05-27 |
| OSWorld | Agentic | 63 | 35.6% | 2026-05-27 |
| OSWorld | Agentic | 80 | 27.1% | 2026-05-27 |
| Tau2-Bench Telecom | Agentic | 177 | 50% | 2026-05-11 |
| Terminal-Bench Hard | Agentic | 138 | 21.2% | 2026-05-11 |
| WildAgtEval | Agentic | 5 | 61.6% | 2026-05-28 |
| OpenUGI | Alignment | 515 | 36.38 | 2026-05-06 |
| OpenUGI | Alignment | 675 | 33.02 | 2026-05-06 |
| TextClass Benchmark | Classification | 69 | 1500.76 | 2026-05-06 |
| BigCodeBench-Hard | Coding | 4 | 32.40 | 2026-05-05 |
| BigCodeBench-Hard | Coding | 5 | 31.80 | 2026-05-05 |
| CadEval | Coding | 5 | 54 | 2026-05-06 |
| LiveCodeBench | Coding | 82 | 56.662% | 2026-05-28 |
| Natural Language to Mongosh | Coding | 2 | 0.89 | 2026-05-06 |
| Natural Language to Mongosh | Coding | 3 | 0.88 | 2026-05-06 |
| Natural Language to Mongosh | Coding | 4 | 0.87 | 2026-05-06 |
| Natural Language to Mongosh | Coding | 5 | 0.87 | 2026-05-06 |
| Natural Language to Mongosh | Coding | 6 | 0.87 | 2026-05-06 |
| Natural Language to Mongosh | Coding | 8 | 0.86 | 2026-05-06 |
| Natural Language to Mongosh | Coding | 9 | 0.86 | 2026-05-06 |
| Natural Language to Mongosh | Coding | 15 | 0.86 | 2026-05-06 |
| Natural Language to Mongosh | Coding | 16 | 0.86 | 2026-05-06 |
| Natural Language to Mongosh | Coding | 22 | 0.85 | 2026-05-06 |
| Natural Language to Mongosh | Coding | 28 | 0.84 | 2026-05-06 |
| SciCode | Coding | 142 | 37.6% | 2026-05-11 |
| AIRTBench | Cybersecurity | 1 | 46.86 | 2026-05-06 |
| GSMA Open Telco Leaderboard | Domain | 17 | 65.56 | 2026-05-06 |
| K-12EduBench | Education | 17 | 61.20 | 2026-05-27 |
| RoboBench | Embodied | 6 | 40.53 | 2026-05-27 |
| FinEval | Finance | 29 | 62.9 | 2026-05-27 |
| MortgageTax | Finance | 8 | 68.68% | 2026-05-28 |
| TaxEval v2 | Finance | 40 | 72.404% | 2026-05-28 |
| HELM AIR-Bench | Generalization | 21 | 0.817703 | 2026-05-28 |
| HELM Safety | Generalization | 18 | 0.944914 | 2026-05-28 |
| WeirdML | Generalization | 15 | 39.97 | 2026-05-06 |
| GeoCode Leaderboard | Geospatial | 4 | 70.35% pass@1 | 2026-05-28 |
| OmniEarth-Bench | Geospatial | 4 | 29.07 | 2026-05-27 |
| HELM MedQA | Healthcare | 8 | 0.856859 | 2026-05-28 |
| HUMAINE | Human Preference | 31 | 3.40 | 2026-05-06 |
| Artificial Analysis Intelligence Index | Intelligence | 142 | 30.81 | 2026-05-11 |
| GPQA Diamond | Intelligence | 74 | 67.424% | 2026-05-28 |
| Humanity's Last Exam | Intelligence | 322 | 4.8% | 2026-05-11 |
| MathVision | Intelligence | 43 | 58.60 | 2026-05-06 |
| MMLU Pro | Intelligence | 57 | 80.663% | 2026-05-28 |
| MMLU-Pro | Intelligence | 110 | 80.3% | 2026-05-11 |
| MMMU Pro | Intelligence | 48 | 71.519% | 2026-05-28 |
| AraGen v3 | Language | 7 | 78.16 | 2026-05-06 |
| HindiGen v1 | Language | 12 | 70.77 | 2026-05-06 |
| WinoGrande | Language | 17 | 75.10 | 2026-05-06 |
| LegalBench | Legal | 60 | 80.001% | 2026-05-28 |
| LEXam | Legal | 3 | 62.86% open / 57.23% MCQ | 2026-05-28 |
| Fiction.LiveBench | Long Context | 13 | 53.10 | 2026-05-06 |
| AIME | Math | 79 | 22.292% | 2026-04-16 |
| AIME 2025 | Math | 208 | 21% | 2026-05-11 |
| IneqMath | Math | 45 | 2 | 2026-05-06 |
| IneqMath | Math | 50 | 1 | 2026-05-06 |
| MATH 500 | Math | 43 | 76.8% | 2026-01-09 |
| MGSM | Math | 19 | 92.4% | 2026-01-09 |
| FrontierMath 2025-02-28 Private | Mathematics | 17 | 4.14 | 2026-05-06 |
| FrontierMath Tier 4 2025-07-01 Private | Mathematics | 12 | 0 | 2026-05-06 |
| MATH-500 | Mathematics | 14 | 0.96 | 2026-05-06 |
| OTIS Mock AIME 2024-2025 | Mathematics | 18 | 57.78 | 2026-05-06 |
| LiveMedBench | Medical | 11 | 0.1699 | 2026-05-27 |
| MedHELM | Medical | 3 | 0.6357142857142857 | 2026-05-27 |
| AfroBench-Lite | Multilingual | 11 | 60.26 | 2026-05-06 |
| LanguageBench | Multilingual | 3 | 0.68 | 2026-05-06 |
| Design Arena | Multimodal | 37 | 1235 | 2026-05-06 |
| Video SimpleQA | Multimodal | 9 | 36.20 | 2026-05-06 |
| Visual-Language Understanding | Multimodal | 34 | 43.02 | 2026-05-06 |
| VPCT | Multimodal | 9 | 39 | 2026-05-06 |
| Balrog | Reasoning | 5 | 32.60 | 2026-05-06 |
| EnigmaEval | Reasoning | 25 | 2.26 | 2026-05-06 |
| GPQA Diamond | Reasoning | 245 | 65.6% | 2026-05-11 |
| LingOly-TOO | Reasoning | 3 | 0.43 | 2026-05-06 |
| SimpleBench | Reasoning | 7 | 46.40 | 2026-05-06 |
| CritPt | Science | 160 | 0% | 2026-05-11 |
| GSO-Bench | Science | 7 | 4.60 | 2026-05-06 |
| Defects4J | Software Engineering | 3 | 0.478 | 2026-05-27 |
| RepairBench | Software Engineering | 4 | 0.44 | 2026-05-27 |
| LiveSQLBench | Text to SQL | 21 | 25.75 | 2026-05-06 |
| Lech Mazur Writing | Writing | 13 | 8.11 | 2026-05-06 |