Claude Opus 4.5
Claude / Anthropic
121 scores
88 benchmarks
$5 / $25 per 1M tokens (cost in/out)
Metadata
Claude Closed/API
Aliases: anthropic-claude-4.5-opus-20251124, anthropic-claude-opus-4.5, anthropic/claude-4.5-opus-20251124, anthropic/claude-opus-4.5, claude-4.5-opus-20251124, claude-opus-4.5
| Benchmark | Category | Rank | Score | Sampled |
|---|---|---|---|---|
| ALFWorld | Agentic | 1 | 1.0 | 2026-05-27 |
| APEX-Agents | Agentic | 15 | 34.80 | 2026-05-06 |
| Berkeley Function-Calling Leaderboard | Agentic | 1 | 77.47% | 2026-05-27 |
| Berkeley Function-Calling Leaderboard | Agentic | 57 | 33.47% | 2026-05-27 |
| CAR-bench | Agentic | 4 | 0.52 | 2026-05-06 |
| EnterpriseOps-Gym | Agentic | 3 | 37% | 2026-05-05 |
| Gert Labs Rankings | Agentic | 7 | 0.63 | 2026-05-11 |
| LLM-WikiRace | Agentic | 6 | 56 | 2026-05-06 |
| MCP Atlas | Agentic | 6 | 69.80 | 2026-05-06 |
| MCPMark | Agentic | 7 | 0.42 | 2026-05-06 |
| MultiChallenge | Agentic | 10 | 58.97 | 2026-05-06 |
| PinchBench | Agentic | 16 | 0.87 | 2026-05-06 |
| Poker Agent | Agentic | 11 | 1033.379% | 2025-12-23 |
| RuneBench | Agentic | 9 | 4.10 | 2026-05-05 |
| Tau2-Bench Telecom | Agentic | 59 | 89.5% | 2026-05-11 |
| Tau2-Bench Telecom | Agentic | 78 | 86.3% | 2026-05-11 |
| Terminal-Bench Hard | Agentic | 15 | 47% | 2026-05-11 |
| Terminal-Bench Hard | Agentic | 35 | 40.9% | 2026-05-11 |
| Vending-Bench 2 | Agentic | 13 | 4967.06 | 2026-05-28 |
| OpenUGI | Alignment | 294 | 42.64 | 2026-05-06 |
| OpenUGI | Alignment | 600 | 34.23 | 2026-05-06 |
| scBench | Biology | 10 | 47.18% | 2026-05-27 |
| SpatialBench | Biology | 11 | 42.77% | 2026-05-27 |
| Arena AI Code | Coding | 12 | 1467 | 2026-05-06 |
| HoudiniVexBench | Coding | 1 | 0.51 | 2026-05-06 |
| IOI | Coding | 11 | 23.584% | 2026-05-26 |
| IOI | Coding | 15 | 20.25% | 2026-05-26 |
| LiveCodeBench | Coding | 28 | 83.67% | 2026-05-28 |
| LiveCodeBench | Coding | 54 | 75.034% | 2026-05-28 |
| LMArena WebDev Arena | Coding | 12 | 1467.21 | 2026-05-06 |
| SciCode | Coding | 23 | 49.5% | 2026-05-11 |
| SciCode | Coding | 28 | 47% | 2026-05-11 |
| SWE-bench Verified | Coding | 11 | 76.4% | 2026-05-28 |
| Terminal-Bench 2.0 | Coding | 10 | 58.427% | 2026-05-28 |
| Terminal-Bench 2.0 | Coding | 16 | 53.933% | 2026-05-28 |
| Vibe Code Bench v1.1 | Coding | 23 | 20.63% | 2026-05-28 |
| VibeCodingBench | Coding | 1 | 89.15 | 2026-05-06 |
| Arena AI Document | Document AI | 9 | 1470 | 2026-05-06 |
| GSMA Open Telco Leaderboard | Domain | 8 | 69.64 | 2026-05-06 |
| IB-bench | Domain Specific | 1 | 30.50 | 2026-05-06 |
| SAGE | Education | 4 | 52.092% | 2026-05-28 |
| SAGE | Education | 18 | 45.002% | 2026-05-28 |
| TutorBench | Education | 11 | 51.20 | 2026-05-06 |
| TutorBench | Education | 14 | 49.82 | 2026-05-06 |
| From Perception to Action | Embodied AI | 3 | 15.6% | 2026-05-28 |
| Vectara HHEM Hallucination Leaderboard | Factuality | 66 | 89.10 | 2026-05-06 |
| CorpFin v2 | Finance | 19 | 65.074% | 2026-05-28 |
| CorpFin v2 | Finance | 34 | 61.305% | 2026-05-28 |
| Finance Agent v1.1 | Finance | 8 | 58.81% | 2026-05-04 |
| MortgageTax | Finance | 9 | 68.68% | 2026-05-28 |
| MortgageTax | Finance | 17 | 67.686% | 2026-05-28 |
| PRBench Finance | Finance | 8 | 46.16 | 2026-05-06 |
| TaxEval v2 | Finance | 14 | 74.856% | 2026-05-28 |
| TaxEval v2 | Finance | 21 | 74.325% | 2026-05-28 |
| BenchLM | General Knowledge | 23 | 77 | 2026-05-06 |
| WeirdML | Generalization | 3 | 63.70 | 2026-05-06 |
| MedCode | Healthcare | 12 | 49.156% | 2026-05-28 |
| MedCode | Healthcare | 19 | 45.174% | 2026-05-28 |
| MedQA | Healthcare | 10 | 95.875% | 2026-04-16 |
| MedQA | Healthcare | 24 | 93.158% | 2026-04-16 |
| MedScribe | Healthcare | 7 | 85.321% | 2026-05-28 |
| MedScribe | Healthcare | 13 | 83.246% | 2026-05-28 |
| Omi SOAP Note Safety Benchmark | Healthcare | 5 | 4.54 | 2026-04-21 |
| HUMAINE | Human Preference | 44 | 3.25 | 2026-05-06 |
| AIIQ Composite IQ | Intelligence | 10 | 123 | 2026-05-12 |
| Artificial Analysis Intelligence Index | Intelligence | 23 | 49.73 | 2026-05-11 |
| Artificial Analysis Intelligence Index | Intelligence | 53 | 43.09 | 2026-05-11 |
| GPQA Diamond | Intelligence | 22 | 85.859% | 2026-05-28 |
| GPQA Diamond | Intelligence | 43 | 79.546% | 2026-05-28 |
| Humanity's Last Exam | Intelligence | 29 | 28.4% | 2026-05-11 |
| Humanity's Last Exam | Intelligence | 113 | 12.9% | 2026-05-11 |
| MathVision | Intelligence | 20 | 77.10 | 2026-05-06 |
| MMLU Pro | Intelligence | 17 | 87.26% | 2026-05-28 |
| MMLU Pro | Intelligence | 33 | 85.59% | 2026-05-28 |
| MMLU-Pro | Intelligence | 2 | 89.5% | 2026-05-11 |
| MMLU-Pro | Intelligence | 5 | 88.9% | 2026-05-11 |
| MMMU Pro | Intelligence | 19 | 82.948% | 2026-05-28 |
| MMMU Pro | Intelligence | 24 | 81.098% | 2026-05-28 |
| AraGen v3 | Language | 5 | 80.29 | 2026-05-06 |
| CaseLaw v2 | Legal | 18 | 62.594% | 2026-05-04 |
| LegalBench | Legal | 13 | 84.604% | 2026-05-28 |
| LegalBench | Legal | 35 | 82.837% | 2026-05-28 |
| Professional Reasoning Bench - Legal | Legal | 9 | 44.21 | 2026-05-06 |
| Fiction.LiveBench | Long Context | 17 | 37.50 | 2026-05-06 |
| AIME | Math | 12 | 95.417% | 2026-04-16 |
| AIME | Math | 50 | 76.875% | 2026-04-16 |
| AIME 2025 | Math | 20 | 91.3% | 2026-05-11 |
| AIME 2025 | Math | 112 | 62.7% | 2026-05-11 |
| MGSM | Math | 1 | 95.2% | 2026-01-09 |
| MGSM | Math | 2 | 94.764% | 2026-01-09 |
| ProofBench | Math | 8 | 36% | 2026-05-28 |
| FrontierMath 2025-02-28 Private | Mathematics | 6 | 20.69 | 2026-05-06 |
| FrontierMath Tier 4 2025-07-01 Private | Mathematics | 6 | 4.17 | 2026-05-06 |
| OTIS Mock AIME 2024-2025 | Mathematics | 6 | 86.11 | 2026-05-06 |
| Medical Chronology LLM Benchmark | Medical | 3 | 0.91 | 2026-05-06 |
| Design Arena | Multimodal | 14 | 1300 | 2026-05-06 |
| MMMU-Pro | Multimodal | 18 | 73.90 | 2026-05-06 |
| Visual-Language Understanding | Multimodal | 19 | 46.43 | 2026-05-06 |
| Visual-Language Understanding | Multimodal | 24 | 45.32 | 2026-05-06 |
| VPCT | Multimodal | 8 | 40 | 2026-05-06 |
| Artificial Analysis Openness Index | Openness | 195 | 11.11 | 2026-05-11 |
| Artificial Analysis Openness Index | Openness | 196 | 11.11 | 2026-05-11 |
| ARC-AGI v2 | Reasoning | 9 | 0.38 | 2026-05-06 |
| CAIS Text Capabilities Index | Reasoning | 8 | 36.6 | 2026-05-27 |
| EnigmaEval | Reasoning | 6 | 11.91 | 2026-05-06 |
| EnigmaEval | Reasoning | 16 | 4.65 | 2026-05-06 |
| GPQA Diamond | Reasoning | 39 | 86.6% | 2026-05-11 |
| GPQA Diamond | Reasoning | 95 | 81% | 2026-05-11 |
| Humanity's Last Exam (Text Only) | Reasoning | 8 | 26.32 | 2026-05-06 |
| Humanity's Last Exam (Text Only) | Reasoning | 20 | 13.90 | 2026-05-06 |
| MultiNRC | Reasoning | 7 | 48.63 | 2026-05-06 |
| MultiNRC | Reasoning | 12 | 41.23 | 2026-05-06 |
| SimpleBench | Reasoning | 3 | 62 | 2026-05-06 |
| CAIS Risk Index | Safety | 3 | 34.7 | 2026-05-27 |
| CritPt | Science | 36 | 4.6% | 2026-05-11 |
| CritPt | Science | 124 | 0.3% | 2026-05-11 |
| GSO-Bench | Science | 2 | 26.50 | 2026-05-06 |
| SciPredict | Science | 1 | 23.05 | 2026-05-06 |
| IDE-Bench | Software Engineering | 3 | 83.75 | 2026-05-27 |
| CAIS Vision Capabilities Index | Vision | 23 | 44.9 | 2026-05-27 |
| Lech Mazur Writing | Writing | 6 | 8.54 | 2026-05-06 |