Claude Opus 4.8
Claude / Anthropic
83 scores
77 benchmarks
$5 / $25 per 1M tokens (cost in/out)
Metadata
Claude Closed/API
Aliases: anthropic-claude-opus-4.8, anthropic/claude-opus-4.8, claude-opus-4.8, Opus 4.8, Claude Opus 4.8
Official Sources
2 linked sources

| Benchmark | Category | Rank | Score | Sampled |
|---|---|---|---|---|
| AutomationBench | Agentic | 1 | 15.5% | 2026-05-28 |
| BrowseComp | Agentic | 3 | 84.3% | 2026-05-28 |
| DRACO | Agentic | 1 | 80.4% | 2026-05-28 |
| GDPval-AA | Agentic | 1 | 1890 Elo | 2026-05-28 |
| MCP Atlas | Agentic | 1 | 82.2% | 2026-05-28 |
| OSWorld-Verified | Agentic | 1 | 83.4% | 2026-05-28 |
| ScreenSpot-Pro | Agentic | 1 | 87.9% | 2026-05-28 |
| Toolathlon | Agentic | 1 | 59.9% | 2026-05-28 |
| Vending-Bench 2 | Agentic | 8 | 5787.43 | 2026-05-28 |
| Vending-Bench 2 | Agentic | 22 | 2992.34 | 2026-05-28 |
| Vending-Bench 2 | Agentic | 3 | 5787.4 USD | 2026-05-28 |
| Vending-Bench 2 | Agentic | 4 | 2992.3 USD | 2026-05-28 |
| AAV Capsid Packaging Prediction | Biology | 2 | 0.8 | 2026-05-28 |
| BioPipelineBench Verified | Biology | 2 | 87.7% | 2026-05-28 |
| Black-box RNA Sequence Design | Biology | 2 | 10.1 | 2026-05-28 |
| DNA Synthesis Screening Evasion | Biology | 2 | 0.7 | 2026-05-28 |
| LABBench2 Clinical Trials | Biology | 2 | 85.3% | 2026-05-28 |
| LABBench2 Patent Questions | Biology | 1 | 68.8% | 2026-05-28 |
| LABBench2 Reading Tables | Biology | 1 | 77.2% | 2026-05-28 |
| LABBench2 Supplementary Materials | Biology | 1 | 58.9% | 2026-05-28 |
| Long-form Virology Task 1 | Biology | 2 | 0.8 | 2026-05-28 |
| Long-form Virology Task 2 | Biology | 2 | 0.9 | 2026-05-28 |
| ProteinGym Hard | Biology | 2 | 39.6% | 2026-05-28 |
| Protocol Troubleshooting (Anthropic Internal) | Biology | 2 | 59.6% | 2026-05-28 |
| scBench | Biology | 2 | 58.2% | 2026-05-28 |
| SpatialBench | Biology | 2 | 53.3% | 2026-05-28 |
| Structural Biology Open-Ended | Biology | 2 | 79% | 2026-05-28 |
| VCT Virology Capabilities Test | Biology | 2 | 0.5 | 2026-05-28 |
| Organic Chemistry (Anthropic Internal) | Chemistry | 2 | 86.2% | 2026-05-28 |
| FrontierSWE | Coding | 1 | 2.7 avg rank | 2026-05-28 |
| LiveCodeBench | Coding | 3 | 87.819% | 2026-05-28 |
| SWE-bench Verified | Coding | 1 | 88.6% | 2026-05-28 |
| Terminal-Bench 2.0 | Coding | 2 | 70.037% | 2026-05-28 |
| Terminal-Bench 2.1 | Coding | 3 | 71.91% | 2026-05-28 |
| Terminal-Bench 2.1 | Coding | 2 | 74.6% | 2026-05-28 |
| Vibe Code Bench v1.1 | Coding | 1 | 82.725% | 2026-05-28 |
| CyberGym | Cybersecurity | 2 | 78.8% | 2026-05-28 |
| ExploitBench v8-bench | Cybersecurity | 3 | 5.45 points | 2026-05-28 |
| ExploitBench v8-bench | Cybersecurity | 4 | 5.02 points | 2026-05-28 |
| Firefox 147 JS Exploitation | Cybersecurity | 2 | 8.8% | 2026-05-28 |
| OfficeQA (Anthropic Harness) | Document AI | 1 | 77.6% | 2026-05-28 |
| OfficeQA Pro (Anthropic Harness) | Document AI | 1 | 66.2% | 2026-05-28 |
| SAGE | Education | 3 | 54.788% | 2026-05-28 |
| CorpFin v2 | Finance | 8 | 66.706% | 2026-05-28 |
| Finance Agent v2 | Finance | 2 | 53.918% | 2026-05-28 |
| Finance Agent v2 | Finance | 1 | 53.9% | 2026-05-28 |
| MortgageTax | Finance | 2 | 69.912% | 2026-05-28 |
| TaxEval v2 | Finance | 7 | 75.634% | 2026-05-28 |
| HealthBench Professional | Healthcare | 1 | 55.8% | 2026-05-28 |
| MedCode | Healthcare | 5 | 53.217% | 2026-05-28 |
| MedScribe | Healthcare | 6 | 85.755% | 2026-05-28 |
| GPQA Diamond | Intelligence | 4 | 92.424% | 2026-05-28 |
| Humanity's Last Exam | Intelligence | 1 | 57.9% | 2026-05-28 |
| MMLU Pro | Intelligence | 4 | 89.585% | 2026-05-28 |
| MMMU Pro | Intelligence | 9 | 86.59% | 2026-05-28 |
| Vals Index | Intelligence | 1 | 70.166% | 2026-05-28 |
| Vals Multimodal Index | Intelligence | 1 | 70.712% | 2026-05-28 |
| Harvey Legal Agent Benchmark | Legal | 1 | 9.6% | 2026-05-28 |
| LegalBench | Legal | 27 | 83.568% | 2026-05-28 |
| Graphwalks BFS 1M F1 | Long Context | 1 | 68.1% | 2026-05-28 |
| Graphwalks BFS 256k F1 | Long Context | 1 | 85.9% | 2026-05-28 |
| Graphwalks Parents 1M F1 | Long Context | 1 | 83.3% | 2026-05-28 |
| Graphwalks Parents 256k F1 | Long Context | 1 | 99.3% | 2026-05-28 |
| ProofBench | Math | 2 | 69% | 2026-05-28 |
| ArxivMath | Mathematics | 1 | 71.8% | 2026-05-28 |
| USAMO 2026 | Mathematics | 1 | 96.7% | 2026-05-28 |
| Global MMLU | Multilingual | 3 | 90.4% | 2026-05-28 |
| INCLUDE | Multilingual | 1 | 87.6% | 2026-05-28 |
| MILU | Multilingual | 1 | 90.3% | 2026-05-28 |
| Blueprint-Bench 2 | Multimodal | 7 | 0.606 +/- 0.010 | 2026-05-28 |
| ChartMuseum | Multimodal | 1 | 89.7% | 2026-05-28 |
| ChartQAPro | Multimodal | 1 | 72.3% | 2026-05-28 |
| CharXiv-R | Multimodal | 2 | 89.9% | 2026-05-28 |
| FigQA | Multimodal | 1 | 87.3% | 2026-05-28 |
| GPQA Diamond | Reasoning | 3 | 93.6% | 2026-05-28 |
| BioMysteryBench Human-Difficult | Science | 1 | 40% | 2026-05-28 |
| BioMysteryBench Human-Solvable | Science | 2 | 80.4% | 2026-05-28 |
| DeepSearchQA | Search | 2 | 93.1% | 2026-05-28 |
| ProgramBench (Anthropic Harness) | Software Engineering | 1 | 88% | 2026-05-28 |
| SWE-bench Multilingual | Software Engineering | 1 | 84.4% | 2026-05-28 |
| SWE-bench Multimodal | Software Engineering | 1 | 38.4% | 2026-05-28 |
| SWE-bench Pro | Software Engineering | 1 | 69.2% | 2026-05-28 |
| SWE-bench Verified | Software Engineering | 1 | 88.6% | 2026-05-28 |