Claude Opus 4.6
Claude / Anthropic
203 scores
145 benchmarks
$5 / $25 per 1M tokens (cost in/out)
Metadata
Claude Closed/API
Aliases: anthropic-claude-4.6-opus-20260205, anthropic-claude-opus-4.6, anthropic/claude-4.6-opus-20260205, anthropic/claude-opus-4.6, claude-4.6-opus-20260205, claude-opus-4.6, Opus-4.6 Max, Opus 4.6 Max, Claude Opus 4.6 Max, Claude Opus 4.6 (Max), anthropic/claude-opus-4.6-max
| Benchmark | Category | Rank | Score | Sampled |
|---|---|---|---|---|
| ALFWorld | Agentic | 2 | 1.0 | 2026-05-27 |
| APEX-Agents | Agentic | 4 | 48.40 | 2026-05-06 |
| APEX-Agents | Agentic | 8 | 45.60 | 2026-05-06 |
| APEX-Agents-AA | Agentic | 3 | 33% | 2026-05-11 |
| ARC-AGI-1 | Agentic | 11 | 94 | 2026-05-05 |
| ARC-AGI-1 | Agentic | 14 | 93 | 2026-05-05 |
| ARC-AGI-1 | Agentic | 18 | 92 | 2026-05-05 |
| ARC-AGI-1 | Agentic | 28 | 86 | 2026-05-05 |
| ARC-AGI-2 | Agentic | 14 | 69.17 | 2026-05-05 |
| ARC-AGI-2 | Agentic | 15 | 68.75 | 2026-05-05 |
| ARC-AGI-2 | Agentic | 19 | 66.25 | 2026-05-05 |
| ARC-AGI-2 | Agentic | 21 | 64.58 | 2026-05-05 |
| AutoBench | Agentic | 2 | 3.24 | 2026-05-06 |
| AutoLab | Agentic | 1 | 0.85 | 2026-05-06 |
| BrowseComp | Agentic | 4 | 83.7% | 2026-04-16 |
| CAR-bench | Agentic | 1 | 0.58 | 2026-05-06 |
| Claw-Eval-Live | Agentic | 1 | 66.7 | 2026-05-27 |
| CoWorkBench | Agentic | 1 | 68.2% | 2026-05-28 |
| EnterpriseOps-Gym | Agentic | 1 | 44.6% | 2026-05-05 |
| GDPval-AA | Agentic | 2 | 1606 | 2026-05-06 |
| Gert Labs Rankings | Agentic | 6 | 0.63 | 2026-05-11 |
| HiL-Bench | Agentic | 3 | 24.33% | 2026-05-05 |
| LLM-WikiRace | Agentic | 4 | 56.70 | 2026-05-06 |
| MCP Atlas | Agentic | 2 | 75.8% | 2026-05-28 |
| MCP Atlas | Agentic | 2 | 76.80 | 2026-05-06 |
| MCP Atlas | Agentic | 2 | 75.8% | 2026-04-16 |
| MCPMark | Agentic | 4 | 56.7% | 2026-05-28 |
| MultiChallenge | Agentic | 12 | 56.02 | 2026-05-06 |
| MultiChallenge | Agentic | 28 | 37.15 | 2026-05-06 |
| OSWorld-Verified | Agentic | 4 | 72.7% | 2026-04-16 |
| PinchBench | Agentic | 1 | 0.93 | 2026-05-06 |
| QwenClawBench | Agentic | 1 | 65.5% | 2026-05-28 |
| QwenWorldBench | Agentic | 2 | 56.1% | 2026-05-28 |
| RealDataAgentBench | Agentic | 5 | 0.85 | 2026-04-28 |
| RuneBench | Agentic | 6 | 4.40 | 2026-05-05 |
| Tau2-Bench Telecom | Agentic | 45 | 92.1% | 2026-05-11 |
| Tau2-Bench Telecom | Agentic | 86 | 84.8% | 2026-05-11 |
| Terminal-Bench Hard | Agentic | 14 | 48.5% | 2026-05-11 |
| Terminal-Bench Hard | Agentic | 17 | 46.2% | 2026-05-11 |
| TERMS-Bench | Agentic | 1 | 69.4% SE+ | 2026-05-28 |
| Toolathlon | Agentic | 3 | 56.8% | 2026-05-28 |
| Vending-Bench 2 | Agentic | 2 | 8017.59 | 2026-05-28 |
| WildClawBench | Agentic | 1 | 51.60 | 2026-05-06 |
| OpenUGI | Alignment | 13 | 60.41 | 2026-05-06 |
| OpenUGI | Alignment | 48 | 54.23 | 2026-05-06 |
| OpenUGI | Alignment | 69 | 52.20 | 2026-05-06 |
| OpenUGI | Alignment | 79 | 51.57 | 2026-05-06 |
| scBench | Biology | 7 | 52.65% | 2026-05-27 |
| SpatialBench | Biology | 4 | 52.83% | 2026-05-27 |
| Arena AI Code | Coding | 3 | 1548 | 2026-05-06 |
| Arena AI Code | Coding | 4 | 1543 | 2026-05-06 |
| BLXBench | Coding | 10 | 71.10 | 2026-05-06 |
| Claw-Eval | Coding | 1 | 70.4% | 2026-05-28 |
| DeepSWE | Coding | 6 | 27.06 | 2026-05-26 |
| FrontierSWE | Coding | 3 | 4.9 avg rank | 2026-05-28 |
| Kernel Bench L3 | Coding | 1 | 2.63/98% | 2026-05-28 |
| LiveCodeBench | Coding | 4 | 88.8% | 2026-05-28 |
| LiveCodeBench | Coding | 21 | 84.676% | 2026-05-28 |
| LMArena WebDev Arena | Coding | 3 | 1548.84 | 2026-05-06 |
| LMArena WebDev Arena | Coding | 4 | 1544.36 | 2026-05-06 |
| NL2Repo | Coding | 1 | 47.6% | 2026-05-28 |
| QwenSVG | Coding | 3 | 1541 | 2026-05-28 |
| QwenWebDev | Coding | 1 | 1617 | 2026-05-28 |
| SciCode | Coding | 3 | 51.9% | 2026-05-28 |
| SciCode | Coding | 12 | 51.9% | 2026-05-11 |
| SciCode | Coding | 38 | 45.7% | 2026-05-11 |
| SWE-bench Verified | Coding | 6 | 78.2% | 2026-05-28 |
| Terminal-Bench 2.0 | Coding | 11 | 58.427% | 2026-05-28 |
| Terminal-Bench 2.0 | Coding | 4 | 65.4% | 2026-05-28 |
| Terminal-Bench 2.0 | Coding | 5 | 65.4% | 2026-04-16 |
| Vibe Code Bench v1.1 | Coding | 6 | 57.573% | 2026-05-28 |
| Vibe Code Bench v1.1 | Coding | 8 | 53.498% | 2026-05-28 |
| RP-Bench | Creative | 1 | 1705.70 | 2026-05-06 |
| CyberGym | Cybersecurity | 3 | 0.74 | 2026-05-06 |
| CyberGym | Cybersecurity | 2 | 73.8% | 2026-04-16 |
| OrgForge-IT | Cybersecurity | 1 | 1.000 | 2026-05-28 |
| SecCodeBench | Cybersecurity | 2 | 64.9% | 2026-05-28 |
| Arena AI Document | Document AI | 1 | 1526 | 2026-05-06 |
| Arena AI Document | Document AI | 2 | 1520 | 2026-05-06 |
| GSMA Open Telco Leaderboard | Domain | 5 | 73.30 | 2026-05-06 |
| SAGE | Education | 6 | 51.575% | 2026-05-28 |
| TutorBench | Education | 2 | 53.68 | 2026-05-06 |
| TutorBench | Education | 2 | 53.55 | 2026-05-06 |
| Vectara HHEM Hallucination Leaderboard | Factuality | 78 | 87.80 | 2026-05-06 |
| CorpFin v2 | Finance | 5 | 67.016% | 2026-05-28 |
| Finance Agent v1.1 | Finance | 5 | 60.046% | 2026-05-04 |
| Finance Agent v1.1 | Finance | 3 | 60.1% | 2026-04-16 |
| MortgageTax | Finance | 10 | 68.522% | 2026-05-28 |
| PRBench Finance | Finance | 1 | 53.28 | 2026-05-06 |
| TaxBench | Finance | 4 | 21.37% mean pass^5 | 2026-05-27 |
| TaxEval v2 | Finance | 3 | 75.961% | 2026-05-28 |
| React Native Evals | Frontend Development | 6 | 84.1026% overall | 2026-05-28 |
| MageBench Season 1 | Game | 1 | 1747 rating / 16 games | 2026-05-28 |
| ALL Bench LLM | General Knowledge | 1 | 64.87 | 2026-05-06 |
| BenchLM | General Knowledge | 10 | 87 | 2026-05-06 |
| MAXIFE | General Knowledge | 6 | 81.3% | 2026-05-28 |
| MMLU-ProX | General Knowledge | 2 | 86.1% | 2026-05-28 |
| MMLU-Redux | General Knowledge | 2 | 95.2% | 2026-05-28 |
| NOVA-63 | General Knowledge | 1 | 59.1% | 2026-05-28 |
| LMArena Text Arena | Generalization | 1 | 1500.24 | 2026-05-06 |
| LMArena Text Arena | Generalization | 2 | 1496.47 | 2026-05-06 |
| MedCode | Healthcare | 13 | 49.129% | 2026-05-28 |
| MedCode | Healthcare | 15 | 48.244% | 2026-05-28 |
| MedQA | Healthcare | 12 | 95.408% | 2026-04-16 |
| MedScribe | Healthcare | 3 | 86.738% | 2026-05-28 |
| MedScribe | Healthcare | 4 | 86.13% | 2026-05-28 |
| PhysicianBench | Healthcare | 2 | 31.7 +/- 2.3 | 2026-05-27 |
| PlaceboBench | Healthcare | 7 | 36.2319 | 2026-05-27 |
| HUMAINE | Human Preference | 28 | 3.48 | 2026-05-06 |
| IFBench | Instruction Following | 6 | 62.5% | 2026-05-28 |
| IFEval | Instruction Following | 5 | 91.9% | 2026-05-28 |
| AIIQ Composite IQ | Intelligence | 5 | 131 | 2026-05-12 |
| Artificial Analysis Intelligence Index | Intelligence | 11 | 52.95 | 2026-05-11 |
| Artificial Analysis Intelligence Index | Intelligence | 38 | 46.46 | 2026-05-11 |
| GPQA Diamond | Intelligence | 11 | 89.646% | 2026-05-28 |
| HLE w/ tools | Intelligence | 3 | 53% | 2026-05-28 |
| Humanity's Last Exam | Intelligence | 2 | 40% | 2026-05-28 |
| Humanity's Last Exam | Intelligence | 10 | 36.7% | 2026-05-11 |
| Humanity's Last Exam | Intelligence | 77 | 18.6% | 2026-05-11 |
| Humanity's Last Exam | Intelligence | 4 | 53.3% | 2026-04-16 |
| MathVision | Intelligence | 13 | 84.60 | 2026-05-06 |
| MathVision | Intelligence | 26 | 71.20 | 2026-05-06 |
| MMLU Pro | Intelligence | 7 | 89.107% | 2026-05-28 |
| MMLU-Pro | Intelligence | 1 | 89.7% | 2026-05-28 |
| MMMU Pro | Intelligence | 14 | 83.873% | 2026-05-28 |
| OCRBench v2 | Intelligence | 16 | 48.40 | 2026-05-06 |
| OCRBench v2 | Intelligence | 9 | 59.80 | 2026-05-06 |
| SuperGPQA | Intelligence | 2 | 72.5% | 2026-05-28 |
| CaseLaw v2 | Legal | 20 | 62.058% | 2026-05-04 |
| Harvey Legal Agent Benchmark | Legal | 3 | 4.2% | 2026-05-28 |
| LegalBench | Legal | 8 | 85.301% | 2026-05-28 |
| Professional Reasoning Bench - Legal | Legal | 1 | 52.27 | 2026-05-06 |
| Graphwalks BFS >128k | Long Context | 2 | 0.61 | 2026-05-06 |
| Graphwalks BFS 1M F1 | Long Context | 4 | 16.3% | 2026-05-28 |
| Graphwalks BFS 1M F1 | Long Context | 2 | 41.2% | 2026-04-23 |
| Graphwalks BFS 256k F1 | Long Context | 4 | 61.1% | 2026-05-28 |
| Graphwalks parents >128k | Long Context | 1 | 0.95 | 2026-05-06 |
| Graphwalks Parents 1M F1 | Long Context | 4 | 48.6% | 2026-05-28 |
| Graphwalks Parents 1M F1 | Long Context | 1 | 72% | 2026-04-23 |
| Graphwalks Parents 256k F1 | Long Context | 2 | 95.4% | 2026-05-28 |
| MRCR v2 (8-needle) | Long Context | 1 | 0.93 | 2026-05-06 |
| MRCR-v2 128k | Long Context | 3 | 84% | 2026-05-28 |
| AIME | Math | 8 | 95.625% | 2026-04-16 |
| ProofBench | Math | 5 | 50% | 2026-05-28 |
| HMMT February 2026 | Mathematics | 2 | 96.2% | 2026-05-28 |
| IMO-AnswerBench | Mathematics | 6 | 75.3% | 2026-05-28 |
| MathArena Apex | Mathematics | 3 | 34.5% | 2026-05-28 |
| Medical Chronology LLM Benchmark | Medical | 1 | 0.92 | 2026-05-06 |
| INCLUDE | Multilingual | 1 | 87.4% | 2026-05-28 |
| MMMLU | Multilingual | 1 | 90.6% | 2026-05-28 |
| MMMLU | Multilingual | 3 | 91.1% | 2026-04-16 |
| ALL Bench Multimodal | Multimodal | 2 | 63.16 | 2026-05-06 |
| ALL Bench Multimodal | Multimodal | 7 | 8.51 | 2026-05-06 |
| ALL Bench Multimodal | Multimodal | 1 | 52.61 | 2026-05-06 |
| CharXiv-R | Multimodal | 16 | 0.77 | 2026-05-06 |
| CharXiv-R | Multimodal | 3 | 84.7% | 2026-04-16 |
| Design Arena | Multimodal | 2 | 1345 | 2026-05-06 |
| Design Arena | Multimodal | 3 | 1343 | 2026-05-06 |
| FigQA | Multimodal | 2 | 0.78 | 2026-05-06 |
| IDP Leaderboard | Multimodal | 9 | 80.37 | 2026-05-06 |
| LMArena Vision Arena | Multimodal | 1 | 1317.74 | 2026-05-06 |
| LMArena Vision Arena | Multimodal | 4 | 1311.60 | 2026-05-06 |
| Visual-Language Understanding | Multimodal | 20 | 46.07 | 2026-05-06 |
| Visual-Language Understanding | Multimodal | 21 | 45.48 | 2026-05-06 |
| VTB | Multimodal | 2 | 27.52 | 2026-05-06 |
| ARC-AGI v2 | Reasoning | 4 | 0.69 | 2026-05-06 |
| CAIS Text Capabilities Index | Reasoning | 6 | 44.0 | 2026-05-27 |
| Context Arena | Reasoning | 5 | 73.06 | 2026-05-06 |
| Context Arena | Reasoning | 6 | 72.43 | 2026-05-06 |
| Context Arena | Reasoning | 7 | 72.26 | 2026-05-06 |
| Context Arena | Reasoning | 26 | 48.19 | 2026-05-06 |
| EnigmaEval | Reasoning | 10 | 7.60 | 2026-05-06 |
| EnigmaEval | Reasoning | 12 | 6.84 | 2026-05-06 |
| FINAL Bench Metacognitive | Reasoning | 5 | 76.17 | 2026-05-06 |
| Global PIQA | Reasoning | 2 | 91.2% | 2026-05-28 |
| GPQA Diamond | Reasoning | 2 | 91.3% | 2026-05-28 |
| GPQA Diamond | Reasoning | 17 | 89.6% | 2026-05-11 |
| GPQA Diamond | Reasoning | 66 | 84% | 2026-05-11 |
| GPQA Diamond | Reasoning | 5 | 91.3% | 2026-04-16 |
| Humanity's Last Exam (Text Only) | Reasoning | 4 | 36.24 | 2026-05-06 |
| Humanity's Last Exam (Text Only) | Reasoning | 12 | 19.37 | 2026-05-06 |
| MultiNRC | Reasoning | 3 | 57.06 | 2026-05-06 |
| MultiNRC | Reasoning | 7 | 48.34 | 2026-05-06 |
| CAIS Risk Index | Safety | 7 | 40.7 | 2026-05-27 |
| BioMysteryBench Human-Difficult | Science | 3 | 23.5% | 2026-04-29 |
| BioMysteryBench Human-Solvable | Science | 3 | 77.4% | 2026-04-29 |
| CritPt | Science | 2 | 12.6% | 2026-05-28 |
| CritPt | Science | 11 | 12.6% | 2026-05-11 |
| CritPt | Science | 53 | 2.8% | 2026-05-11 |
| DeepSearchQA | Search | 4 | 88.7% | 2026-05-28 |
| DeepSearchQA | Search | 1 | 0.91 | 2026-05-06 |
| ProgramBench | Software Engineering | 2 | 0% | 2026-05-05 |
| SWE-bench Multilingual | Software Engineering | 2 | 77.5% | 2026-05-28 |
| SWE-bench Pro | Software Engineering | 5 | 57.3% | 2026-05-28 |
| SWE-bench Pro | Software Engineering | 5 | 53.4% | 2026-04-16 |
| SWE-bench Verified | Software Engineering | 1 | 80.8% | 2026-05-28 |
| SWE-bench Verified | Software Engineering | 3 | 80.8% | 2026-04-16 |
| SpreadsheetBench | Spreadsheets | 1 | 89.3% | 2026-05-28 |
| Structured Output Benchmark | Structured Output | 12 | 85.30 | 2026-05-06 |
| LiveSQLBench | Text to SQL | 2 | 39.43 | 2026-05-06 |
| BFCL-V4 | Tool Use | 1 | 76.7% | 2026-05-28 |
| WMT24++ | Translation | 3 | 82.7% | 2026-05-28 |
| CAIS Vision Capabilities Index | Vision | 18 | 48.0 | 2026-05-27 |