GPT-5.5
GPT / OpenAI
198 scores
113 benchmarks
Cost (in / out): $5 / $30 per 1M tokens
Metadata
GPT Closed/API
Aliases: gpt-5.5, gpt-5.5-20260423, gpt-5.5-high, gpt-5.5-medium, gpt-5.5-xhigh, gpt-5-5-2026-04-22-thinking-high, gpt-5-5-2026-04-22-thinking-low, gpt-5-5-2026-04-22-thinking-medium, gpt-5-5-2026-04-22-thinking-xhigh, openai-gpt-5-5-2026-04-23-high, openai-gpt-5.5, openai-gpt-5.5-20260423, openai_gpt_5_5_2026_04_23_reasoning_effort_high, openai_gpt_5_5_2026_04_23_reasoning_effort_low, openai_gpt_5_5_2026_04_23_reasoning_effort_medium, openai_gpt_5_5_2026_04_23_reasoning_effort_none, openai_gpt_5_5_2026_04_23_reasoning_effort_xhigh, openai/gpt-5.5, openai/gpt-5.5-20260423
Official Sources
1 linked source

| Benchmark | Category | Rank | Score | Sampled |
|---|---|---|---|---|
| APEX-Agents | Agentic | 1 | 53.90 | 2026-05-06 |
| APEX-Agents-AA | Agentic | 1 | 37.7% | 2026-05-11 |
| ARC-AGI-1 | Agentic | 6 | 95 | 2026-05-05 |
| ARC-AGI-1 | Agentic | 10 | 94.50 | 2026-05-05 |
| ARC-AGI-1 | Agentic | 17 | 92.17 | 2026-05-05 |
| ARC-AGI-1 | Agentic | 36 | 76.17 | 2026-05-05 |
| ARC-AGI-1 | Agentic | 2 | 95% | 2026-04-23 |
| ARC-AGI-2 | Agentic | 2 | 85 | 2026-05-05 |
| ARC-AGI-2 | Agentic | 7 | 83.33 | 2026-05-05 |
| ARC-AGI-2 | Agentic | 13 | 70.42 | 2026-05-05 |
| ARC-AGI-2 | Agentic | 34 | 33.33 | 2026-05-05 |
| ARC-AGI-2 | Agentic | 1 | 85% | 2026-04-23 |
| ARC-AGI-3 | Agentic | 2 | 0.43 | 2026-05-05 |
| AutomationBench | Agentic | 2 | 12.9% | 2026-05-28 |
| AutomationBench | Agentic | 2 | 12.90 | 2026-05-21 |
| AutomationBench | Agentic | 5 | 11.30 | 2026-05-21 |
| AutomationBench | Agentic | 8 | 8.50 | 2026-05-21 |
| BrowseComp | Agentic | 2 | 84.4% | 2026-05-28 |
| BrowseComp | Agentic | 4 | 84.4% | 2026-04-23 |
| GDPval-AA | Agentic | 2 | 1769 Elo | 2026-05-28 |
| Gert Labs Rankings | Agentic | 1 | 0.77 | 2026-05-11 |
| HiL-Bench | Agentic | 1 | 29.1% | 2026-05-05 |
| ITBench-AA | Agentic | 2 | 45.8% | 2026-05-28 |
| LMArena Search Arena | Agentic | 2 | 1234.91 | 2026-05-06 |
| MCP Atlas | Agentic | 4 | 75.3% | 2026-05-28 |
| MCP Atlas | Agentic | 2 | 75.30 | 2026-05-06 |
| MCP Atlas | Agentic | 3 | 75.3% | 2026-04-23 |
| OSWorld-Verified | Agentic | 3 | 78.7% | 2026-05-28 |
| OSWorld-Verified | Agentic | 2 | 0.79 | 2026-05-06 |
| OSWorld-Verified | Agentic | 1 | 78.7% | 2026-04-23 |
| RuneBench | Agentic | 1 | 5.30 | 2026-05-05 |
| Tau2-Bench Telecom | Agentic | 31 | 93.9% | 2026-05-11 |
| Tau2-Bench Telecom | Agentic | 39 | 93% | 2026-05-11 |
| Tau2-Bench Telecom | Agentic | 49 | 91.8% | 2026-05-11 |
| Tau2-Bench Telecom | Agentic | 94 | 83.9% | 2026-05-11 |
| Tau2-Bench Telecom | Agentic | 139 | 69.3% | 2026-05-11 |
| Tau2-Bench Telecom | Agentic | 1 | 98% | 2026-04-23 |
| Terminal-Bench Hard | Agentic | 1 | 60.6% | 2026-05-11 |
| Terminal-Bench Hard | Agentic | 2 | 59.8% | 2026-05-11 |
| Terminal-Bench Hard | Agentic | 4 | 57.6% | 2026-05-11 |
| Terminal-Bench Hard | Agentic | 10 | 52.3% | 2026-05-11 |
| Terminal-Bench Hard | Agentic | 12 | 49.2% | 2026-05-11 |
| TERMS-Bench | Agentic | 7 | 60.6% SE+ | 2026-05-28 |
| Toolathlon | Agentic | 1 | 0.56 | 2026-05-06 |
| Toolathlon | Agentic | 1 | 55.6% | 2026-04-23 |
| Vending-Bench 2 | Agentic | 3 | 7523.84 | 2026-05-28 |
| OpenUGI | Alignment | 88 | 51.19 | 2026-05-06 |
| OpenUGI | Alignment | 93 | 50.98 | 2026-05-06 |
| OpenUGI | Alignment | 111 | 49.97 | 2026-05-06 |
| OpenUGI | Alignment | 126 | 49.19 | 2026-05-06 |
| OpenUGI | Alignment | 220 | 44.98 | 2026-05-06 |
| scBench | Biology | 1 | 57.95% | 2026-05-27 |
| scBench | Biology | 2 | 57.78% | 2026-05-27 |
| SpatialBench | Biology | 1 | 57.65% | 2026-05-27 |
| SpatialBench | Biology | 3 | 53.67% | 2026-05-27 |
| ALE-Bench | Coding | 1 | 1942.97 | 2026-05-06 |
| ALE-Bench | Coding | 4 | 1589.38 | 2026-05-06 |
| ALE-Bench | Coding | 21 | 1127.58 | 2026-05-06 |
| Arena AI Code | Coding | 10 | 1490 | 2026-05-06 |
| Arena AI Code | Coding | 18 | 1443 | 2026-05-06 |
| BLXBench | Coding | 11 | 65.90 | 2026-05-06 |
| DeepSWE | Coding | 1 | 70.05 | 2026-05-26 |
| Expert-SWE (Internal) | Coding | 1 | 73.1% | 2026-04-23 |
| KernelBench Hard | Coding | 1 | 100 | 2026-05-06 |
| LiveCodeBench | Coding | 18 | 85.296% | 2026-05-28 |
| LMArena WebDev Arena | Coding | 10 | 1490.28 | 2026-05-06 |
| LMArena WebDev Arena | Coding | 18 | 1441.00 | 2026-05-06 |
| SciCode | Coding | 4 | 56.1% | 2026-05-11 |
| SciCode | Coding | 5 | 55.9% | 2026-05-11 |
| SciCode | Coding | 8 | 53.5% | 2026-05-11 |
| SciCode | Coding | 13 | 51.6% | 2026-05-11 |
| SciCode | Coding | 25 | 47.3% | 2026-05-11 |
| SWE Atlas - Refactoring | Coding | 1 | 44.79 | 2026-05-06 |
| SWE-bench Verified | Coding | 2 | 82.6% | 2026-05-28 |
| Terminal-Bench 2.0 | Coding | 1 | 73.202% | 2026-05-28 |
| Terminal-Bench 2.0 | Coding | 1 | 82.7% | 2026-04-23 |
| Terminal-Bench 2.1 | Coding | 1 | 76.404% | 2026-05-28 |
| Terminal-Bench 2.1 | Coding | 1 | 78.2% | 2026-05-28 |
| Vibe Code Bench v1.1 | Coding | 3 | 69.847% | 2026-05-28 |
| Capture-the-Flags Challenge Tasks (Internal) | Cybersecurity | 1 | 88.1% | 2026-04-23 |
| CyberGym | Cybersecurity | 2 | 0.82 | 2026-05-06 |
| CyberGym | Cybersecurity | 1 | 81.8% | 2026-04-23 |
| ExploitBench v8-bench | Cybersecurity | 3 | 5.51 points | 2026-05-15 |
| ExploitBench v8-bench | Cybersecurity | 4 | 4.44 points | 2026-05-15 |
| ExploitBench v8-bench | Cybersecurity | 5 | 4.3 points | 2026-05-15 |
| ExploitBench v8-bench | Cybersecurity | 6 | 3.76 points | 2026-05-15 |
| DAXBench | Data | 16 | 86.7% | 2026-05-28 |
| Arena AI Document | Document AI | 6 | 1490 | 2026-05-06 |
| Arena AI Document | Document AI | 7 | 1487 | 2026-05-06 |
| OfficeQA Pro | Document AI | 1 | 54.1% | 2026-04-23 |
| SAGE | Education | 7 | 51.532% | 2026-05-28 |
| AA-Omniscience | Factuality | 3 | 20.07 | 2026-05-11 |
| CorpFin v2 | Finance | 2 | 68.415% | 2026-05-28 |
| Finance Agent v1.1 | Finance | 6 | 59.963% | 2026-05-04 |
| Finance Agent v1.1 | Finance | 3 | 60% | 2026-04-23 |
| Finance Agent v2 | Finance | 3 | 51.76% | 2026-05-28 |
| Finance Agent v2 | Finance | 2 | 51.8% | 2026-05-28 |
| Investment Banking Modeling Tasks (Internal) | Finance | 2 | 88.5% | 2026-04-23 |
| MortgageTax | Finance | 6 | 68.76% | 2026-05-28 |
| Rogo Big Finance Bench | Finance | 2 | 59% rubric / 44% final | 2026-05-28 |
| TaxBench | Finance | 3 | 24.43% mean pass^5 | 2026-05-27 |
| TaxEval v2 | Finance | 12 | 74.98% | 2026-05-28 |
| React Native Evals | Frontend Development | 5 | 84.652% overall | 2026-05-28 |
| InfiniteBM Heads-Up No-Limit Hold'em | Game | 2 | 1620.63 Elo / 19 games | 2026-05-28 |
| InfiniteBM Heads-Up No-Limit Hold'em | Game | 9 | 1292.49 Elo / 107 games | 2026-05-28 |
| InfiniteBM Liar's Dice | Game | 16 | 1235.22 Elo / 40 games | 2026-05-28 |
| InfiniteBM Liar's Dice | Game | 18 | 1220.47 Elo / 114 games | 2026-05-28 |
| BenchLM | General Knowledge | 3 | 91 | 2026-05-06 |
| GDPval | Generalization | 1 | 84.9% | 2026-04-23 |
| LMArena Text Arena | Generalization | 8 | 1472.79 | 2026-05-06 |
| LMArena Text Arena | Generalization | 14 | 1461.23 | 2026-05-06 |
| MedCode | Healthcare | 14 | 49.1% | 2026-05-28 |
| MedScribe | Healthcare | 2 | 86.868% | 2026-05-28 |
| PhysicianBench | Healthcare | 1 | 46.3 +/- 1.2 | 2026-05-27 |
| HUMAINE | Human Preference | 8 | 3.70 | 2026-05-06 |
| AIIQ Composite IQ | Intelligence | 1 | 136 | 2026-05-12 |
| Artificial Analysis Intelligence Index | Intelligence | 1 | 60.24 | 2026-05-11 |
| Artificial Analysis Intelligence Index | Intelligence | 2 | 58.87 | 2026-05-11 |
| Artificial Analysis Intelligence Index | Intelligence | 6 | 56.71 | 2026-05-11 |
| Artificial Analysis Intelligence Index | Intelligence | 19 | 50.78 | 2026-05-11 |
| Artificial Analysis Intelligence Index | Intelligence | 70 | 40.94 | 2026-05-11 |
| GPQA Diamond | Intelligence | 2 | 93.182% | 2026-05-28 |
| Humanity's Last Exam | Intelligence | 3 | 52.2% | 2026-05-28 |
| Humanity's Last Exam | Intelligence | 2 | 44.3% | 2026-05-11 |
| Humanity's Last Exam | Intelligence | 3 | 43% | 2026-05-11 |
| Humanity's Last Exam | Intelligence | 5 | 40.6% | 2026-05-11 |
| Humanity's Last Exam | Intelligence | 23 | 31% | 2026-05-11 |
| Humanity's Last Exam | Intelligence | 118 | 12.6% | 2026-05-11 |
| Humanity's Last Exam | Intelligence | 4 | 52.2% | 2026-04-23 |
| LiveBench | Intelligence | 1 | 81.28 | 2026-05-05 |
| LiveBench | Intelligence | 5 | 77.07 | 2026-05-05 |
| LiveBench | Intelligence | 35 | 68.96 | 2026-05-05 |
| MMLU Pro | Intelligence | 9 | 88.144% | 2026-05-28 |
| MMMU Pro | Intelligence | 2 | 88.266% | 2026-05-28 |
| Vals Index | Intelligence | 2 | 67.622% | 2026-05-28 |
| Vals Multimodal Index | Intelligence | 2 | 67.768% | 2026-05-28 |
| CaseLaw v2 | Legal | 7 | 66.238% | 2026-05-04 |
| Harvey Legal Agent Benchmark | Legal | 4 | 2.1% | 2026-05-28 |
| LegalBench | Legal | 4 | 86.515% | 2026-05-28 |
| Realm Warren | Legal | 2 | 0.35 | 2026-05-07 |
| Graphwalks BFS >128k | Long Context | 3 | 0.45 | 2026-05-06 |
| Graphwalks BFS 1M F1 | Long Context | 2 | 45.4% | 2026-05-28 |
| Graphwalks BFS 1M F1 | Long Context | 1 | 45.4% | 2026-04-23 |
| Graphwalks BFS 256k F1 | Long Context | 3 | 73.7% | 2026-05-28 |
| Graphwalks BFS 256k F1 | Long Context | 2 | 73.7% | 2026-04-23 |
| Graphwalks parents >128k | Long Context | 2 | 0.58 | 2026-05-06 |
| Graphwalks Parents 1M F1 | Long Context | 2 | 58.5% | 2026-05-28 |
| Graphwalks Parents 1M F1 | Long Context | 2 | 58.5% | 2026-04-23 |
| Graphwalks Parents 256k F1 | Long Context | 4 | 90.1% | 2026-05-28 |
| Graphwalks Parents 256k F1 | Long Context | 2 | 90.1% | 2026-04-23 |
| MRCR v2 (8-needle) | Long Context | 2 | 0.74 | 2026-05-06 |
| OpenAI MRCR v2 8-needle 128K-256K | Long Context | 1 | 87.5% | 2026-04-23 |
| OpenAI MRCR v2 8-needle 16K-32K | Long Context | 2 | 96.5% | 2026-04-23 |
| OpenAI MRCR v2 8-needle 256K-512K | Long Context | 1 | 81.5% | 2026-04-23 |
| OpenAI MRCR v2 8-needle 32K-64K | Long Context | 2 | 90% | 2026-04-23 |
| OpenAI MRCR v2 8-needle 4K-8K | Long Context | 1 | 98.1% | 2026-04-23 |
| OpenAI MRCR v2 8-needle 512K-1M | Long Context | 1 | 74% | 2026-04-23 |
| OpenAI MRCR v2 8-needle 64K-128K | Long Context | 2 | 83.1% | 2026-04-23 |
| OpenAI MRCR v2 8-needle 8K-16K | Long Context | 1 | 93% | 2026-04-23 |
| FrontierMath | Math | 2 | 35.4 | 2026-05-27 |
| ProofBench | Math | 6 | 50% | 2026-05-28 |
| ArxivMath | Mathematics | 2 | 71.5% | 2026-05-28 |
| FrontierMath 2025-02-28 Private | Mathematics | 2 | 51.7% | 2026-04-23 |
| FrontierMath Tier 4 2025-07-01 Private | Mathematics | 3 | 35.4% | 2026-04-23 |
| Blueprint-Bench 2 | Multimodal | 2 | 0.706 +/- 0.008 | 2026-05-28 |
| Design Arena | Multimodal | 9 | 1315 | 2026-05-06 |
| GDPval-MM | Multimodal | 1 | 0.85 | 2026-05-06 |
| LMArena Vision Arena | Multimodal | 7 | 1297.64 | 2026-05-06 |
| LMArena Vision Arena | Multimodal | 10 | 1279.71 | 2026-05-06 |
| MMMU-Pro | Multimodal | 1 | 83.2% | 2026-04-23 |
| ARC-AGI v2 | Reasoning | 1 | 0.85 | 2026-05-06 |
| CAIS Text Capabilities Index | Reasoning | 1 | 54.1 | 2026-05-27 |
| Context Arena | Reasoning | 1 | 79.77 | 2026-05-06 |
| Context Arena | Reasoning | 2 | 78.96 | 2026-05-06 |
| Context Arena | Reasoning | 3 | 78.59 | 2026-05-06 |
| Context Arena | Reasoning | 4 | 75.03 | 2026-05-06 |
| Context Arena | Reasoning | 39 | 34.90 | 2026-05-06 |
| GPQA Diamond | Reasoning | 2 | 93.5% | 2026-05-11 |
| GPQA Diamond | Reasoning | 3 | 93.2% | 2026-05-11 |
| GPQA Diamond | Reasoning | 4 | 92.6% | 2026-05-11 |
| GPQA Diamond | Reasoning | 10 | 91% | 2026-05-11 |
| GPQA Diamond | Reasoning | 137 | 76.8% | 2026-05-11 |
| GPQA Diamond | Reasoning | 4 | 93.6% | 2026-04-23 |
| CAIS Risk Index | Safety | 8 | 42.4 | 2026-05-27 |
| BixBench | Science | 1 | 80.5% | 2026-04-23 |
| CritPt | Science | 3 | 27.1% | 2026-05-11 |
| CritPt | Science | 5 | 25.4% | 2026-05-11 |
| CritPt | Science | 7 | 18.6% | 2026-05-11 |
| CritPt | Science | 21 | 8% | 2026-05-11 |
| CritPt | Science | 73 | 1.4% | 2026-05-11 |
| GeneBench | Science | 2 | 0.25 | 2026-05-06 |
| GeneBench | Science | 3 | 25% | 2026-04-23 |
| SWE-bench Pro | Software Engineering | 3 | 58.6% | 2026-05-28 |
| SWE-bench Pro | Software Engineering | 2 | 58.6% | 2026-04-23 |
| Structured Output Benchmark | Structured Output | 7 | 86 | 2026-05-06 |
| LiveSQLBench | Text to SQL | 3 | 37.36 | 2026-05-06 |
| LiveSQLBench | Text to SQL | 4 | 37.24 | 2026-05-06 |
| CAIS Vision Capabilities Index | Vision | 5 | 60.5 | 2026-05-27 |