GPT-5.5

GPT / OpenAI

198 scores
113 benchmarks
Cost (input/output): $5 / $30 per 1M tokens
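The listed rates can be turned into a rough per-request cost estimate. The following is a minimal Python sketch. It assumes that billing is linear in token count at $5 per 1M input tokens and $30 per 1M output tokens, and it ignores any cached-input, batch, or reasoning-token pricing rules, none of which this page lists. The function name `estimate_cost_usd` is hypothetical.

```python
# Listed GPT-5.5 rates, in US dollars per 1M tokens (assumed linear billing).
INPUT_USD_PER_M = 5
OUTPUT_USD_PER_M = 30


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Hypothetical helper: rough request cost at the listed per-1M-token rates."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    # Sum exact integer (tokens x rate) products first, then divide once,
    # so the result goes through a single floating-point rounding.
    total = input_tokens * INPUT_USD_PER_M + output_tokens * OUTPUT_USD_PER_M
    return total / 1_000_000


print(estimate_cost_usd(10_000, 2_000))  # 0.11
```

With 10,000 input tokens and 2,000 output tokens, the estimate is $0.05 + $0.06 = $0.11. Output tokens cost six times as much as input tokens, so output length usually dominates the cost of long generations.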

Metadata

Family: GPT
Access: Closed/API

Aliases: gpt-5.5, gpt-5.5-20260423, gpt-5.5-high, gpt-5.5-medium, gpt-5.5-xhigh, gpt-5-5-2026-04-22-thinking-high, gpt-5-5-2026-04-22-thinking-low, gpt-5-5-2026-04-22-thinking-medium, gpt-5-5-2026-04-22-thinking-xhigh, openai-gpt-5-5-2026-04-23-high, openai-gpt-5.5, openai-gpt-5.5-20260423, openai_gpt_5_5_2026_04_23_reasoning_effort_high, openai_gpt_5_5_2026_04_23_reasoning_effort_low, openai_gpt_5_5_2026_04_23_reasoning_effort_medium, openai_gpt_5_5_2026_04_23_reasoning_effort_none, openai_gpt_5_5_2026_04_23_reasoning_effort_xhigh, openai/gpt-5.5, openai/gpt-5.5-20260423

Official Sources

1 linked source

Benchmark Results

Benchmark | Category | Rank | Score | Date sampled
APEX-Agents | Agentic | 1 | 53.90 | 2026-05-06
APEX-Agents-AA | Agentic | 1 | 37.7% | 2026-05-11
ARC-AGI-1 | Agentic | 6 | 95 | 2026-05-05
ARC-AGI-1 | Agentic | 10 | 94.50 | 2026-05-05
ARC-AGI-1 | Agentic | 17 | 92.17 | 2026-05-05
ARC-AGI-1 | Agentic | 36 | 76.17 | 2026-05-05
ARC-AGI-1 | Agentic | 2 | 95% | 2026-04-23
ARC-AGI-2 | Agentic | 2 | 85 | 2026-05-05
ARC-AGI-2 | Agentic | 7 | 83.33 | 2026-05-05
ARC-AGI-2 | Agentic | 13 | 70.42 | 2026-05-05
ARC-AGI-2 | Agentic | 34 | 33.33 | 2026-05-05
ARC-AGI-2 | Agentic | 1 | 85% | 2026-04-23
ARC-AGI-3 | Agentic | 2 | 0.43 | 2026-05-05
AutomationBench | Agentic | 2 | 12.9% | 2026-05-28
AutomationBench | Agentic | 2 | 12.90 | 2026-05-21
AutomationBench | Agentic | 5 | 11.30 | 2026-05-21
AutomationBench | Agentic | 8 | 8.50 | 2026-05-21
BrowseComp | Agentic | 2 | 84.4% | 2026-05-28
BrowseComp | Agentic | 4 | 84.4% | 2026-04-23
GDPval-AA | Agentic | 2 | 1769 Elo | 2026-05-28
Gert Labs Rankings | Agentic | 1 | 0.77 | 2026-05-11
HiL-Bench | Agentic | 1 | 29.1% | 2026-05-05
ITBench-AA | Agentic | 2 | 45.8% | 2026-05-28
LMArena Search Arena | Agentic | 2 | 1234.91 | 2026-05-06
MCP Atlas | Agentic | 4 | 75.3% | 2026-05-28
MCP Atlas | Agentic | 2 | 75.30 | 2026-05-06
MCP Atlas | Agentic | 3 | 75.3% | 2026-04-23
OSWorld-Verified | Agentic | 3 | 78.7% | 2026-05-28
OSWorld-Verified | Agentic | 2 | 0.79 | 2026-05-06
OSWorld-Verified | Agentic | 1 | 78.7% | 2026-04-23
RuneBench | Agentic | 1 | 5.30 | 2026-05-05
Tau2-Bench Telecom | Agentic | 31 | 93.9% | 2026-05-11
Tau2-Bench Telecom | Agentic | 39 | 93% | 2026-05-11
Tau2-Bench Telecom | Agentic | 49 | 91.8% | 2026-05-11
Tau2-Bench Telecom | Agentic | 94 | 83.9% | 2026-05-11
Tau2-Bench Telecom | Agentic | 139 | 69.3% | 2026-05-11
Tau2-Bench Telecom | Agentic | 1 | 98% | 2026-04-23
Terminal-Bench Hard | Agentic | 1 | 60.6% | 2026-05-11
Terminal-Bench Hard | Agentic | 2 | 59.8% | 2026-05-11
Terminal-Bench Hard | Agentic | 4 | 57.6% | 2026-05-11
Terminal-Bench Hard | Agentic | 10 | 52.3% | 2026-05-11
Terminal-Bench Hard | Agentic | 12 | 49.2% | 2026-05-11
TERMS-Bench | Agentic | 7 | 60.6% SE+ | 2026-05-28
Toolathlon | Agentic | 1 | 0.56 | 2026-05-06
Toolathlon | Agentic | 1 | 55.6% | 2026-04-23
Vending-Bench 2 | Agentic | 3 | 7523.84 | 2026-05-28
OpenUGI | Alignment | 88 | 51.19 | 2026-05-06
OpenUGI | Alignment | 93 | 50.98 | 2026-05-06
OpenUGI | Alignment | 111 | 49.97 | 2026-05-06
OpenUGI | Alignment | 126 | 49.19 | 2026-05-06
OpenUGI | Alignment | 220 | 44.98 | 2026-05-06
scBench | Biology | 1 | 57.95% | 2026-05-27
scBench | Biology | 2 | 57.78% | 2026-05-27
SpatialBench | Biology | 1 | 57.65% | 2026-05-27
SpatialBench | Biology | 3 | 53.67% | 2026-05-27
ALE-Bench | Coding | 1 | 1942.97 | 2026-05-06
ALE-Bench | Coding | 4 | 1589.38 | 2026-05-06
ALE-Bench | Coding | 21 | 1127.58 | 2026-05-06
Arena AI Code | Coding | 10 | 1490 | 2026-05-06
Arena AI Code | Coding | 18 | 1443 | 2026-05-06
BLXBench | Coding | 11 | 65.90 | 2026-05-06
DeepSWE | Coding | 1 | 70.05 | 2026-05-26
Expert-SWE (Internal) | Coding | 1 | 73.1% | 2026-04-23
KernelBench Hard | Coding | 1 | 100 | 2026-05-06
LiveCodeBench | Coding | 18 | 85.296% | 2026-05-28
LMArena WebDev Arena | Coding | 10 | 1490.28 | 2026-05-06
LMArena WebDev Arena | Coding | 18 | 1441.00 | 2026-05-06
SciCode | Coding | 4 | 56.1% | 2026-05-11
SciCode | Coding | 5 | 55.9% | 2026-05-11
SciCode | Coding | 8 | 53.5% | 2026-05-11
SciCode | Coding | 13 | 51.6% | 2026-05-11
SciCode | Coding | 25 | 47.3% | 2026-05-11
SWE Atlas - Refactoring | Coding | 1 | 44.79 | 2026-05-06
SWE-bench Verified | Coding | 2 | 82.6% | 2026-05-28
Terminal-Bench 2.0 | Coding | 1 | 73.202% | 2026-05-28
Terminal-Bench 2.0 | Coding | 1 | 82.7% | 2026-04-23
Terminal-Bench 2.1 | Coding | 1 | 76.404% | 2026-05-28
Terminal-Bench 2.1 | Coding | 1 | 78.2% | 2026-05-28
Vibe Code Bench v1.1 | Coding | 3 | 69.847% | 2026-05-28
Capture-the-Flags Challenge Tasks (Internal) | Cybersecurity | 1 | 88.1% | 2026-04-23
CyberGym | Cybersecurity | 2 | 0.82 | 2026-05-06
CyberGym | Cybersecurity | 1 | 81.8% | 2026-04-23
ExploitBench v8-bench | Cybersecurity | 3 | 5.51 points | 2026-05-15
ExploitBench v8-bench | Cybersecurity | 4 | 4.44 points | 2026-05-15
ExploitBench v8-bench | Cybersecurity | 5 | 4.3 points | 2026-05-15
ExploitBench v8-bench | Cybersecurity | 6 | 3.76 points | 2026-05-15
DAXBench | Data | 16 | 86.7% | 2026-05-28
Arena AI Document | Document AI | 6 | 1490 | 2026-05-06
Arena AI Document | Document AI | 7 | 1487 | 2026-05-06
OfficeQA Pro | Document AI | 1 | 54.1% | 2026-04-23
SAGE | Education | 7 | 51.532% | 2026-05-28
AA-Omniscience | Factuality | 3 | 20.07 | 2026-05-11
CorpFin v2 | Finance | 2 | 68.415% | 2026-05-28
Finance Agent v1.1 | Finance | 6 | 59.963% | 2026-05-04
Finance Agent v1.1 | Finance | 3 | 60% | 2026-04-23
Finance Agent v2 | Finance | 3 | 51.76% | 2026-05-28
Finance Agent v2 | Finance | 2 | 51.8% | 2026-05-28
Investment Banking Modeling Tasks (Internal) | Finance | 2 | 88.5% | 2026-04-23
MortgageTax | Finance | 6 | 68.76% | 2026-05-28
Rogo Big Finance Bench | Finance | 2 | 59% rubric / 44% final | 2026-05-28
TaxBench | Finance | 3 | 24.43% mean pass^5 | 2026-05-27
TaxEval v2 | Finance | 12 | 74.98% | 2026-05-28
React Native Evals | Frontend Development | 5 | 84.652% overall | 2026-05-28
InfiniteBM Heads-Up No-Limit Hold'em | Game | 2 | 1620.63 Elo / 19 games | 2026-05-28
InfiniteBM Heads-Up No-Limit Hold'em | Game | 9 | 1292.49 Elo / 107 games | 2026-05-28
InfiniteBM Liar's Dice | Game | 16 | 1235.22 Elo / 40 games | 2026-05-28
InfiniteBM Liar's Dice | Game | 18 | 1220.47 Elo / 114 games | 2026-05-28
BenchLM | General Knowledge | 3 | 91 | 2026-05-06
GDPval | Generalization | 1 | 84.9% | 2026-04-23
LMArena Text Arena | Generalization | 8 | 1472.79 | 2026-05-06
LMArena Text Arena | Generalization | 14 | 1461.23 | 2026-05-06
MedCode | Healthcare | 14 | 49.1% | 2026-05-28
MedScribe | Healthcare | 2 | 86.868% | 2026-05-28
PhysicianBench | Healthcare | 1 | 46.3 +/- 1.2 | 2026-05-27
HUMAINE | Human Preference | 8 | 3.70 | 2026-05-06
AIIQ Composite IQ | Intelligence | 1 | 136 | 2026-05-12
Artificial Analysis Intelligence Index | Intelligence | 1 | 60.24 | 2026-05-11
Artificial Analysis Intelligence Index | Intelligence | 2 | 58.87 | 2026-05-11
Artificial Analysis Intelligence Index | Intelligence | 6 | 56.71 | 2026-05-11
Artificial Analysis Intelligence Index | Intelligence | 19 | 50.78 | 2026-05-11
Artificial Analysis Intelligence Index | Intelligence | 70 | 40.94 | 2026-05-11
GPQA Diamond | Intelligence | 2 | 93.182% | 2026-05-28
Humanity's Last Exam | Intelligence | 3 | 52.2% | 2026-05-28
Humanity's Last Exam | Intelligence | 2 | 44.3% | 2026-05-11
Humanity's Last Exam | Intelligence | 3 | 43% | 2026-05-11
Humanity's Last Exam | Intelligence | 5 | 40.6% | 2026-05-11
Humanity's Last Exam | Intelligence | 23 | 31% | 2026-05-11
Humanity's Last Exam | Intelligence | 118 | 12.6% | 2026-05-11
Humanity's Last Exam | Intelligence | 4 | 52.2% | 2026-04-23
LiveBench | Intelligence | 1 | 81.28 | 2026-05-05
LiveBench | Intelligence | 5 | 77.07 | 2026-05-05
LiveBench | Intelligence | 35 | 68.96 | 2026-05-05
MMLU Pro | Intelligence | 9 | 88.144% | 2026-05-28
MMMU Pro | Intelligence | 2 | 88.266% | 2026-05-28
Vals Index | Intelligence | 2 | 67.622% | 2026-05-28
Vals Multimodal Index | Intelligence | 2 | 67.768% | 2026-05-28
CaseLaw v2 | Legal | 7 | 66.238% | 2026-05-04
Harvey Legal Agent Benchmark | Legal | 4 | 2.1% | 2026-05-28
LegalBench | Legal | 4 | 86.515% | 2026-05-28
Realm Warren | Legal | 2 | 0.35 | 2026-05-07
Graphwalks BFS >128k | Long Context | 3 | 0.45 | 2026-05-06
Graphwalks BFS 1M F1 | Long Context | 2 | 45.4% | 2026-05-28
Graphwalks BFS 1M F1 | Long Context | 1 | 45.4% | 2026-04-23
Graphwalks BFS 256k F1 | Long Context | 3 | 73.7% | 2026-05-28
Graphwalks BFS 256k F1 | Long Context | 2 | 73.7% | 2026-04-23
Graphwalks parents >128k | Long Context | 2 | 0.58 | 2026-05-06
Graphwalks Parents 1M F1 | Long Context | 2 | 58.5% | 2026-05-28
Graphwalks Parents 1M F1 | Long Context | 2 | 58.5% | 2026-04-23
Graphwalks Parents 256k F1 | Long Context | 4 | 90.1% | 2026-05-28
Graphwalks Parents 256k F1 | Long Context | 2 | 90.1% | 2026-04-23
MRCR v2 (8-needle) | Long Context | 2 | 0.74 | 2026-05-06
OpenAI MRCR v2 8-needle 128K-256K | Long Context | 1 | 87.5% | 2026-04-23
OpenAI MRCR v2 8-needle 16K-32K | Long Context | 2 | 96.5% | 2026-04-23
OpenAI MRCR v2 8-needle 256K-512K | Long Context | 1 | 81.5% | 2026-04-23
OpenAI MRCR v2 8-needle 32K-64K | Long Context | 2 | 90% | 2026-04-23
OpenAI MRCR v2 8-needle 4K-8K | Long Context | 1 | 98.1% | 2026-04-23
OpenAI MRCR v2 8-needle 512K-1M | Long Context | 1 | 74% | 2026-04-23
OpenAI MRCR v2 8-needle 64K-128K | Long Context | 2 | 83.1% | 2026-04-23
OpenAI MRCR v2 8-needle 8K-16K | Long Context | 1 | 93% | 2026-04-23
FrontierMath | Math | 2 | 35.4 | 2026-05-27
ProofBench | Math | 6 | 50% | 2026-05-28
ArxivMath | Mathematics | 2 | 71.5% | 2026-05-28
FrontierMath 2025-02-28 Private | Mathematics | 2 | 51.7% | 2026-04-23
FrontierMath Tier 4 2025-07-01 Private | Mathematics | 3 | 35.4% | 2026-04-23
Blueprint-Bench 2 | Multimodal | 2 | 0.706 +/- 0.008 | 2026-05-28
Design Arena | Multimodal | 9 | 1315 | 2026-05-06
GDPval-MM | Multimodal | 1 | 0.85 | 2026-05-06
LMArena Vision Arena | Multimodal | 7 | 1297.64 | 2026-05-06
LMArena Vision Arena | Multimodal | 10 | 1279.71 | 2026-05-06
MMMU-Pro | Multimodal | 1 | 83.2% | 2026-04-23
ARC-AGI v2 | Reasoning | 1 | 0.85 | 2026-05-06
CAIS Text Capabilities Index | Reasoning | 1 | 54.1 | 2026-05-27
Context Arena | Reasoning | 1 | 79.77 | 2026-05-06
Context Arena | Reasoning | 2 | 78.96 | 2026-05-06
Context Arena | Reasoning | 3 | 78.59 | 2026-05-06
Context Arena | Reasoning | 4 | 75.03 | 2026-05-06
Context Arena | Reasoning | 39 | 34.90 | 2026-05-06
GPQA Diamond | Reasoning | 2 | 93.5% | 2026-05-11
GPQA Diamond | Reasoning | 3 | 93.2% | 2026-05-11
GPQA Diamond | Reasoning | 4 | 92.6% | 2026-05-11
GPQA Diamond | Reasoning | 10 | 91% | 2026-05-11
GPQA Diamond | Reasoning | 137 | 76.8% | 2026-05-11
GPQA Diamond | Reasoning | 4 | 93.6% | 2026-04-23
CAIS Risk Index | Safety | 8 | 42.4 | 2026-05-27
BixBench | Science | 1 | 80.5% | 2026-04-23
CritPt | Science | 3 | 27.1% | 2026-05-11
CritPt | Science | 5 | 25.4% | 2026-05-11
CritPt | Science | 7 | 18.6% | 2026-05-11
CritPt | Science | 21 | 8% | 2026-05-11
CritPt | Science | 73 | 1.4% | 2026-05-11
GeneBench | Science | 2 | 0.25 | 2026-05-06
GeneBench | Science | 3 | 25% | 2026-04-23
SWE-bench Pro | Software Engineering | 3 | 58.6% | 2026-05-28
SWE-bench Pro | Software Engineering | 2 | 58.6% | 2026-04-23
Structured Output Benchmark | Structured Output | 7 | 86 | 2026-05-06
LiveSQLBench | Text to SQL | 3 | 37.36 | 2026-05-06
LiveSQLBench | Text to SQL | 4 | 37.24 | 2026-05-06
CAIS Vision Capabilities Index | Vision | 5 | 60.5 | 2026-05-27