GPT-5

GPT / OpenAI

173 scores
115 benchmarks
$1.25 / $10 per 1M tokens (input / output)

Metadata

Family: GPT
Access: Closed/API

Aliases: gpt-5, gpt-5-2025-08-07, openai-gpt-5, openai-gpt-5-2025-08-07, openai/gpt-5, openai/gpt-5-2025-08-07

Benchmark Results

Benchmark Category Rank Score Sampled
ALFWorld Agentic 4 0.933 2026-05-27
APEX-Agents Agentic 18 33 2026-05-06
ARC-AGI-1 Agentic 45 65.67 2026-05-05
ARC-AGI-1 Agentic 59 56.17 2026-05-05
ARC-AGI-1 Agentic 72 44 2026-05-05
ARC-AGI-1 Agentic 130 6 2026-05-05
ARC-AGI-2 Agentic 50 9.86 2026-05-05
ARC-AGI-2 Agentic 55 7.49 2026-05-05
ARC-AGI-2 Agentic 96 1.94 2026-05-05
ARC-AGI-2 Agentic 97 1.94 2026-05-05
ARC-AGI-2 Agentic 144 0 2026-05-05
CAR-bench Agentic 2 0.54 2026-05-06
EnterpriseOps-Gym Agentic 8 29.2% 2026-05-05
LLM-WikiRace Agentic 3 60 2026-05-06
LMArena Search Arena Agentic 24 1133.24 2026-05-06
MCP-Universe Agentic 1 44.16 2026-05-06
MCP-Universe Agentic 2 43.72 2026-05-06
MCPMark Agentic 3 0.53 2026-05-06
MCPMark Agentic 4 0.52 2026-05-06
MCPMark Agentic 6 0.47 2026-05-06
MobileWorld Agentic 1 51.7% 2026-05-27
MultiChallenge Agentic 4 63.19 2026-05-06
Poker Agent Agentic 2 1103.175% 2025-12-23
RealDataAgentBench Agentic 10 0.78 2026-04-28
Tau2 Airline Agentic 8 0.63 2026-05-06
Tau2-Bench Telecom Agentic 76 86.5% 2026-05-11
Tau2-Bench Telecom Agentic 87 84.8% 2026-05-11
Tau2-Bench Telecom Agentic 92 84.2% 2026-05-11
Tau2-Bench Telecom Agentic 147 67% 2026-05-11
Tau2-Bench Telecom Agentic 386 0% 2026-05-11
Terminal-Bench Hard Agentic 43 37.9% 2026-05-11
Terminal-Bench Hard Agentic 80 32.6% 2026-05-11
Terminal-Bench Hard Agentic 112 26.5% 2026-05-11
Terminal-Bench Hard Agentic 153 18.2% 2026-05-11
Terminal-Bench Hard Agentic 190 12.9% 2026-05-11
AgentBench FC Agents 11 52.20 2026-05-06
OpenUGI Alignment 309 42.03 2026-05-06
OpenUGI Alignment 598 34.28 2026-05-06
ABC-Bench Coding 3 49.4% +/- 1.9 2026-05-27
ALE-Bench Coding 18 1162.45 2026-05-06
ALE-Bench Coding 38 807.65 2026-05-06
Arena AI Code Coding 33 1393 2026-05-06
ArtifactsBench Coding 1 72.55 2026-05-06
ContextBench Coding 2 47.20 2026-05-06
IOI Coding 16 20% 2026-05-26
LiveCodeBench Coding 13 85.911% 2026-05-28
SciCode Coding 58 42.9% 2026-05-11
SciCode Coding 75 41.1% 2026-05-11
SciCode Coding 118 39.1% 2026-05-11
SciCode Coding 123 38.8% 2026-05-11
SciCode Coding 138 37.8% 2026-05-11
SWE-bench Verified Coding 35 69% 2026-05-28
Terminal-Bench 2.0 Coding 39 37.079% 2026-05-28
Vibe Code Bench v1.1 Coding 25 20.088% 2026-05-28
RedSage-Bench Cybersecurity 1 88.68% 2026-05-28
MMTU Data 1 0.70 2026-05-06
GSMA Open Telco Leaderboard Domain 6 71.88 2026-05-06
IslamicLegalBench Domain 1 67.65 2026-05-06
SAGE Education 22 43.68% 2026-05-28
TutorBench Education 1 55.33 2026-05-06
FActScore Factuality 2 0.01 2026-05-06
Vectara HHEM Hallucination Leaderboard Factuality 88 85.30 2026-05-06
Vectara HHEM Hallucination Leaderboard Factuality 89 84.90 2026-05-06
CorpFin v2 Finance 39 61.072% 2026-05-28
Fin-RATE Finance 1 43.37% 2026-05-28
Finance Agent v1.1 Finance 25 52.151% 2026-05-04
FinChain Finance 9 57.07 ChainEval 2026-05-28
MortgageTax Finance 28 65.454% 2026-05-28
PRBench Finance Finance 3 51.32 2026-05-06
TaxEval v2 Finance 31 73.385% 2026-05-28
MageBench Season 1 Game 30 1536 rating / 9 games 2026-05-28
Xent Games Game 3 62.77 overall 2026-05-28
BenchLM General Knowledge 22 78 2026-05-06
BenchLM General Knowledge 30 72 2026-05-06
GDPval Generalization 2 39.0% 2025-09-25
HELM AIR-Bench Generalization 7 0.876712 2026-05-28
GeoRC Geospatial 9 40.56 2026-05-27
HELM MedQA Healthcare 1 0.968191 2026-05-28
MedCode Healthcare 11 49.634% 2026-05-28
MedQA Healthcare 4 96.317% 2026-04-16
MedScribe Healthcare 12 83.65% 2026-05-28
Omi SOAP Note Safety Benchmark Healthcare 6 4.29 2026-04-21
HUMAINE Human Preference 16 3.61 2026-05-06
AIIQ Composite IQ Intelligence 13 119 2026-05-12
Artificial Analysis Intelligence Index Intelligence 44 44.63 2026-05-11
Artificial Analysis Intelligence Index Intelligence 60 42.03 2026-05-11
Artificial Analysis Intelligence Index Intelligence 80 39.2 2026-05-11
Artificial Analysis Intelligence Index Intelligence 206 23.89 2026-05-11
Artificial Analysis Intelligence Index Intelligence 227 21.83 2026-05-11
GPQA Diamond Intelligence 24 85.606% 2026-05-28
Humanity's Last Exam Intelligence 38 26.5% 2026-05-11
Humanity's Last Exam Intelligence 53 23.5% 2026-05-11
Humanity's Last Exam Intelligence 79 18.4% 2026-05-11
Humanity's Last Exam Intelligence 251 5.8% 2026-05-11
Humanity's Last Exam Intelligence 266 5.4% 2026-05-11
MathVision Intelligence 24 72 2026-05-06
MathVision Intelligence 68 45.80 2026-05-06
MMLU Pro Intelligence 23 86.544% 2026-05-28
MMLU-Pro Intelligence 12 87.1% 2026-05-11
MMLU-Pro Intelligence 14 86.7% 2026-05-11
MMLU-Pro Intelligence 22 86% 2026-05-11
MMLU-Pro Intelligence 75 82% 2026-05-11
MMLU-Pro Intelligence 105 80.6% 2026-05-11
MMMU Pro Intelligence 22 81.503% 2026-05-28
OCRBench v2 Intelligence 12 55.50 2026-05-06
TableBench Intelligence 6 59.94% 2026-05-27
AraGen v3 Language 2 84.25 2026-05-06
Seneca-TRBench Language 1 93.50 2026-05-06
CaseLaw v2 Legal 6 66.452% 2026-05-04
LegalBench Legal 6 86.023% 2026-05-28
LEXam Legal 1 70.20% open / 62.65% MCQ 2026-05-28
Professional Reasoning Bench - Legal Legal 5 48.96 2026-05-06
ConStory-Bench Long Context 1 CED 0.113 2026-05-28
OpenAI-MRCR: 2 needle 128k Long Context 1 0.95 2026-05-06
AIME Math 14 93.374% 2026-04-16
AIME 2025 Math 12 94.3% 2026-05-11
AIME 2025 Math 18 91.7% 2026-05-11
AIME 2025 Math 55 83% 2026-05-11
AIME 2025 Math 143 48.3% 2026-05-11
AIME 2025 Math 180 31.7% 2026-05-11
IneqMath Math 1 47 2026-05-06
IneqMath Math 8 28 2026-05-06
MATH 500 Math 3 96% 2026-01-09
MGSM Math 14 92.836% 2026-01-09
ProofBench Math 16 18% 2026-05-28
HMMT 2025 Math 11 0.93 2026-05-06
LiveMedBench Medical 3 0.2858 2026-05-27
AfroBench-Lite Multilingual 1 77.74 2026-05-06
ALL Bench Multimodal Multimodal 8 8.42 2026-05-06
CharXiv-R Multimodal 8 0.81 2026-05-06
Design Arena Multimodal 44 1227 2026-05-06
Design Arena Multimodal 51 1223 2026-05-06
Math-VR Multimodal 7 58.1 2026-05-27
MMMU-Pro Multimodal 11 78.40 2026-05-06
MMMU-Pro Multimodal 30 62.70 2026-05-06
MMSI-Bench Multimodal 4 41.9% 2026-05-28
Physical AI Bench Understanding Multimodal 2 69.80 2026-05-06
VideoMME w sub. Multimodal 4 0.87 2026-05-06
VideoMMMU Multimodal 6 0.85 2026-05-06
Visual-Language Understanding Multimodal 7 49.69 2026-05-06
VTB Multimodal 5 18.68 2026-05-06
VTB Multimodal 6 16.96 2026-05-06
WebMainBench Multimodal 2 0.90 2026-05-06
Artificial Analysis Openness Index Openness 201 11.11 2026-05-11
Artificial Analysis Openness Index Openness 217 5.56 2026-05-11
Artificial Analysis Openness Index Openness 218 5.56 2026-05-11
Artificial Analysis Openness Index Openness 219 5.56 2026-05-11
Artificial Analysis Openness Index Openness 220 5.56 2026-05-11
CAIS Text Capabilities Index Reasoning 21 20.9 2026-05-27
EnigmaEval Reasoning 6 10.47 2026-05-06
ERQA Reasoning 1 0.66 2026-05-06
GPQA Diamond Reasoning 51 85.4% 2026-05-11
GPQA Diamond Reasoning 62 84.2% 2026-05-11
GPQA Diamond Reasoning 98 80.8% 2026-05-11
GPQA Diamond Reasoning 217 68.6% 2026-05-11
GPQA Diamond Reasoning 227 67.3% 2026-05-11
Graphwalks BFS <128k Reasoning 3 0.78 2026-05-06
Graphwalks parents <128k Reasoning 3 0.73 2026-05-06
Humanity's Last Exam (Text Only) Reasoning 8 26.32 2026-05-06
LingOly-TOO Reasoning 1 0.47 2026-05-06
MultiNRC Reasoning 6 52.13 2026-05-06
CAIS Risk Index Safety 13 46.9 2026-05-27
ThaiSafetyBench Safety 1 4.43% overall ASR 2026-05-28
CritPt Science 30 5.7% 2026-05-11
CritPt Science 82 1.1% 2026-05-11
CritPt Science 218 0% 2026-05-11
CritPt Science 219 0% 2026-05-11
BrowseComp Long Context 128k Search 2 0.90 2026-05-06
BrowseComp Long Context 256k Search 2 0.89 2026-05-06
SWT-Bench Software Engineering 4 79.8% 2026-05-27
Structured Output Benchmark Structured Output 16 84.90 2026-05-06
LiveSQLBench Text to SQL 11 31.15 2026-05-06
COLLIE Writing 1 0.99 2026-05-06