GPT-5.2

GPT / OpenAI

160 scores
111 benchmarks
Cost (in/out): $1.75 / $14 per 1M tokens

Metadata

GPT Closed/API

Aliases: gpt-5.2, gpt-5.2-20251211, openai-gpt-5.2, openai-gpt-5.2-20251211, openai/gpt-5.2, openai/gpt-5.2-20251211

Benchmark Results

Benchmark Category Rank Score Sampled
AMA-Bench Agentic 1 0.71 2026-05-06
APEX-Agents Agentic 5 48.40 2026-05-06
APEX-Agents Agentic 12 38.70 2026-05-06
ARC-AGI-1 Agentic 26 86.17 2026-05-05
ARC-AGI-1 Agentic 35 78.67 2026-05-05
ARC-AGI-1 Agentic 40 72.67 2026-05-05
ARC-AGI-1 Agentic 61 55.67 2026-05-05
ARC-AGI-1 Agentic 125 12.33 2026-05-05
ARC-AGI-2 Agentic 28 52.91 2026-05-05
ARC-AGI-2 Agentic 30 43.33 2026-05-05
ARC-AGI-2 Agentic 39 26.67 2026-05-05
ARC-AGI-2 Agentic 51 9.72 2026-05-05
ARC-AGI-2 Agentic 122 0.83 2026-05-05
Berkeley Function-Calling Leaderboard Agentic 16 55.87% 2026-05-27
Berkeley Function-Calling Leaderboard Agentic 38 45.27% 2026-05-27
CAR-bench Agentic 3 0.53 2026-05-06
Clembench Text v3.0 Agentic 4 84.19 2026-05-06
Clembench Text v3.0 Agentic 6 81.66 2026-05-06
Clembench Text v3.0 Agentic 7 79.61 2026-05-06
Clembench Text v3.0 Agentic 10 74.27 2026-05-06
EnterpriseOps-Gym Agentic 6 31.3% 2026-05-05
EnterpriseOps-Gym Agentic 17 21.1% 2026-05-05
Gert Labs Rankings Agentic 23 0.51 2026-05-11
Hindsight LLM Memory Leaderboard Agentic 16 83.50 2026-05-06
LLM-WikiRace Agentic 9 50.70 2026-05-06
LMArena Search Arena Agentic 6 1212.66 2026-05-06
LMArena Search Arena Agentic 15 1177.64 2026-05-06
MCP Atlas Agentic 7 67.60 2026-05-06
MCPMark Agentic 1 0.57 2026-05-06
Poker Agent Agentic 1 1131.833% 2025-12-23
Tau2-Bench Telecom Agentic 88 84.8% 2026-05-11
Tau2-Bench Telecom Agentic 122 74.3% 2026-05-11
Tau2-Bench Telecom Agentic 188 46.5% 2026-05-11
Terminal-Bench Hard Agentic 16 47% 2026-05-11
Terminal-Bench Hard Agentic 27 43.2% 2026-05-11
Terminal-Bench Hard Agentic 86 31.8% 2026-05-11
Toolathlon Agentic 7 0.46 2026-05-06
Vending-Bench 2 Agentic 20 3591.33 2026-05-28
VitaBench Agentic 6 24.30 2026-05-06
VitaBench Agentic 26 0.80 2026-05-06
OpenUGI Alignment 780 31.02 2026-05-06
OpenUGI Alignment 809 30.40 2026-05-06
OpenUGI Alignment 873 28.73 2026-05-06
OpenUGI Alignment 1089 19.28 2026-05-06
scBench Biology 8 52.31% 2026-05-27
SpatialBench Biology 8 50.1% 2026-05-27
ALE-Bench Coding 10 1293.55 2026-05-06
ALE-Bench Coding 11 1249.83 2026-05-06
Arena AI Code Coding 29 1404 2026-05-06
HoudiniVexBench Coding 3 0.49 2026-05-06
IOI Coding 2 54.833% 2026-05-26
LiveCodeBench Coding 16 85.361% 2026-05-28
SciCode Coding 11 52.1% 2026-05-11
SciCode Coding 37 46.2% 2026-05-11
SciCode Coding 91 40.4% 2026-05-11
SWE-bench Verified Coding 15 75.8% 2026-05-28
Terminal-Bench 2.0 Coding 20 51.685% 2026-05-28
Vibe Code Bench v1.1 Coding 7 53.499% 2026-05-28
VibeCodingBench Coding 4 88.75 2026-05-06
SecCodeBench Cybersecurity 11 58.23% 2026-05-28
DAXBench Data 39 78.4% 2026-05-28
Arena AI Document Document AI 21 1414 2026-05-06
Arena AI Document Document AI 23 1406 2026-05-06
GSMA Open Telco Leaderboard Domain 25 63.15 2026-05-06
IB-bench Domain Specific 3 9.20 2026-05-06
SAGE Education 13 49.27% 2026-05-28
TutorBench Education 2 53.49 2026-05-06
From Perception to Action Embodied AI 1 22.9% 2026-05-28
Vectara HHEM Hallucination Leaderboard Factuality 42 91.60 2026-05-06
Vectara HHEM Hallucination Leaderboard Factuality 64 89.20 2026-05-06
CorpFin v2 Finance 14 65.889% 2026-05-28
Finance Agent v1.1 Finance 9 58.535% 2026-05-04
MortgageTax Finance 20 67.13% 2026-05-28
TaxBench Finance 16 4.60% mean pass^5 2026-05-27
TaxEval v2 Finance 5 75.756% 2026-05-28
MageBench Season 1 Game 2 1737 rating / 11 games 2026-05-28
MageBench Season 1 Game 27 1547 rating / 13 games 2026-05-28
ALL Bench LLM General Knowledge 3 59.48 2026-05-06
BenchLM General Knowledge 19 81 2026-05-06
WeirdML Generalization 1 72.20 2026-05-06
MedCode Healthcare 10 49.749% 2026-05-28
MedQA Healthcare 19 94.133% 2026-04-16
MedScribe Healthcare 10 84.387% 2026-05-28
Omi SOAP Note Safety Benchmark Healthcare 1 4.72 2026-04-21
PlaceboBench Healthcare 2 63.2353 2026-05-27
AIIQ Composite IQ Intelligence 7 126 2026-05-12
Artificial Analysis Intelligence Index Intelligence 18 51.28 2026-05-11
Artificial Analysis Intelligence Index Intelligence 36 46.64 2026-05-11
Artificial Analysis Intelligence Index Intelligence 117 33.57 2026-05-11
GPQA Diamond Intelligence 6 91.666% 2026-05-28
Humanity's Last Exam Intelligence 13 35.4% 2026-05-11
Humanity's Last Exam Intelligence 50 24.9% 2026-05-11
Humanity's Last Exam Intelligence 201 7.3% 2026-05-11
LiveBench Intelligence 11 75.38 2026-05-05
LiveBench Intelligence 20 72.62 2026-05-05
LiveBench Intelligence 43 65.59 2026-05-05
MathVision Intelligence 16 83 2026-05-06
MMLU Pro Intelligence 27 86.234% 2026-05-28
MMLU-Pro Intelligence 10 87.4% 2026-05-11
MMLU-Pro Intelligence 24 85.9% 2026-05-11
MMLU-Pro Intelligence 88 81.4% 2026-05-11
MMMU Pro Intelligence 8 86.667% 2026-05-28
OCRBench v2 Intelligence 15 50.50 2026-05-06
OCRBench v2 Intelligence 18 52.60 2026-05-06
CaseLaw v2 Legal 8 66.024% 2026-05-04
LegalBench Legal 36 82.764% 2026-05-28
Fiction.LiveBench Long Context 3 96.90 2026-05-06
AIME Math 3 96.875% 2026-04-16
AIME 2025 Math 1 99% 2026-05-11
AIME 2025 Math 5 96.7% 2026-05-11
AIME 2025 Math 138 51% 2026-05-11
FrontierMath Math 1 40.3 2026-05-27
MGSM Math 6 94% 2026-01-09
ProofBench Math 20 15% 2026-05-28
FrontierMath 2025-02-28 Private Mathematics 1 40.70 2026-05-06
FrontierMath Tier 4 2025-07-01 Private Mathematics 1 29.20 2026-05-06
HMMT 2025 Mathematics 2 0.99 2026-05-06
OTIS Mock AIME 2024-2025 Mathematics 1 96.11 2026-05-06
LiveMedBench Medical 1 0.3923 2026-05-27
Medmarks Medical 1 0.6389159522138381 2026-05-27
Medmarks Medical 5 0.6236362195525137 2026-05-27
MedSafe-Dx Medical 1 97.6 2026-05-27
ALL Bench Multimodal Multimodal 3 62.59 2026-05-06
ALL Bench Multimodal Multimodal 6 8.67 2026-05-06
ALL Bench Multimodal Multimodal 3 32.88 2026-05-06
CharXiv-R Multimodal 5 0.82 2026-05-06
IDP Leaderboard Multimodal 7 81.49 2026-05-06
JMMMU-Pro Multimodal 2 83.33 2026-05-06
MMMU-Pro Multimodal 7 80.40 2026-05-06
MMMU-Pro Multimodal 9 79.50 2026-05-06
VideoMMMU Multimodal 4 0.86 2026-05-06
Visual-Language Understanding Multimodal 13 46.62 2026-05-06
VPCT Multimodal 2 84 2026-05-06
ARC-AGI v2 Reasoning 7 0.53 2026-05-06
Balrog Reasoning 4 32.80 2026-05-06
CAIS Text Capabilities Index Reasoning 10 33.8 2026-05-27
EnigmaEval Reasoning 6 10.39 2026-05-06
FINAL Bench Metacognitive Reasoning 3 76.50 2026-05-06
GPQA Diamond Reasoning 13 90.3% 2026-05-11
GPQA Diamond Reasoning 41 86.4% 2026-05-11
GPQA Diamond Reasoning 194 71.2% 2026-05-11
Graphwalks BFS <128k Reasoning 1 0.94 2026-05-06
Graphwalks parents <128k Reasoning 2 0.89 2026-05-06
Humanity's Last Exam (Text Only) Reasoning 8 28.50 2026-05-06
MultiNRC Reasoning 12 42.18 2026-05-06
SimpleBench Reasoning 4 61.60 2026-05-06
CAIS Risk Index Safety 9 42.6 2026-05-27
LiveSecBench Safety 3 84.72 2026-05-27
CritPt Science 13 11.6% 2026-05-11
CritPt Science 24 7.9% 2026-05-11
CritPt Science 109 0.6% 2026-05-11
GSO-Bench Science 1 27.40 2026-05-06
SciPredict Science 4 20.58 2026-05-06
BrowseComp Long Context 128k Search 1 0.92 2026-05-06
BrowseComp Long Context 256k Search 1 0.90 2026-05-06
IDE-Bench Software Engineering 2 85 2026-05-27
CAIS Vision Capabilities Index Vision 9 55.0 2026-05-27
K-MetBench Weather 2 87.8% accuracy 2026-05-28
K-MetBench Weather 9 77.6% accuracy 2026-05-28
Lech Mazur Writing Writing 1 8.72 2026-05-06