GPT-5.2
GPT / OpenAI
160 scores
111 benchmarks
Cost (input / output): $1.75 / $14 per 1M tokens
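As a rough illustration of how the two listed prices combine, here is a minimal Python sketch that estimates the cost of a single request. It assumes flat per-token pricing at the input and output rates listed above. It ignores any cached-input, batch, or other discounted rates that may apply. The names `request_cost`, `INPUT_PRICE_PER_M`, and `OUTPUT_PRICE_PER_M` are hypothetical and chosen for this example.

```python
# Assumed flat list prices, in USD per 1M tokens (taken from the summary above).
# Discounts such as cached-input or batch pricing are deliberately not modeled.
INPUT_PRICE_PER_M = 1.75
OUTPUT_PRICE_PER_M = 14.00


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request (hypothetical helper)."""
    # Scale each token count by its per-million rate, then convert to dollars.
    return (
        input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    ) / 1_000_000


# Example: 10,000 input tokens and 2,000 output tokens.
print(f"${request_cost(10_000, 2_000):.4f}")  # prints "$0.0455"
```

Output tokens cost eight times as much as input tokens at these rates. In this example, 2,000 output tokens ($0.028) outweigh 10,000 input tokens ($0.0175).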
Metadata
GPT, Closed/API
Aliases: gpt-5.2, gpt-5.2-20251211, openai-gpt-5.2, openai-gpt-5.2-20251211, openai/gpt-5.2, openai/gpt-5.2-20251211
| Benchmark | Category | Rank | Score | Date sampled |
|---|---|---|---|---|
| AMA-Bench | Agentic | 1 | 0.71 | 2026-05-06 |
| APEX-Agents | Agentic | 5 | 48.40 | 2026-05-06 |
| APEX-Agents | Agentic | 12 | 38.70 | 2026-05-06 |
| ARC-AGI-1 | Agentic | 26 | 86.17 | 2026-05-05 |
| ARC-AGI-1 | Agentic | 35 | 78.67 | 2026-05-05 |
| ARC-AGI-1 | Agentic | 40 | 72.67 | 2026-05-05 |
| ARC-AGI-1 | Agentic | 61 | 55.67 | 2026-05-05 |
| ARC-AGI-1 | Agentic | 125 | 12.33 | 2026-05-05 |
| ARC-AGI-2 | Agentic | 28 | 52.91 | 2026-05-05 |
| ARC-AGI-2 | Agentic | 30 | 43.33 | 2026-05-05 |
| ARC-AGI-2 | Agentic | 39 | 26.67 | 2026-05-05 |
| ARC-AGI-2 | Agentic | 51 | 9.72 | 2026-05-05 |
| ARC-AGI-2 | Agentic | 122 | 0.83 | 2026-05-05 |
| Berkeley Function-Calling Leaderboard | Agentic | 16 | 55.87% | 2026-05-27 |
| Berkeley Function-Calling Leaderboard | Agentic | 38 | 45.27% | 2026-05-27 |
| CAR-bench | Agentic | 3 | 0.53 | 2026-05-06 |
| Clembench Text v3.0 | Agentic | 4 | 84.19 | 2026-05-06 |
| Clembench Text v3.0 | Agentic | 6 | 81.66 | 2026-05-06 |
| Clembench Text v3.0 | Agentic | 7 | 79.61 | 2026-05-06 |
| Clembench Text v3.0 | Agentic | 10 | 74.27 | 2026-05-06 |
| EnterpriseOps-Gym | Agentic | 6 | 31.3% | 2026-05-05 |
| EnterpriseOps-Gym | Agentic | 17 | 21.1% | 2026-05-05 |
| Gert Labs Rankings | Agentic | 23 | 0.51 | 2026-05-11 |
| Hindsight LLM Memory Leaderboard | Agentic | 16 | 83.50 | 2026-05-06 |
| LLM-WikiRace | Agentic | 9 | 50.70 | 2026-05-06 |
| LMArena Search Arena | Agentic | 6 | 1212.66 | 2026-05-06 |
| LMArena Search Arena | Agentic | 15 | 1177.64 | 2026-05-06 |
| MCP Atlas | Agentic | 7 | 67.60 | 2026-05-06 |
| MCPMark | Agentic | 1 | 0.57 | 2026-05-06 |
| Poker Agent | Agentic | 1 | 1131.833% | 2025-12-23 |
| Tau2-Bench Telecom | Agentic | 88 | 84.8% | 2026-05-11 |
| Tau2-Bench Telecom | Agentic | 122 | 74.3% | 2026-05-11 |
| Tau2-Bench Telecom | Agentic | 188 | 46.5% | 2026-05-11 |
| Terminal-Bench Hard | Agentic | 16 | 47% | 2026-05-11 |
| Terminal-Bench Hard | Agentic | 27 | 43.2% | 2026-05-11 |
| Terminal-Bench Hard | Agentic | 86 | 31.8% | 2026-05-11 |
| Toolathlon | Agentic | 7 | 0.46 | 2026-05-06 |
| Vending-Bench 2 | Agentic | 20 | 3591.33 | 2026-05-28 |
| VitaBench | Agentic | 6 | 24.30 | 2026-05-06 |
| VitaBench | Agentic | 26 | 0.80 | 2026-05-06 |
| OpenUGI | Alignment | 780 | 31.02 | 2026-05-06 |
| OpenUGI | Alignment | 809 | 30.40 | 2026-05-06 |
| OpenUGI | Alignment | 873 | 28.73 | 2026-05-06 |
| OpenUGI | Alignment | 1089 | 19.28 | 2026-05-06 |
| scBench | Biology | 8 | 52.31% | 2026-05-27 |
| SpatialBench | Biology | 8 | 50.1% | 2026-05-27 |
| ALE-Bench | Coding | 10 | 1293.55 | 2026-05-06 |
| ALE-Bench | Coding | 11 | 1249.83 | 2026-05-06 |
| Arena AI Code | Coding | 29 | 1404 | 2026-05-06 |
| HoudiniVexBench | Coding | 3 | 0.49 | 2026-05-06 |
| IOI | Coding | 2 | 54.833% | 2026-05-26 |
| LiveCodeBench | Coding | 16 | 85.361% | 2026-05-28 |
| SciCode | Coding | 11 | 52.1% | 2026-05-11 |
| SciCode | Coding | 37 | 46.2% | 2026-05-11 |
| SciCode | Coding | 91 | 40.4% | 2026-05-11 |
| SWE-bench Verified | Coding | 15 | 75.8% | 2026-05-28 |
| Terminal-Bench 2.0 | Coding | 20 | 51.685% | 2026-05-28 |
| Vibe Code Bench v1.1 | Coding | 7 | 53.499% | 2026-05-28 |
| VibeCodingBench | Coding | 4 | 88.75 | 2026-05-06 |
| SecCodeBench | Cybersecurity | 11 | 58.23% | 2026-05-28 |
| DAXBench | Data | 39 | 78.4% | 2026-05-28 |
| Arena AI Document | Document AI | 21 | 1414 | 2026-05-06 |
| Arena AI Document | Document AI | 23 | 1406 | 2026-05-06 |
| GSMA Open Telco Leaderboard | Domain | 25 | 63.15 | 2026-05-06 |
| IB-bench | Domain Specific | 3 | 9.20 | 2026-05-06 |
| SAGE | Education | 13 | 49.27% | 2026-05-28 |
| TutorBench | Education | 2 | 53.49 | 2026-05-06 |
| From Perception to Action | Embodied AI | 1 | 22.9% | 2026-05-28 |
| Vectara HHEM Hallucination Leaderboard | Factuality | 42 | 91.60 | 2026-05-06 |
| Vectara HHEM Hallucination Leaderboard | Factuality | 64 | 89.20 | 2026-05-06 |
| CorpFin v2 | Finance | 14 | 65.889% | 2026-05-28 |
| Finance Agent v1.1 | Finance | 9 | 58.535% | 2026-05-04 |
| MortgageTax | Finance | 20 | 67.13% | 2026-05-28 |
| TaxBench | Finance | 16 | 4.60% mean pass^5 | 2026-05-27 |
| TaxEval v2 | Finance | 5 | 75.756% | 2026-05-28 |
| MageBench Season 1 | Game | 2 | 1737 rating / 11 games | 2026-05-28 |
| MageBench Season 1 | Game | 27 | 1547 rating / 13 games | 2026-05-28 |
| ALL Bench LLM | General Knowledge | 3 | 59.48 | 2026-05-06 |
| BenchLM | General Knowledge | 19 | 81 | 2026-05-06 |
| WeirdML | Generalization | 1 | 72.20 | 2026-05-06 |
| MedCode | Healthcare | 10 | 49.749% | 2026-05-28 |
| MedQA | Healthcare | 19 | 94.133% | 2026-04-16 |
| MedScribe | Healthcare | 10 | 84.387% | 2026-05-28 |
| Omi SOAP Note Safety Benchmark | Healthcare | 1 | 4.72 | 2026-04-21 |
| PlaceboBench | Healthcare | 2 | 63.2353 | 2026-05-27 |
| AIIQ Composite IQ | Intelligence | 7 | 126 | 2026-05-12 |
| Artificial Analysis Intelligence Index | Intelligence | 18 | 51.28 | 2026-05-11 |
| Artificial Analysis Intelligence Index | Intelligence | 36 | 46.64 | 2026-05-11 |
| Artificial Analysis Intelligence Index | Intelligence | 117 | 33.57 | 2026-05-11 |
| GPQA Diamond | Intelligence | 6 | 91.666% | 2026-05-28 |
| Humanity's Last Exam | Intelligence | 13 | 35.4% | 2026-05-11 |
| Humanity's Last Exam | Intelligence | 50 | 24.9% | 2026-05-11 |
| Humanity's Last Exam | Intelligence | 201 | 7.3% | 2026-05-11 |
| LiveBench | Intelligence | 11 | 75.38 | 2026-05-05 |
| LiveBench | Intelligence | 20 | 72.62 | 2026-05-05 |
| LiveBench | Intelligence | 43 | 65.59 | 2026-05-05 |
| MathVision | Intelligence | 16 | 83 | 2026-05-06 |
| MMLU Pro | Intelligence | 27 | 86.234% | 2026-05-28 |
| MMLU-Pro | Intelligence | 10 | 87.4% | 2026-05-11 |
| MMLU-Pro | Intelligence | 24 | 85.9% | 2026-05-11 |
| MMLU-Pro | Intelligence | 88 | 81.4% | 2026-05-11 |
| MMMU Pro | Intelligence | 8 | 86.667% | 2026-05-28 |
| OCRBench v2 | Intelligence | 15 | 50.50 | 2026-05-06 |
| OCRBench v2 | Intelligence | 18 | 52.60 | 2026-05-06 |
| CaseLaw v2 | Legal | 8 | 66.024% | 2026-05-04 |
| LegalBench | Legal | 36 | 82.764% | 2026-05-28 |
| Fiction.LiveBench | Long Context | 3 | 96.90 | 2026-05-06 |
| AIME | Math | 3 | 96.875% | 2026-04-16 |
| AIME 2025 | Math | 1 | 99% | 2026-05-11 |
| AIME 2025 | Math | 5 | 96.7% | 2026-05-11 |
| AIME 2025 | Math | 138 | 51% | 2026-05-11 |
| FrontierMath | Math | 1 | 40.3 | 2026-05-27 |
| MGSM | Math | 6 | 94% | 2026-01-09 |
| ProofBench | Math | 20 | 15% | 2026-05-28 |
| FrontierMath 2025-02-28 Private | Mathematics | 1 | 40.70 | 2026-05-06 |
| FrontierMath Tier 4 2025-07-01 Private | Mathematics | 1 | 29.20 | 2026-05-06 |
| HMMT 2025 | Mathematics | 2 | 0.99 | 2026-05-06 |
| OTIS Mock AIME 2024-2025 | Mathematics | 1 | 96.11 | 2026-05-06 |
| LiveMedBench | Medical | 1 | 0.3923 | 2026-05-27 |
| Medmarks | Medical | 1 | 0.6389159522138381 | 2026-05-27 |
| Medmarks | Medical | 5 | 0.6236362195525137 | 2026-05-27 |
| MedSafe-Dx | Medical | 1 | 97.6 | 2026-05-27 |
| ALL Bench Multimodal | Multimodal | 3 | 62.59 | 2026-05-06 |
| ALL Bench Multimodal | Multimodal | 6 | 8.67 | 2026-05-06 |
| ALL Bench Multimodal | Multimodal | 3 | 32.88 | 2026-05-06 |
| CharXiv-R | Multimodal | 5 | 0.82 | 2026-05-06 |
| IDP Leaderboard | Multimodal | 7 | 81.49 | 2026-05-06 |
| JMMMU-Pro | Multimodal | 2 | 83.33 | 2026-05-06 |
| MMMU-Pro | Multimodal | 7 | 80.40 | 2026-05-06 |
| MMMU-Pro | Multimodal | 9 | 79.50 | 2026-05-06 |
| VideoMMMU | Multimodal | 4 | 0.86 | 2026-05-06 |
| Visual-Language Understanding | Multimodal | 13 | 46.62 | 2026-05-06 |
| VPCT | Multimodal | 2 | 84 | 2026-05-06 |
| ARC-AGI v2 | Reasoning | 7 | 0.53 | 2026-05-06 |
| Balrog | Reasoning | 4 | 32.80 | 2026-05-06 |
| CAIS Text Capabilities Index | Reasoning | 10 | 33.8 | 2026-05-27 |
| EnigmaEval | Reasoning | 6 | 10.39 | 2026-05-06 |
| FINAL Bench Metacognitive | Reasoning | 3 | 76.50 | 2026-05-06 |
| GPQA Diamond | Reasoning | 13 | 90.3% | 2026-05-11 |
| GPQA Diamond | Reasoning | 41 | 86.4% | 2026-05-11 |
| GPQA Diamond | Reasoning | 194 | 71.2% | 2026-05-11 |
| Graphwalks BFS <128k | Reasoning | 1 | 0.94 | 2026-05-06 |
| Graphwalks parents <128k | Reasoning | 2 | 0.89 | 2026-05-06 |
| Humanity's Last Exam (Text Only) | Reasoning | 8 | 28.50 | 2026-05-06 |
| MultiNRC | Reasoning | 12 | 42.18 | 2026-05-06 |
| SimpleBench | Reasoning | 4 | 61.60 | 2026-05-06 |
| CAIS Risk Index | Safety | 9 | 42.6 | 2026-05-27 |
| LiveSecBench | Safety | 3 | 84.72 | 2026-05-27 |
| CritPt | Science | 13 | 11.6% | 2026-05-11 |
| CritPt | Science | 24 | 7.9% | 2026-05-11 |
| CritPt | Science | 109 | 0.6% | 2026-05-11 |
| GSO-Bench | Science | 1 | 27.40 | 2026-05-06 |
| SciPredict | Science | 4 | 20.58 | 2026-05-06 |
| BrowseComp Long Context 128k | Search | 1 | 0.92 | 2026-05-06 |
| BrowseComp Long Context 256k | Search | 1 | 0.90 | 2026-05-06 |
| IDE-Bench | Software Engineering | 2 | 85 | 2026-05-27 |
| CAIS Vision Capabilities Index | Vision | 9 | 55.0 | 2026-05-27 |
| K-MetBench | Weather | 2 | 87.8% accuracy | 2026-05-28 |
| K-MetBench | Weather | 9 | 77.6% accuracy | 2026-05-28 |
| Lech Mazur Writing | Writing | 1 | 8.72 | 2026-05-06 |