GPT-5
GPT / OpenAI
173 scores
115 benchmarks
Cost (input / output): $1.25 / $10 per 1M tokens
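As a worked example of the listed pricing ($1.25 per 1M input tokens, $10 per 1M output tokens), here is a minimal sketch of a per-request cost estimate. It assumes flat list prices and ignores any discounts (for example cached-input or batch rates) that may apply. It also assumes any reasoning tokens are already included in the output-token count. The function name `request_cost` is illustrative only.

```python
# Listed prices from this page, in USD per 1M tokens (assumed flat, no discounts).
INPUT_PRICE_PER_M = 1.25
OUTPUT_PRICE_PER_M = 10.0


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request at the listed per-1M-token prices.

    output_tokens is assumed to already include any reasoning tokens.
    """
    return (
        input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    ) / 1_000_000


# 10,000 input tokens cost 0.0125 USD; 2,000 output tokens cost 0.02 USD.
print(f"${request_cost(10_000, 2_000):.4f}")  # prints $0.0325
```

Because output tokens are priced 8x higher than input tokens, long generations dominate the bill even when prompts are large.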
Metadata
GPT · Closed/API
Aliases: gpt-5, gpt-5-2025-08-07, openai-gpt-5, openai-gpt-5-2025-08-07, openai/gpt-5, openai/gpt-5-2025-08-07
| Benchmark | Category | Rank | Score | Sampled |
|---|---|---|---|---|
| ALFWorld | Agentic | 4 | 0.933 | 2026-05-27 |
| APEX-Agents | Agentic | 18 | 33 | 2026-05-06 |
| ARC-AGI-1 | Agentic | 45 | 65.67 | 2026-05-05 |
| ARC-AGI-1 | Agentic | 59 | 56.17 | 2026-05-05 |
| ARC-AGI-1 | Agentic | 72 | 44 | 2026-05-05 |
| ARC-AGI-1 | Agentic | 130 | 6 | 2026-05-05 |
| ARC-AGI-2 | Agentic | 50 | 9.86 | 2026-05-05 |
| ARC-AGI-2 | Agentic | 55 | 7.49 | 2026-05-05 |
| ARC-AGI-2 | Agentic | 96 | 1.94 | 2026-05-05 |
| ARC-AGI-2 | Agentic | 97 | 1.94 | 2026-05-05 |
| ARC-AGI-2 | Agentic | 144 | 0 | 2026-05-05 |
| CAR-bench | Agentic | 2 | 0.54 | 2026-05-06 |
| EnterpriseOps-Gym | Agentic | 8 | 29.2% | 2026-05-05 |
| LLM-WikiRace | Agentic | 3 | 60 | 2026-05-06 |
| LMArena Search Arena | Agentic | 24 | 1133.24 | 2026-05-06 |
| MCP-Universe | Agentic | 1 | 44.16 | 2026-05-06 |
| MCP-Universe | Agentic | 2 | 43.72 | 2026-05-06 |
| MCPMark | Agentic | 3 | 0.53 | 2026-05-06 |
| MCPMark | Agentic | 4 | 0.52 | 2026-05-06 |
| MCPMark | Agentic | 6 | 0.47 | 2026-05-06 |
| MobileWorld | Agentic | 1 | 51.7% | 2026-05-27 |
| MultiChallenge | Agentic | 4 | 63.19 | 2026-05-06 |
| Poker Agent | Agentic | 2 | 1103.175% | 2025-12-23 |
| RealDataAgentBench | Agentic | 10 | 0.78 | 2026-04-28 |
| Tau2 Airline | Agentic | 8 | 0.63 | 2026-05-06 |
| Tau2-Bench Telecom | Agentic | 76 | 86.5% | 2026-05-11 |
| Tau2-Bench Telecom | Agentic | 87 | 84.8% | 2026-05-11 |
| Tau2-Bench Telecom | Agentic | 92 | 84.2% | 2026-05-11 |
| Tau2-Bench Telecom | Agentic | 147 | 67% | 2026-05-11 |
| Tau2-Bench Telecom | Agentic | 386 | 0% | 2026-05-11 |
| Terminal-Bench Hard | Agentic | 43 | 37.9% | 2026-05-11 |
| Terminal-Bench Hard | Agentic | 80 | 32.6% | 2026-05-11 |
| Terminal-Bench Hard | Agentic | 112 | 26.5% | 2026-05-11 |
| Terminal-Bench Hard | Agentic | 153 | 18.2% | 2026-05-11 |
| Terminal-Bench Hard | Agentic | 190 | 12.9% | 2026-05-11 |
| AgentBench FC | Agents | 11 | 52.20 | 2026-05-06 |
| OpenUGI | Alignment | 309 | 42.03 | 2026-05-06 |
| OpenUGI | Alignment | 598 | 34.28 | 2026-05-06 |
| ABC-Bench | Coding | 3 | 49.4% +/- 1.9 | 2026-05-27 |
| ALE-Bench | Coding | 18 | 1162.45 | 2026-05-06 |
| ALE-Bench | Coding | 38 | 807.65 | 2026-05-06 |
| Arena AI Code | Coding | 33 | 1393 | 2026-05-06 |
| ArtifactsBench | Coding | 1 | 72.55 | 2026-05-06 |
| ContextBench | Coding | 2 | 47.20 | 2026-05-06 |
| IOI | Coding | 16 | 20% | 2026-05-26 |
| LiveCodeBench | Coding | 13 | 85.911% | 2026-05-28 |
| SciCode | Coding | 58 | 42.9% | 2026-05-11 |
| SciCode | Coding | 75 | 41.1% | 2026-05-11 |
| SciCode | Coding | 118 | 39.1% | 2026-05-11 |
| SciCode | Coding | 123 | 38.8% | 2026-05-11 |
| SciCode | Coding | 138 | 37.8% | 2026-05-11 |
| SWE-bench Verified | Coding | 35 | 69% | 2026-05-28 |
| Terminal-Bench 2.0 | Coding | 39 | 37.079% | 2026-05-28 |
| Vibe Code Bench v1.1 | Coding | 25 | 20.088% | 2026-05-28 |
| RedSage-Bench | Cybersecurity | 1 | 88.68% | 2026-05-28 |
| MMTU | Data | 1 | 0.70 | 2026-05-06 |
| GSMA Open Telco Leaderboard | Domain | 6 | 71.88 | 2026-05-06 |
| IslamicLegalBench | Domain | 1 | 67.65 | 2026-05-06 |
| SAGE | Education | 22 | 43.68% | 2026-05-28 |
| TutorBench | Education | 1 | 55.33 | 2026-05-06 |
| FActScore | Factuality | 2 | 0.01 | 2026-05-06 |
| Vectara HHEM Hallucination Leaderboard | Factuality | 88 | 85.30 | 2026-05-06 |
| Vectara HHEM Hallucination Leaderboard | Factuality | 89 | 84.90 | 2026-05-06 |
| CorpFin v2 | Finance | 39 | 61.072% | 2026-05-28 |
| Fin-RATE | Finance | 1 | 43.37% | 2026-05-28 |
| Finance Agent v1.1 | Finance | 25 | 52.151% | 2026-05-04 |
| FinChain | Finance | 9 | 57.07 ChainEval | 2026-05-28 |
| MortgageTax | Finance | 28 | 65.454% | 2026-05-28 |
| PRBench Finance | Finance | 3 | 51.32 | 2026-05-06 |
| TaxEval v2 | Finance | 31 | 73.385% | 2026-05-28 |
| MageBench Season 1 | Game | 30 | 1536 rating / 9 games | 2026-05-28 |
| Xent Games | Game | 3 | 62.77 overall | 2026-05-28 |
| BenchLM | General Knowledge | 22 | 78 | 2026-05-06 |
| BenchLM | General Knowledge | 30 | 72 | 2026-05-06 |
| GDPval | Generalization | 2 | 39.0% | 2025-09-25 |
| HELM AIR-Bench | Generalization | 7 | 0.876712 | 2026-05-28 |
| GeoRC | Geospatial | 9 | 40.56 | 2026-05-27 |
| HELM MedQA | Healthcare | 1 | 0.968191 | 2026-05-28 |
| MedCode | Healthcare | 11 | 49.634% | 2026-05-28 |
| MedQA | Healthcare | 4 | 96.317% | 2026-04-16 |
| MedScribe | Healthcare | 12 | 83.65% | 2026-05-28 |
| Omi SOAP Note Safety Benchmark | Healthcare | 6 | 4.29 | 2026-04-21 |
| HUMAINE | Human Preference | 16 | 3.61 | 2026-05-06 |
| AIIQ Composite IQ | Intelligence | 13 | 119 | 2026-05-12 |
| Artificial Analysis Intelligence Index | Intelligence | 44 | 44.63 | 2026-05-11 |
| Artificial Analysis Intelligence Index | Intelligence | 60 | 42.03 | 2026-05-11 |
| Artificial Analysis Intelligence Index | Intelligence | 80 | 39.2 | 2026-05-11 |
| Artificial Analysis Intelligence Index | Intelligence | 206 | 23.89 | 2026-05-11 |
| Artificial Analysis Intelligence Index | Intelligence | 227 | 21.83 | 2026-05-11 |
| GPQA Diamond | Intelligence | 24 | 85.606% | 2026-05-28 |
| Humanity's Last Exam | Intelligence | 38 | 26.5% | 2026-05-11 |
| Humanity's Last Exam | Intelligence | 53 | 23.5% | 2026-05-11 |
| Humanity's Last Exam | Intelligence | 79 | 18.4% | 2026-05-11 |
| Humanity's Last Exam | Intelligence | 251 | 5.8% | 2026-05-11 |
| Humanity's Last Exam | Intelligence | 266 | 5.4% | 2026-05-11 |
| MathVision | Intelligence | 24 | 72 | 2026-05-06 |
| MathVision | Intelligence | 68 | 45.80 | 2026-05-06 |
| MMLU Pro | Intelligence | 23 | 86.544% | 2026-05-28 |
| MMLU-Pro | Intelligence | 12 | 87.1% | 2026-05-11 |
| MMLU-Pro | Intelligence | 14 | 86.7% | 2026-05-11 |
| MMLU-Pro | Intelligence | 22 | 86% | 2026-05-11 |
| MMLU-Pro | Intelligence | 75 | 82% | 2026-05-11 |
| MMLU-Pro | Intelligence | 105 | 80.6% | 2026-05-11 |
| MMMU Pro | Intelligence | 22 | 81.503% | 2026-05-28 |
| OCRBench v2 | Intelligence | 12 | 55.50 | 2026-05-06 |
| TableBench | Intelligence | 6 | 59.94% | 2026-05-27 |
| AraGen v3 | Language | 2 | 84.25 | 2026-05-06 |
| Seneca-TRBench | Language | 1 | 93.50 | 2026-05-06 |
| CaseLaw v2 | Legal | 6 | 66.452% | 2026-05-04 |
| LegalBench | Legal | 6 | 86.023% | 2026-05-28 |
| LEXam | Legal | 1 | 70.20% open / 62.65% MCQ | 2026-05-28 |
| Professional Reasoning Bench - Legal | Legal | 5 | 48.96 | 2026-05-06 |
| ConStory-Bench | Long Context | 1 | CED 0.113 | 2026-05-28 |
| OpenAI-MRCR: 2 needle 128k | Long Context | 1 | 0.95 | 2026-05-06 |
| AIME | Math | 14 | 93.374% | 2026-04-16 |
| AIME 2025 | Math | 12 | 94.3% | 2026-05-11 |
| AIME 2025 | Math | 18 | 91.7% | 2026-05-11 |
| AIME 2025 | Math | 55 | 83% | 2026-05-11 |
| AIME 2025 | Math | 143 | 48.3% | 2026-05-11 |
| AIME 2025 | Math | 180 | 31.7% | 2026-05-11 |
| IneqMath | Math | 1 | 47 | 2026-05-06 |
| IneqMath | Math | 8 | 28 | 2026-05-06 |
| MATH 500 | Math | 3 | 96% | 2026-01-09 |
| MGSM | Math | 14 | 92.836% | 2026-01-09 |
| ProofBench | Math | 16 | 18% | 2026-05-28 |
| HMMT 2025 | Mathematics | 11 | 0.93 | 2026-05-06 |
| LiveMedBench | Medical | 3 | 0.2858 | 2026-05-27 |
| AfroBench-Lite | Multilingual | 1 | 77.74 | 2026-05-06 |
| ALL Bench Multimodal | Multimodal | 8 | 8.42 | 2026-05-06 |
| CharXiv-R | Multimodal | 8 | 0.81 | 2026-05-06 |
| Design Arena | Multimodal | 44 | 1227 | 2026-05-06 |
| Design Arena | Multimodal | 51 | 1223 | 2026-05-06 |
| Math-VR | Multimodal | 7 | 58.1 | 2026-05-27 |
| MMMU-Pro | Multimodal | 11 | 78.40 | 2026-05-06 |
| MMMU-Pro | Multimodal | 30 | 62.70 | 2026-05-06 |
| MMSI-Bench | Multimodal | 4 | 41.9% | 2026-05-28 |
| Physical AI Bench Understanding | Multimodal | 2 | 69.80 | 2026-05-06 |
| VideoMME w sub. | Multimodal | 4 | 0.87 | 2026-05-06 |
| VideoMMMU | Multimodal | 6 | 0.85 | 2026-05-06 |
| Visual-Language Understanding | Multimodal | 7 | 49.69 | 2026-05-06 |
| VTB | Multimodal | 5 | 18.68 | 2026-05-06 |
| VTB | Multimodal | 6 | 16.96 | 2026-05-06 |
| WebMainBench | Multimodal | 2 | 0.90 | 2026-05-06 |
| Artificial Analysis Openness Index | Openness | 201 | 11.11 | 2026-05-11 |
| Artificial Analysis Openness Index | Openness | 217 | 5.56 | 2026-05-11 |
| Artificial Analysis Openness Index | Openness | 218 | 5.56 | 2026-05-11 |
| Artificial Analysis Openness Index | Openness | 219 | 5.56 | 2026-05-11 |
| Artificial Analysis Openness Index | Openness | 220 | 5.56 | 2026-05-11 |
| CAIS Text Capabilities Index | Reasoning | 21 | 20.9 | 2026-05-27 |
| EnigmaEval | Reasoning | 6 | 10.47 | 2026-05-06 |
| ERQA | Reasoning | 1 | 0.66 | 2026-05-06 |
| GPQA Diamond | Reasoning | 51 | 85.4% | 2026-05-11 |
| GPQA Diamond | Reasoning | 62 | 84.2% | 2026-05-11 |
| GPQA Diamond | Reasoning | 98 | 80.8% | 2026-05-11 |
| GPQA Diamond | Reasoning | 217 | 68.6% | 2026-05-11 |
| GPQA Diamond | Reasoning | 227 | 67.3% | 2026-05-11 |
| Graphwalks BFS <128k | Reasoning | 3 | 0.78 | 2026-05-06 |
| Graphwalks parents <128k | Reasoning | 3 | 0.73 | 2026-05-06 |
| Humanity's Last Exam (Text Only) | Reasoning | 8 | 26.32 | 2026-05-06 |
| LingOly-TOO | Reasoning | 1 | 0.47 | 2026-05-06 |
| MultiNRC | Reasoning | 6 | 52.13 | 2026-05-06 |
| CAIS Risk Index | Safety | 13 | 46.9 | 2026-05-27 |
| ThaiSafetyBench | Safety | 1 | 4.43% overall ASR | 2026-05-28 |
| CritPt | Science | 30 | 5.7% | 2026-05-11 |
| CritPt | Science | 82 | 1.1% | 2026-05-11 |
| CritPt | Science | 218 | 0% | 2026-05-11 |
| CritPt | Science | 219 | 0% | 2026-05-11 |
| BrowseComp Long Context 128k | Search | 2 | 0.90 | 2026-05-06 |
| BrowseComp Long Context 256k | Search | 2 | 0.89 | 2026-05-06 |
| SWT-Bench | Software Engineering | 4 | 79.8% | 2026-05-27 |
| Structured Output Benchmark | Structured Output | 16 | 84.90 | 2026-05-06 |
| LiveSQLBench | Text to SQL | 11 | 31.15 | 2026-05-06 |
| COLLIE | Writing | 1 | 0.99 | 2026-05-06 |
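Several benchmarks appear more than once in the table (for example ARC-AGI-2, GPQA Diamond, and Tau2-Bench Telecom), and the page does not label what distinguishes those rows. A minimal sketch for keeping only the best-ranked row per benchmark follows. It groups by (benchmark, category), on the assumption that rows sharing both come from the same leaderboard and so have comparable ranks. The `Row` type and `best_rank_per_benchmark` helper are hypothetical, and the sample rows are copied from the table.

```python
from typing import NamedTuple


class Row(NamedTuple):
    benchmark: str
    category: str
    rank: int
    score: str  # kept as text because units differ across benchmarks


def best_rank_per_benchmark(rows: list[Row]) -> dict[tuple[str, str], Row]:
    """Keep the row with the lowest (best) rank for each (benchmark, category) pair."""
    best: dict[tuple[str, str], Row] = {}
    for row in rows:
        key = (row.benchmark, row.category)
        current = best.get(key)
        if current is None or row.rank < current.rank:
            best[key] = row
    return best


# Sample rows copied from the table above.
rows = [
    Row("ARC-AGI-2", "Agentic", 50, "9.86"),
    Row("ARC-AGI-2", "Agentic", 55, "7.49"),
    Row("ARC-AGI-2", "Agentic", 144, "0"),
    Row("GPQA Diamond", "Intelligence", 24, "85.606%"),
    Row("GPQA Diamond", "Reasoning", 51, "85.4%"),
    Row("GPQA Diamond", "Reasoning", 62, "84.2%"),
    Row("COLLIE", "Writing", 1, "0.99"),
]

best = best_rank_per_benchmark(rows)
print(len(rows), "scores across", len(best), "groups")  # prints 7 scores across 4 groups
for (name, category), row in best.items():
    print(name, category, row.rank, row.score)
```

GPQA Diamond stays split into two groups because its Intelligence and Reasoning rows appear to come from different leaderboards, whose ranks should not be compared directly.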