GPT-5.1
GPT / OpenAI
105 scores
74 benchmarks
$1.25 / $10 per 1M tokens (cost in/out)
Metadata
GPT Closed/API
Aliases: gpt-5.1, gpt-5.1-20251113, openai-gpt-5.1, openai-gpt-5.1-20251113, openai/gpt-5.1, openai/gpt-5.1-20251113
| Benchmark | Category | Rank | Score | Sampled |
|---|---|---|---|---|
| ADBench | Agentic | 4 | 82 | 2026-05-06 |
| ALFWorld | Agentic | 6 | 0.917 | 2026-05-27 |
| APEX-Agents | Agentic | 19 | 31.50 | 2026-05-06 |
| ARC-AGI-1 | Agentic | 39 | 72.83 | 2026-05-05 |
| ARC-AGI-1 | Agentic | 55 | 57.67 | 2026-05-05 |
| ARC-AGI-1 | Agentic | 88 | 33.17 | 2026-05-05 |
| ARC-AGI-1 | Agentic | 132 | 5.83 | 2026-05-05 |
| ARC-AGI-2 | Agentic | 43 | 17.64 | 2026-05-05 |
| ARC-AGI-2 | Agentic | 59 | 6.53 | 2026-05-05 |
| ARC-AGI-2 | Agentic | 98 | 1.94 | 2026-05-05 |
| ARC-AGI-2 | Agentic | 128 | 0.42 | 2026-05-05 |
| DEEPSYNTH | Agentic | 6 | 3.83 | 2026-05-27 |
| Gert Labs Rankings | Agentic | 47 | 0.37 | 2026-05-11 |
| LMArena Search Arena | Agentic | 12 | 1201.06 | 2026-05-06 |
| MCP Atlas | Agentic | 18 | 50.10 | 2026-05-06 |
| MultiChallenge | Agentic | 4 | 63.41 | 2026-05-06 |
| MultiChallenge | Agentic | 19 | 51.23 | 2026-05-06 |
| Poker Agent | Agentic | 9 | 1038.593% | 2025-12-23 |
| Tau2 Airline | Agentic | 3 | 0.67 | 2026-05-06 |
| Tau2 Airline | Agentic | 3 | 0.67 | 2026-05-06 |
| Tau2 Airline | Agentic | 3 | 0.67 | 2026-05-06 |
| Tau2-Bench Telecom | Agentic | 101 | 81.9% | 2026-05-11 |
| Tau2-Bench Telecom | Agentic | 187 | 46.5% | 2026-05-11 |
| Terminal-Bench Hard | Agentic | 20 | 45.5% | 2026-05-11 |
| Terminal-Bench Hard | Agentic | 132 | 22.7% | 2026-05-11 |
| Vending-Bench 2 | Agentic | 24 | 1473.43 | 2026-05-28 |
| OpenUGI | Alignment | 241 | 44.28 | 2026-05-06 |
| OpenUGI | Alignment | 248 | 44.15 | 2026-05-06 |
| OpenUGI | Alignment | 566 | 34.91 | 2026-05-06 |
| scBench | Biology | 13 | 38.80% | 2026-05-27 |
| SpatialBench | Biology | 13 | 39.83% | 2026-05-27 |
| ALE-Bench | Coding | 15 | 1192.15 | 2026-05-06 |
| Arena AI Code | Coding | 35 | 1391 | 2026-05-06 |
| Arena AI Code | Coding | 48 | 1340 | 2026-05-06 |
| IOI | Coding | 13 | 21.5% | 2026-05-26 |
| LiveCodeBench | Coding | 10 | 86.486% | 2026-05-28 |
| SciCode | Coding | 55 | 43.3% | 2026-05-11 |
| SciCode | Coding | 167 | 36.5% | 2026-05-11 |
| SWE-bench Verified | Coding | 32 | 69.8% | 2026-05-28 |
| Terminal-Bench 2.0 | Coding | 25 | 44.944% | 2026-05-28 |
| Vibe Code Bench v1.1 | Coding | 19 | 24.606% | 2026-05-28 |
| Arena AI Document | Document AI | 22 | 1410 | 2026-05-06 |
| GSMA Open Telco Leaderboard | Domain | 32 | 60.15 | 2026-05-06 |
| SAGE | Education | 24 | 43.235% | 2026-05-28 |
| TutorBench | Education | 2 | 54.09 | 2026-05-06 |
| Vectara HHEM Hallucination Leaderboard | Factuality | 67 | 89.10 | 2026-05-06 |
| Vectara HHEM Hallucination Leaderboard | Factuality | 76 | 87.90 | 2026-05-06 |
| CorpFin v2 | Finance | 23 | 63.831% | 2026-05-28 |
| Finance Agent v1.1 | Finance | 13 | 55.309% | 2026-05-04 |
| MortgageTax | Finance | 39 | 61.368% | 2026-05-28 |
| PRBench Finance | Finance | 6 | 48.01 | 2026-05-06 |
| QuantSightBench | Finance | 4 | 0.7459 coverage | 2026-05-28 |
| TaxEval v2 | Finance | 13 | 74.857% | 2026-05-28 |
| ALL Bench LLM | General Knowledge | 31 | 22.51 | 2026-05-06 |
| BenchLM | General Knowledge | 21 | 79 | 2026-05-06 |
| HELM AIR-Bench | Generalization | 9 | 0.861872 | 2026-05-28 |
| MedCode | Healthcare | 6 | 52.732% | 2026-05-28 |
| MedQA | Healthcare | 2 | 96.383% | 2026-04-16 |
| MedScribe | Healthcare | 1 | 88.09% | 2026-05-28 |
| AIIQ Composite IQ | Intelligence | 12 | 120 | 2026-05-12 |
| Artificial Analysis Intelligence Index | Intelligence | 33 | 47.7 | 2026-05-11 |
| Artificial Analysis Intelligence Index | Intelligence | 169 | 27.42 | 2026-05-11 |
| GPQA Diamond | Intelligence | 21 | 86.616% | 2026-05-28 |
| Humanity's Last Exam | Intelligence | 39 | 26.5% | 2026-05-11 |
| Humanity's Last Exam | Intelligence | 278 | 5.2% | 2026-05-11 |
| LiveBench | Intelligence | 21 | 72.61 | 2026-05-05 |
| LiveBench | Intelligence | 33 | 69.14 | 2026-05-05 |
| MMLU Pro | Intelligence | 24 | 86.377% | 2026-05-28 |
| MMLU-Pro | Intelligence | 13 | 87% | 2026-05-11 |
| MMLU-Pro | Intelligence | 113 | 80.1% | 2026-05-11 |
| MMMU Pro | Intelligence | 17 | 83.179% | 2026-05-28 |
| CaseLaw v2 | Legal | 2 | 73.419% | 2026-05-04 |
| LegalBench | Legal | 7 | 85.683% | 2026-05-28 |
| Professional Reasoning Bench - Legal | Legal | 3 | 49.33 | 2026-05-06 |
| AIME | Math | 15 | 93.333% | 2026-04-16 |
| AIME 2025 | Math | 14 | 94% | 2026-05-11 |
| AIME 2025 | Math | 165 | 38% | 2026-05-11 |
| MGSM | Math | 13 | 92.982% | 2026-01-09 |
| LiveMedBench | Medical | 2 | 0.3845 | 2026-05-27 |
| Medmarks | Medical | 2 | 0.6243980841829406 | 2026-05-27 |
| Medmarks | Medical | 2 | 0.6395161191059014 | 2026-05-27 |
| ALL Bench Multimodal | Multimodal | 30 | 21.43 | 2026-05-06 |
| Design Arena | Multimodal | 39 | 1230 | 2026-05-06 |
| Design Arena | Multimodal | 52 | 1220 | 2026-05-06 |
| Design Arena | Multimodal | 55 | 1215 | 2026-05-06 |
| Design Arena | Multimodal | 58 | 1209 | 2026-05-06 |
| LMArena Vision Arena | Multimodal | 24 | 1248.67 | 2026-05-06 |
| MMMU-Pro | Multimodal | 10 | 79 | 2026-05-06 |
| MMMU-Pro | Multimodal | 15 | 76 | 2026-05-06 |
| Visual-Language Understanding | Multimodal | 26 | 43.82 | 2026-05-06 |
| Artificial Analysis Openness Index | Openness | 202 | 11.11 | 2026-05-11 |
| Artificial Analysis Openness Index | Openness | 228 | 5.56 | 2026-05-11 |
| CAIS Text Capabilities Index | Reasoning | 16 | 29.0 | 2026-05-27 |
| EnigmaEval | Reasoning | 6 | 11.23 | 2026-05-06 |
| GPQA Diamond | Reasoning | 32 | 87.3% | 2026-05-11 |
| GPQA Diamond | Reasoning | 251 | 64.3% | 2026-05-11 |
| Humanity's Last Exam (Text Only) | Reasoning | 9 | 24.65 | 2026-05-06 |
| MultiNRC | Reasoning | 7 | 49 | 2026-05-06 |
| CAIS Risk Index | Safety | 12 | 46.4 | 2026-05-27 |
| CritPt | Science | 35 | 4.9% | 2026-05-11 |
| CritPt | Science | 225 | 0% | 2026-05-11 |
| BrowseComp Long Context 128k | Search | 2 | 0.90 | 2026-05-06 |
| BrowseComp Long Context 128k | Search | 2 | 0.90 | 2026-05-06 |
| BrowseComp Long Context 128k | Search | 2 | 0.90 | 2026-05-06 |
| CAIS Vision Capabilities Index | Vision | 13 | 53.2 | 2026-05-27 |