GPT-5.1

GPT / OpenAI

105 scores
74 benchmarks
$1.25 / $10 per 1M tokens (cost in/out)
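
The per-token pricing above can be turned into a rough per-request estimate. The sketch below assumes that "in/out" means input (prompt) and output (completion) tokens billed at $1.25 and $10 per 1M tokens respectively, and it ignores any discounts or surcharges the provider may apply. The helper name `estimate_cost_usd` and the example token counts are hypothetical, chosen only for illustration.

```python
# Prices copied from the listing above, in USD per 1M tokens.
# Assumption: "in" = input/prompt tokens, "out" = output/completion tokens.
INPUT_USD_PER_MILLION = 1.25
OUTPUT_USD_PER_MILLION = 10.00


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request in USD (hypothetical helper)."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = (input_tokens * INPUT_USD_PER_MILLION
             + output_tokens * OUTPUT_USD_PER_MILLION)
    return total / 1_000_000


# Example: a 20,000-token prompt with a 1,500-token reply.
print(f"${estimate_cost_usd(20_000, 1_500):.4f}")
```

Because output tokens cost eight times as much as input tokens at these rates, long generations tend to dominate the bill even when the prompt is much larger than the reply.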

Metadata

GPT Closed/API

Aliases: gpt-5.1, gpt-5.1-20251113, openai-gpt-5.1, openai-gpt-5.1-20251113, openai/gpt-5.1, openai/gpt-5.1-20251113

Benchmark Results

Benchmark | Category | Rank | Score | Sampled
ADBench | Agentic | 4 | 82 | 2026-05-06
ALFWorld | Agentic | 6 | 0.917 | 2026-05-27
APEX-Agents | Agentic | 19 | 31.50 | 2026-05-06
ARC-AGI-1 | Agentic | 39 | 72.83 | 2026-05-05
ARC-AGI-1 | Agentic | 55 | 57.67 | 2026-05-05
ARC-AGI-1 | Agentic | 88 | 33.17 | 2026-05-05
ARC-AGI-1 | Agentic | 132 | 5.83 | 2026-05-05
ARC-AGI-2 | Agentic | 43 | 17.64 | 2026-05-05
ARC-AGI-2 | Agentic | 59 | 6.53 | 2026-05-05
ARC-AGI-2 | Agentic | 98 | 1.94 | 2026-05-05
ARC-AGI-2 | Agentic | 128 | 0.42 | 2026-05-05
DEEPSYNTH | Agentic | 6 | 3.83 | 2026-05-27
Gert Labs Rankings | Agentic | 47 | 0.37 | 2026-05-11
LMArena Search Arena | Agentic | 12 | 1201.06 | 2026-05-06
MCP Atlas | Agentic | 18 | 50.10 | 2026-05-06
MultiChallenge | Agentic | 4 | 63.41 | 2026-05-06
MultiChallenge | Agentic | 19 | 51.23 | 2026-05-06
Poker Agent | Agentic | 9 | 1038.593% | 2025-12-23
Tau2 Airline | Agentic | 3 | 0.67 | 2026-05-06
Tau2 Airline | Agentic | 3 | 0.67 | 2026-05-06
Tau2 Airline | Agentic | 3 | 0.67 | 2026-05-06
Tau2-Bench Telecom | Agentic | 101 | 81.9% | 2026-05-11
Tau2-Bench Telecom | Agentic | 187 | 46.5% | 2026-05-11
Terminal-Bench Hard | Agentic | 20 | 45.5% | 2026-05-11
Terminal-Bench Hard | Agentic | 132 | 22.7% | 2026-05-11
Vending-Bench 2 | Agentic | 24 | 1473.43 | 2026-05-28
OpenUGI | Alignment | 241 | 44.28 | 2026-05-06
OpenUGI | Alignment | 248 | 44.15 | 2026-05-06
OpenUGI | Alignment | 566 | 34.91 | 2026-05-06
scBench | Biology | 13 | 38.80% | 2026-05-27
SpatialBench | Biology | 13 | 39.83% | 2026-05-27
ALE-Bench | Coding | 15 | 1192.15 | 2026-05-06
Arena AI Code | Coding | 35 | 1391 | 2026-05-06
Arena AI Code | Coding | 48 | 1340 | 2026-05-06
IOI | Coding | 13 | 21.5% | 2026-05-26
LiveCodeBench | Coding | 10 | 86.486% | 2026-05-28
SciCode | Coding | 55 | 43.3% | 2026-05-11
SciCode | Coding | 167 | 36.5% | 2026-05-11
SWE-bench Verified | Coding | 32 | 69.8% | 2026-05-28
Terminal-Bench 2.0 | Coding | 25 | 44.944% | 2026-05-28
Vibe Code Bench v1.1 | Coding | 19 | 24.606% | 2026-05-28
Arena AI Document | Document AI | 22 | 1410 | 2026-05-06
GSMA Open Telco Leaderboard | Domain | 32 | 60.15 | 2026-05-06
SAGE | Education | 24 | 43.235% | 2026-05-28
TutorBench | Education | 2 | 54.09 | 2026-05-06
Vectara HHEM Hallucination Leaderboard | Factuality | 67 | 89.10 | 2026-05-06
Vectara HHEM Hallucination Leaderboard | Factuality | 76 | 87.90 | 2026-05-06
CorpFin v2 | Finance | 23 | 63.831% | 2026-05-28
Finance Agent v1.1 | Finance | 13 | 55.309% | 2026-05-04
MortgageTax | Finance | 39 | 61.368% | 2026-05-28
PRBench Finance | Finance | 6 | 48.01 | 2026-05-06
QuantSightBench | Finance | 4 | 0.7459 coverage | 2026-05-28
TaxEval v2 | Finance | 13 | 74.857% | 2026-05-28
ALL Bench LLM | General Knowledge | 31 | 22.51 | 2026-05-06
BenchLM | General Knowledge | 21 | 79 | 2026-05-06
HELM AIR-Bench | Generalization | 9 | 0.861872 | 2026-05-28
MedCode | Healthcare | 6 | 52.732% | 2026-05-28
MedQA | Healthcare | 2 | 96.383% | 2026-04-16
MedScribe | Healthcare | 1 | 88.09% | 2026-05-28
AIIQ Composite IQ | Intelligence | 12 | 120 | 2026-05-12
Artificial Analysis Intelligence Index | Intelligence | 33 | 47.7 | 2026-05-11
Artificial Analysis Intelligence Index | Intelligence | 169 | 27.42 | 2026-05-11
GPQA Diamond | Intelligence | 21 | 86.616% | 2026-05-28
Humanity's Last Exam | Intelligence | 39 | 26.5% | 2026-05-11
Humanity's Last Exam | Intelligence | 278 | 5.2% | 2026-05-11
LiveBench | Intelligence | 21 | 72.61 | 2026-05-05
LiveBench | Intelligence | 33 | 69.14 | 2026-05-05
MMLU Pro | Intelligence | 24 | 86.377% | 2026-05-28
MMLU-Pro | Intelligence | 13 | 87% | 2026-05-11
MMLU-Pro | Intelligence | 113 | 80.1% | 2026-05-11
MMMU Pro | Intelligence | 17 | 83.179% | 2026-05-28
CaseLaw v2 | Legal | 2 | 73.419% | 2026-05-04
LegalBench | Legal | 7 | 85.683% | 2026-05-28
Professional Reasoning Bench - Legal | Legal | 3 | 49.33 | 2026-05-06
AIME | Math | 15 | 93.333% | 2026-04-16
AIME 2025 | Math | 14 | 94% | 2026-05-11
AIME 2025 | Math | 165 | 38% | 2026-05-11
MGSM | Math | 13 | 92.982% | 2026-01-09
LiveMedBench | Medical | 2 | 0.3845 | 2026-05-27
Medmarks | Medical | 2 | 0.6243980841829406 | 2026-05-27
Medmarks | Medical | 2 | 0.6395161191059014 | 2026-05-27
ALL Bench Multimodal | Multimodal | 30 | 21.43 | 2026-05-06
Design Arena | Multimodal | 39 | 1230 | 2026-05-06
Design Arena | Multimodal | 52 | 1220 | 2026-05-06
Design Arena | Multimodal | 55 | 1215 | 2026-05-06
Design Arena | Multimodal | 58 | 1209 | 2026-05-06
LMArena Vision Arena | Multimodal | 24 | 1248.67 | 2026-05-06
MMMU-Pro | Multimodal | 10 | 79 | 2026-05-06
MMMU-Pro | Multimodal | 15 | 76 | 2026-05-06
Visual-Language Understanding | Multimodal | 26 | 43.82 | 2026-05-06
Artificial Analysis Openness Index | Openness | 202 | 11.11 | 2026-05-11
Artificial Analysis Openness Index | Openness | 228 | 5.56 | 2026-05-11
CAIS Text Capabilities Index | Reasoning | 16 | 29.0 | 2026-05-27
EnigmaEval | Reasoning | 6 | 11.23 | 2026-05-06
GPQA Diamond | Reasoning | 32 | 87.3% | 2026-05-11
GPQA Diamond | Reasoning | 251 | 64.3% | 2026-05-11
Humanity's Last Exam (Text Only) | Reasoning | 9 | 24.65 | 2026-05-06
MultiNRC | Reasoning | 7 | 49 | 2026-05-06
CAIS Risk Index | Safety | 12 | 46.4 | 2026-05-27
CritPt | Science | 35 | 4.9% | 2026-05-11
CritPt | Science | 225 | 0% | 2026-05-11
BrowseComp Long Context 128k | Search | 2 | 0.90 | 2026-05-06
BrowseComp Long Context 128k | Search | 2 | 0.90 | 2026-05-06
BrowseComp Long Context 128k | Search | 2 | 0.90 | 2026-05-06
CAIS Vision Capabilities Index | Vision | 13 | 53.2 | 2026-05-27