GPT-4.1

GPT / OpenAI

107 scores
103 benchmarks
Cost (in/out): $2 / $8 per 1M tokens
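
As a rough illustration of what the listed price means for a single request, the sketch below converts token counts into a dollar estimate. It assumes the two figures are the input and output prices in USD per one million tokens, as the "in/out" label suggests. The function name and the example token counts are hypothetical, and actual billing may differ (for example through cached-input or batch discounts).

```python
# Assumed from the listing: $2 per 1M input tokens, $8 per 1M output tokens.
INPUT_USD_PER_MILLION = 2.00
OUTPUT_USD_PER_MILLION = 8.00


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request in USD (hypothetical helper)."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = (
        input_tokens * INPUT_USD_PER_MILLION
        + output_tokens * OUTPUT_USD_PER_MILLION
    )
    return total / 1_000_000


if __name__ == "__main__":
    # Illustrative request: 10,000 prompt tokens and 2,000 completion tokens.
    print(f"${estimate_cost_usd(10_000, 2_000):.4f}")
```

Under these assumptions, output tokens cost four times as much as input tokens, so long completions dominate the bill faster than long prompts.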

Metadata

GPT Closed/API

Aliases: gpt-4.1, gpt-4.1-2025-04-14, openai-gpt-4.1, openai-gpt-4.1-2025-04-14, openai/gpt-4.1, openai/gpt-4.1-2025-04-14

Benchmark Results

| Benchmark | Category | Rank | Score | Sampled |
| --- | --- | --- | --- | --- |
| ARC-AGI-1 | Agentic | 133 | 5.50 | 2026-05-05 |
| ARC-AGI-2 | Agentic | 126 | 0.42 | 2026-05-05 |
| Berkeley Function-Calling Leaderboard | Agentic | 20 | 53.96% | 2026-05-27 |
| Berkeley Function-Calling Leaderboard | Agentic | 45 | 39.38% | 2026-05-27 |
| CAR-bench | Agentic | 8 | 0.37 | 2026-05-06 |
| DEEPSYNTH | Agentic | 8 | 3.46 | 2026-05-27 |
| Galileo Agent Leaderboard | Agentic | 1 | 0.62 | 2026-05-06 |
| Gert Labs Rankings | Agentic | 58 | 0.28 | 2026-05-11 |
| MCP-Universe | Agentic | 24 | 18.18 | 2026-05-06 |
| MCPMark | Agentic | 33 | 0.08 | 2026-05-06 |
| MultiChallenge | Agentic | 27 | 39.43 | 2026-05-06 |
| RealDataAgentBench | Agentic | 1 | 0.88 | 2026-04-28 |
| Tau2-Bench Telecom | Agentic | 184 | 47.1% | 2026-05-11 |
| Terminal-Bench Hard | Agentic | 186 | 13.6% | 2026-05-11 |
| UAVBench | Agentic | 5 | 79.05 | 2026-05-06 |
| OpenUGI | Alignment | 162 | 47.53 | 2026-05-06 |
| TextClass Benchmark | Classification | 63 | 1520.39 | 2026-05-06 |
| ALE-Bench | Coding | 66 | 558.10 | 2026-05-06 |
| BigCodeBench-Hard | Coding | 7 | 31.80 | 2026-05-05 |
| CadEval | Coding | 6 | 42 | 2026-05-06 |
| LiveCodeBench | Coding | 84 | 54.666% | 2026-05-28 |
| SciCode | Coding | 136 | 38.1% | 2026-05-11 |
| Terminal-Bench 2.0 | Coding | 61 | 14.607% | 2026-05-28 |
| RP-Bench | Creative | 6 | 1522.70 | 2026-05-06 |
| RP-Bench | Creative | 8 | 1509.40 | 2026-05-06 |
| RP-Bench | Creative | 24 | 4.31 | 2026-05-06 |
| GSMA Open Telco Leaderboard | Domain | 23 | 63.39 | 2026-05-06 |
| Vectara HHEM Hallucination Leaderboard | Factuality | 21 | 94.40 | 2026-05-06 |
| CorpFin v2 | Finance | 28 | 63.054% | 2026-05-28 |
| Fin-RATE | Finance | 2 | 33.24% | 2026-05-28 |
| Fin-RATE | Finance | 3 | 31.80% | 2026-05-28 |
| FinChain | Finance | 11 | 56.92 ChainEval | 2026-05-28 |
| MortgageTax | Finance | 24 | 65.938% | 2026-05-28 |
| PRBench Finance | Finance | 24 | 34.32 | 2026-05-06 |
| TaxEval v2 | Finance | 11 | 75.061% | 2026-05-28 |
| BenchLM | General Knowledge | 51 | 58 | 2026-05-06 |
| Arena-Hard | Generalization | 14 | 50.0% | 2026-05-27 |
| HELM AIR-Bench | Generalization | 47 | 0.647875 | 2026-05-28 |
| HELM Safety | Generalization | 11 | 0.962853 | 2026-05-28 |
| WeirdML | Generalization | 16 | 39.37 | 2026-05-06 |
| GeoCode Leaderboard | Geospatial | 3 | 70.93% pass@1 | 2026-05-28 |
| GeoRC | Geospatial | 5 | 42.3 | 2026-05-27 |
| HealthBench | Healthcare | 2 | 0.4778 | 2026-05-27 |
| MedQA | Healthcare | 40 | 91.183% | 2026-04-16 |
| HUMAINE | Human Preference | 24 | 3.53 | 2026-05-06 |
| Multi-IF | Instruction Following | 15 | 0.71 | 2026-05-06 |
| Artificial Analysis Intelligence Index | Intelligence | 180 | 26.28 | 2026-05-11 |
| GPQA Diamond | Intelligence | 75 | 65.404% | 2026-05-28 |
| Humanity's Last Exam | Intelligence | 345 | 4.6% | 2026-05-11 |
| MMLU Pro | Intelligence | 59 | 80.495% | 2026-05-28 |
| MMLU-Pro | Intelligence | 104 | 80.6% | 2026-05-11 |
| MMMU Pro | Intelligence | 45 | 72.386% | 2026-05-28 |
| SimpleQA | Intelligence | 7 | 41.6% | 2026-05-27 |
| AraGen v3 | Language | 9 | 74.54 | 2026-05-06 |
| HellaSwag | Language | 1 | 95.30 | 2026-05-06 |
| HindiGen v1 | Language | 9 | 73.37 | 2026-05-06 |
| WinoGrande | Language | 3 | 87.50 | 2026-05-06 |
| CaseLaw v2 | Legal | 3 | 69.882% | 2026-05-04 |
| LegalBench | Legal | 32 | 83.1% | 2026-05-28 |
| LEXam | Legal | 6 | 57.50% open / 54.40% MCQ | 2026-05-28 |
| Professional Reasoning Bench - Legal | Legal | 23 | 36.48 | 2026-05-06 |
| Graphwalks BFS >128k | Long Context | 5 | 0.19 | 2026-05-06 |
| Graphwalks parents >128k | Long Context | 4 | 0.25 | 2026-05-06 |
| OpenAI-MRCR: 2 needle 128k | Long Context | 4 | 0.57 | 2026-05-06 |
| OpenAI-MRCR: 2 needle 1M | Long Context | 3 | 0.46 | 2026-05-06 |
| Fiction.LiveBench | Long Context | 8 | 63.90 | 2026-05-06 |
| AIME | Math | 70 | 39.583% | 2026-04-16 |
| AIME 2025 | Math | 175 | 34.7% | 2026-05-11 |
| IneqMath | Math | 41 | 2.50 | 2026-05-06 |
| JEEBench | Math | 5 | 0.292 | 2026-05-27 |
| MATH 500 | Math | 33 | 87.2% | 2026-01-09 |
| MGSM | Math | 59 | 87.673% | 2026-01-09 |
| FrontierMath 2025-02-28 Private | Mathematics | 15 | 5.52 | 2026-05-06 |
| FrontierMath Tier 4 2025-07-01 Private | Mathematics | 11 | 0 | 2026-05-06 |
| HMMT 2025 | Mathematics | 32 | 0.29 | 2026-05-06 |
| OTIS Mock AIME 2024-2025 | Mathematics | 21 | 38.33 | 2026-05-06 |
| LiveMedBench | Medical | 14 | 0.1379 | 2026-05-27 |
| MEDIC Benchmark | Medical | 2 | 91.71 average normalized public table score | 2026-05-27 |
| MedSafe-Dx | Medical | 5 | 87.6 | 2026-05-27 |
| AfroBench-Lite | Multilingual | 9 | 65.67 | 2026-05-06 |
| LanguageBench | Multilingual | 6 | 0.66 | 2026-05-06 |
| CharXiv-D | Multimodal | 5 | 0.88 | 2026-05-06 |
| CharXiv-R | Multimodal | 26 | 0.57 | 2026-05-06 |
| Design Arena | Multimodal | 99 | 1084 | 2026-05-06 |
| IDP Leaderboard | Multimodal | 18 | 67.99 | 2026-05-06 |
| Math-VR | Multimodal | 18 | 26.0 | 2026-05-27 |
| MMLongBench-Doc | Multimodal | 10 | 49.70 | 2026-05-06 |
| MMSI-Bench | Multimodal | 13 | 30.9% | 2026-05-28 |
| Visual-Language Understanding | Multimodal | 20 | 45.34 | 2026-05-06 |
| VPCT | Multimodal | 6 | 45 | 2026-05-06 |
| VTB | Multimodal | 11 | 5.52 | 2026-05-06 |
| BBH | Reasoning | 6 | 75.12 | 2026-05-06 |
| EnigmaEval | Reasoning | 26 | 2.17 | 2026-05-06 |
| GPQA Diamond | Reasoning | 236 | 66.6% | 2026-05-11 |
| Graphwalks BFS <128k | Reasoning | 7 | 0.62 | 2026-05-06 |
| Graphwalks parents <128k | Reasoning | 8 | 0.58 | 2026-05-06 |
| Humanity's Last Exam (Text Only) | Reasoning | 45 | 4.97 | 2026-05-06 |
| MultiNRC | Reasoning | 27 | 21.23 | 2026-05-06 |
| SimpleBench | Reasoning | 12 | 34.50 | 2026-05-06 |
| Halluverse-M3 | Safety | 2 | 78.66% | 2026-05-28 |
| CritPt | Science | 213 | 0% | 2026-05-11 |
| Defects4J | Software Engineering | 5 | 0.452 | 2026-05-27 |
| RepairBench | Software Engineering | 6 | 0.413 | 2026-05-27 |
| Structured Output Benchmark | Structured Output | 15 | 85 | 2026-05-06 |
| ComplexFuncBench | Tool Use | 2 | 0.66 | 2026-05-06 |
| COLLIE | Writing | 5 | 0.66 | 2026-05-06 |
| Lech Mazur Writing | Writing | 18 | 7.56 | 2026-05-06 |