GPT-4.1 Mini

GPT / OpenAI

72 scores
71 benchmarks
$0.4 / $1.6 per 1M tokens (cost in/out)
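
As a rough illustration of what the listed prices mean per request, the arithmetic can be sketched as below. The function name `estimate_cost` and the token counts in the usage line are hypothetical. The two rates are taken from the listing above, read as USD per 1M input tokens and per 1M output tokens. This is a minimal sketch and ignores any discounts or cached-token pricing the provider may apply.

```python
# Rates from the listing above, assumed to be USD per 1M tokens.
INPUT_PRICE_PER_M = 0.40
OUTPUT_PRICE_PER_M = 1.60


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one request at the listed rates."""
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical request: 12,000 prompt tokens and 800 completion tokens.
    print(f"${estimate_cost(12_000, 800):.4f}")
```

Because output tokens are priced at four times the input rate, long completions tend to dominate the bill even when prompts are much larger.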

Metadata

GPT Closed/API

Aliases: gpt-4.1-mini, gpt-4.1-mini-2025-04-14, openai-gpt-4.1-mini, openai-gpt-4.1-mini-2025-04-14, openai/gpt-4.1-mini, openai/gpt-4.1-mini-2025-04-14

Benchmark Results

| Benchmark | Category | Rank | Score | Sampled |
| --- | --- | --- | --- | --- |
| ARC-AGI-1 | Agentic | 140 | 3.50 | 2026-05-05 |
| ARC-AGI-2 | Agentic | 136 | 0 | 2026-05-05 |
| Berkeley Function-Calling Leaderboard | Agentic | 27 | 50.45% | 2026-05-27 |
| Berkeley Function-Calling Leaderboard | Agentic | 67 | 29.73% | 2026-05-27 |
| Galileo Agent Leaderboard | Agentic | 3 | 0.56 | 2026-05-06 |
| Hindsight LLM Memory Leaderboard | Agentic | 4 | 86.40 | 2026-05-06 |
| MCPMark | Agentic | 38 | 0.04 | 2026-05-06 |
| RealDataAgentBench | Agentic | 2 | 0.87 | 2026-04-28 |
| Tau2-Bench Telecom | Agentic | 172 | 52.9% | 2026-05-11 |
| Terminal-Bench Hard | Agentic | 227 | 7.6% | 2026-05-11 |
| UAVBench | Agentic | 6 | 78.10 | 2026-05-06 |
| TextClass Benchmark | Classification | 52 | 1547.62 | 2026-05-06 |
| BigCodeBench | Coding | 8 | 48.90 | 2026-05-06 |
| BigCodeBench-Hard | Coding | 8 | 31.80 | 2026-05-05 |
| CadEval | Coding | 10 | 16 | 2026-05-06 |
| LiveCodeBench | Coding | 80 | 58.158% | 2026-05-28 |
| SciCode | Coding | 90 | 40.4% | 2026-05-11 |
| GSMA Open Telco Leaderboard | Domain | 37 | 58.02 | 2026-05-06 |
| CorpFin v2 | Finance | 63 | 57.926% | 2026-05-28 |
| FinanceArena | Finance | 12 | 41.9 | 2026-05-27 |
| FinChain | Finance | 8 | 57.24 ChainEval | 2026-05-28 |
| MortgageTax | Finance | 27 | 65.501% | 2026-05-28 |
| PRBench Finance | Finance | 27 | 30.45 | 2026-05-06 |
| TaxEval v2 | Finance | 48 | 71.914% | 2026-05-28 |
| BenchLM | General Knowledge | 70 | 46 | 2026-05-06 |
| Arena-Hard | Generalization | 15 | 46.9% | 2026-05-27 |
| HELM AIR-Bench | Generalization | 56 | 0.604408 | 2026-05-28 |
| HELM Safety | Generalization | 15 | 0.948914 | 2026-05-28 |
| WeirdML | Generalization | 17 | 37.61 | 2026-05-06 |
| GeoCode Leaderboard | Geospatial | 8 | 66.56% pass@1 | 2026-05-28 |
| HealthBench Hard | Healthcare | 22 | 0.4 | 2026-05-27 |
| MedQA | Healthcare | 61 | 84.633% | 2026-04-16 |
| Multi-IF | Instruction Following | 17 | 0.67 | 2026-05-06 |
| Artificial Analysis Intelligence Index | Intelligence | 218 | 22.9 | 2026-05-11 |
| GPQA Diamond | Intelligence | 73 | 67.929% | 2026-05-28 |
| Humanity's Last Exam | Intelligence | 346 | 4.6% | 2026-05-11 |
| MMLU Pro | Intelligence | 78 | 77.225% | 2026-05-28 |
| MMLU-Pro | Intelligence | 141 | 78.1% | 2026-05-11 |
| MMMU Pro | Intelligence | 51 | 70.537% | 2026-05-28 |
| SimpleQA | Intelligence | 17 | 16.8% | 2026-05-27 |
| HindiGen v1 | Language | 16 | 65.02 | 2026-05-06 |
| LegalBench | Legal | 71 | 78.044% | 2026-05-28 |
| LEXam | Legal | 13 | 54.58% open / 48.49% MCQ | 2026-05-28 |
| Professional Reasoning Bench - Legal | Legal | 27 | 30.38 | 2026-05-06 |
| Graphwalks BFS >128k | Long Context | 6 | 0.15 | 2026-05-06 |
| Graphwalks parents >128k | Long Context | 5 | 0.11 | 2026-05-06 |
| OpenAI-MRCR: 2 needle 128k | Long Context | 5 | 0.47 | 2026-05-06 |
| OpenAI-MRCR: 2 needle 1M | Long Context | 4 | 0.33 | 2026-05-06 |
| Fiction.LiveBench | Long Context | 14 | 46.90 | 2026-05-06 |
| AIME | Math | 63 | 49.375% | 2026-04-16 |
| AIME 2025 | Math | 148 | 46.3% | 2026-05-11 |
| MATH 500 | Math | 32 | 88% | 2026-01-09 |
| MGSM | Math | 58 | 87.782% | 2026-01-09 |
| FrontierMath 2025-02-28 Private | Mathematics | 16 | 4.48 | 2026-05-06 |
| HMMT 2025 | Mathematics | 30 | 0.35 | 2026-05-06 |
| OTIS Mock AIME 2024-2025 | Mathematics | 20 | 44.72 | 2026-05-06 |
| LiveMedBench | Medical | 21 | 0.1036 | 2026-05-27 |
| MEDIC Benchmark | Medical | 35 | 65.49 average normalized public table score | 2026-05-27 |
| LanguageBench | Multilingual | 11 | 0.60 | 2026-05-06 |
| CharXiv-D | Multimodal | 4 | 0.88 | 2026-05-06 |
| CharXiv-R | Multimodal | 25 | 0.57 | 2026-05-06 |
| Design Arena | Multimodal | 107 | 1052 | 2026-05-06 |
| Math-VR | Multimodal | 15 | 33.3 | 2026-05-27 |
| Visual-Language Understanding | Multimodal | 39 | 41.14 | 2026-05-06 |
| GPQA Diamond | Reasoning | 238 | 66.4% | 2026-05-11 |
| Graphwalks BFS <128k | Reasoning | 7 | 0.62 | 2026-05-06 |
| Graphwalks parents <128k | Reasoning | 6 | 0.60 | 2026-05-06 |
| LiveSecBench | Safety | 40 | 22.99 | 2026-05-27 |
| CritPt | Science | 214 | 0% | 2026-05-11 |
| StructEval | Structured Output | 2 | 75.64% | 2026-05-28 |
| ComplexFuncBench | Tool Use | 4 | 0.49 | 2026-05-06 |
| COLLIE | Writing | 8 | 0.55 | 2026-05-06 |