GPT-4.1 Mini
GPT / OpenAI
72 scores
71 benchmarks
Cost (input / output): $0.40 / $1.60 per 1M tokens
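A rough per-request cost follows from the two listed prices, since each is charged per million tokens. The sketch below assumes simple linear pricing at exactly these list rates. It ignores any discounts a provider may offer, such as cached-input or batch pricing, and `request_cost` is a hypothetical helper name.

```python
# List prices from this page, in USD per 1M tokens (assumed linear, no discounts).
INPUT_USD_PER_M = 0.4
OUTPUT_USD_PER_M = 1.6


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request from its token counts."""
    return (input_tokens * INPUT_USD_PER_M + output_tokens * OUTPUT_USD_PER_M) / 1_000_000


# 10k prompt tokens and 2k completion tokens: 0.004 + 0.0032 = 0.0072 USD
print(round(request_cost(10_000, 2_000), 6))  # 0.0072
```

Output tokens cost four times as much as input tokens here, so completion-heavy workloads dominate the bill even when prompts are long.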
Metadata
Family: GPT; access: Closed/API
Aliases: gpt-4.1-mini, gpt-4.1-mini-2025-04-14, openai-gpt-4.1-mini, openai-gpt-4.1-mini-2025-04-14, openai/gpt-4.1-mini, openai/gpt-4.1-mini-2025-04-14
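The aliases differ only by an optional provider prefix (`openai/` or `openai-`) and an optional dated snapshot suffix (`-2025-04-14`). The sketch below folds them into one bare ID. The normalization rule is a hypothetical one inferred from this alias list, not the aggregator's actual matching logic. Treating a dated snapshot as identical to the undated name is also an assumption, because an undated alias can later point to a newer snapshot.

```python
import re

# Hypothetical rules inferred from the alias list above.
_PROVIDER_PREFIX = re.compile(r"^openai[/-]")
_SNAPSHOT_SUFFIX = re.compile(r"-\d{4}-\d{2}-\d{2}$")


def canonical_model_id(alias: str) -> str:
    """Strip a provider prefix and a trailing YYYY-MM-DD snapshot date."""
    name = _PROVIDER_PREFIX.sub("", alias.strip().lower())
    return _SNAPSHOT_SUFFIX.sub("", name)


ALIASES = [
    "gpt-4.1-mini",
    "gpt-4.1-mini-2025-04-14",
    "openai-gpt-4.1-mini",
    "openai-gpt-4.1-mini-2025-04-14",
    "openai/gpt-4.1-mini",
    "openai/gpt-4.1-mini-2025-04-14",
]

print({canonical_model_id(a) for a in ALIASES})  # {'gpt-4.1-mini'}
```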
| Benchmark | Category | Rank | Score | Sampled |
|---|---|---|---|---|
| ARC-AGI-1 | Agentic | 140 | 3.50 | 2026-05-05 |
| ARC-AGI-2 | Agentic | 136 | 0 | 2026-05-05 |
| Berkeley Function-Calling Leaderboard | Agentic | 27 | 50.45% | 2026-05-27 |
| Berkeley Function-Calling Leaderboard | Agentic | 67 | 29.73% | 2026-05-27 |
| Galileo Agent Leaderboard | Agentic | 3 | 0.56 | 2026-05-06 |
| Hindsight LLM Memory Leaderboard | Agentic | 4 | 86.40 | 2026-05-06 |
| MCPMark | Agentic | 38 | 0.04 | 2026-05-06 |
| RealDataAgentBench | Agentic | 2 | 0.87 | 2026-04-28 |
| Tau2-Bench Telecom | Agentic | 172 | 52.9% | 2026-05-11 |
| Terminal-Bench Hard | Agentic | 227 | 7.6% | 2026-05-11 |
| UAVBench | Agentic | 6 | 78.10 | 2026-05-06 |
| TextClass Benchmark | Classification | 52 | 1547.62 | 2026-05-06 |
| BigCodeBench | Coding | 8 | 48.90 | 2026-05-06 |
| BigCodeBench-Hard | Coding | 8 | 31.80 | 2026-05-05 |
| CadEval | Coding | 10 | 16 | 2026-05-06 |
| LiveCodeBench | Coding | 80 | 58.158% | 2026-05-28 |
| SciCode | Coding | 90 | 40.4% | 2026-05-11 |
| GSMA Open Telco Leaderboard | Domain | 37 | 58.02 | 2026-05-06 |
| CorpFin v2 | Finance | 63 | 57.926% | 2026-05-28 |
| FinanceArena | Finance | 12 | 41.9 | 2026-05-27 |
| FinChain | Finance | 8 | 57.24 ChainEval | 2026-05-28 |
| MortgageTax | Finance | 27 | 65.501% | 2026-05-28 |
| PRBench Finance | Finance | 27 | 30.45 | 2026-05-06 |
| TaxEval v2 | Finance | 48 | 71.914% | 2026-05-28 |
| BenchLM | General Knowledge | 70 | 46 | 2026-05-06 |
| Arena-Hard | Generalization | 15 | 46.9% | 2026-05-27 |
| HELM AIR-Bench | Generalization | 56 | 0.604408 | 2026-05-28 |
| HELM Safety | Generalization | 15 | 0.948914 | 2026-05-28 |
| WeirdML | Generalization | 17 | 37.61 | 2026-05-06 |
| GeoCode Leaderboard | Geospatial | 8 | 66.56% pass@1 | 2026-05-28 |
| HealthBench Hard | Healthcare | 22 | 0.4 | 2026-05-27 |
| MedQA | Healthcare | 61 | 84.633% | 2026-04-16 |
| Multi-IF | Instruction Following | 17 | 0.67 | 2026-05-06 |
| Artificial Analysis Intelligence Index | Intelligence | 218 | 22.9 | 2026-05-11 |
| GPQA Diamond | Intelligence | 73 | 67.929% | 2026-05-28 |
| Humanity's Last Exam | Intelligence | 346 | 4.6% | 2026-05-11 |
| MMLU Pro | Intelligence | 78 | 77.225% | 2026-05-28 |
| MMLU-Pro | Intelligence | 141 | 78.1% | 2026-05-11 |
| MMMU Pro | Intelligence | 51 | 70.537% | 2026-05-28 |
| SimpleQA | Intelligence | 17 | 16.8% | 2026-05-27 |
| HindiGen v1 | Language | 16 | 65.02 | 2026-05-06 |
| LegalBench | Legal | 71 | 78.044% | 2026-05-28 |
| LEXam | Legal | 13 | 54.58% open / 48.49% MCQ | 2026-05-28 |
| Professional Reasoning Bench - Legal | Legal | 27 | 30.38 | 2026-05-06 |
| Graphwalks BFS >128k | Long Context | 6 | 0.15 | 2026-05-06 |
| Graphwalks parents >128k | Long Context | 5 | 0.11 | 2026-05-06 |
| OpenAI-MRCR: 2 needle 128k | Long Context | 5 | 0.47 | 2026-05-06 |
| OpenAI-MRCR: 2 needle 1M | Long Context | 4 | 0.33 | 2026-05-06 |
| Fiction.LiveBench | Long Context | 14 | 46.90 | 2026-05-06 |
| AIME | Math | 63 | 49.375% | 2026-04-16 |
| AIME 2025 | Math | 148 | 46.3% | 2026-05-11 |
| MATH 500 | Math | 32 | 88% | 2026-01-09 |
| MGSM | Math | 58 | 87.782% | 2026-01-09 |
| FrontierMath 2025-02-28 Private | Mathematics | 16 | 4.48 | 2026-05-06 |
| HMMT 2025 | Mathematics | 30 | 0.35 | 2026-05-06 |
| OTIS Mock AIME 2024-2025 | Mathematics | 20 | 44.72 | 2026-05-06 |
| LiveMedBench | Medical | 21 | 0.1036 | 2026-05-27 |
| MEDIC Benchmark | Medical | 35 | 65.49 (average normalized score, public table) | 2026-05-27 |
| LanguageBench | Multilingual | 11 | 0.60 | 2026-05-06 |
| CharXiv-D | Multimodal | 4 | 0.88 | 2026-05-06 |
| CharXiv-R | Multimodal | 25 | 0.57 | 2026-05-06 |
| Design Arena | Multimodal | 107 | 1052 | 2026-05-06 |
| Math-VR | Multimodal | 15 | 33.3 | 2026-05-27 |
| Visual-Language Understanding | Multimodal | 39 | 41.14 | 2026-05-06 |
| GPQA Diamond | Reasoning | 238 | 66.4% | 2026-05-11 |
| Graphwalks BFS <128k | Reasoning | 7 | 0.62 | 2026-05-06 |
| Graphwalks parents <128k | Reasoning | 6 | 0.60 | 2026-05-06 |
| LiveSecBench | Safety | 40 | 22.99 | 2026-05-27 |
| CritPt | Science | 214 | 0% | 2026-05-11 |
| StructEval | Structured Output | 2 | 75.64% | 2026-05-28 |
| ComplexFuncBench | Tool Use | 4 | 0.49 | 2026-05-06 |
| COLLIE | Writing | 8 | 0.55 | 2026-05-06 |
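The Score column mixes units. It contains percentages (50.45%), bare fractions (0.56), unbounded ratings (1547.62, 1052), and a few cells with trailing labels ("57.24 ChainEval", "66.56% pass@1"). The sketch below extracts the leading number from a cell and flags whether it was a percentage. It assumes only the first figure matters, so "54.58% open / 48.49% MCQ" yields the open-ended score. It deliberately does not rescale values: a bare 0.56 could be a fraction or a raw score, and the table alone does not say which. `parse_score` is a hypothetical helper.

```python
import re

# First signed decimal number in the cell.
_NUMBER = re.compile(r"-?\d+(?:\.\d+)?")


def parse_score(cell: str) -> tuple[float, bool]:
    """Return (first number in the cell, whether it is followed by '%')."""
    match = _NUMBER.search(cell)
    if match is None:
        raise ValueError(f"no numeric score in {cell!r}")
    value = float(match.group())
    is_percent = cell[match.end():].lstrip().startswith("%")
    return value, is_percent


print(parse_score("66.56% pass@1"))    # (66.56, True)
print(parse_score("57.24 ChainEval"))  # (57.24, False)
```

Because units differ by benchmark, compare scores only within a single benchmark row, not across rows. Even the two GPQA Diamond rows (67.929% and 66.4%) come from different sources and were sampled on different dates.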