GPT-4o-mini

GPT / OpenAI

76 scores
55 benchmarks
Cost (input / output): $0.15 / $0.60 per 1M tokens
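
As a rough illustration of how the per-token pricing above translates into a per-request cost, here is a minimal sketch. It assumes the two quoted figures are USD per 1M input tokens and per 1M output tokens respectively, and that billing is linear in token count. The helper name `estimate_cost` and the constant names are hypothetical, not part of any official SDK.

```python
# Prices quoted on this page, in USD per 1M tokens (assumed: input first, output second).
INPUT_PRICE_PER_M = 0.15
OUTPUT_PRICE_PER_M = 0.60


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request (hypothetical helper).

    Assumes simple linear pricing with no caching or batch discounts.
    """
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


if __name__ == "__main__":
    # Example: a request with 10,000 prompt tokens and 2,000 completion tokens.
    print(f"${estimate_cost(10_000, 2_000):.4f}")
```

Under these assumptions, output tokens cost four times as much as input tokens, so long completions affect the bill more than long prompts do.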

Metadata

GPT Closed/API

Aliases: gpt-4o-mini, openai-gpt-4o-mini, openai/gpt-4o-mini

Benchmark Results

| Benchmark | Category | Rank | Score | Sampled |
| --- | --- | --- | --- | --- |
| Hindsight LLM Memory Leaderboard | Agentic | 17 | 81 | 2026-05-06 |
| PinchBench | Agentic | 50 | 0.75 | 2026-05-06 |
| RealDataAgentBench | Agentic | 9 | 0.78 | 2026-04-28 |
| TERMS-Bench | Agentic | 15 | 18.9% SE+ | 2026-05-28 |
| Speech Arena | Audio | 2 | 1593 | 2026-05-06 |
| TextClass Benchmark | Classification | 25 | 1674.64 | 2026-05-06 |
| EvalPlus | Coding | 9 | 77.85 | 2026-05-05 |
| HumanEval+ | Coding | 8 | 83.50 | 2026-05-05 |
| MBPP+ | Coding | 12 | 72.20 | 2026-05-05 |
| Natural Language to Mongosh | Coding | 39 | 0.83 | 2026-05-06 |
| Natural Language to Mongosh | Coding | 42 | 0.83 | 2026-05-06 |
| Natural Language to Mongosh | Coding | 56 | 0.81 | 2026-05-06 |
| Natural Language to Mongosh | Coding | 57 | 0.81 | 2026-05-06 |
| Natural Language to Mongosh | Coding | 62 | 0.80 | 2026-05-06 |
| Natural Language to Mongosh | Coding | 72 | 0.79 | 2026-05-06 |
| Natural Language to Mongosh | Coding | 74 | 0.79 | 2026-05-06 |
| Natural Language to Mongosh | Coding | 78 | 0.79 | 2026-05-06 |
| Natural Language to Mongosh | Coding | 81 | 0.78 | 2026-05-06 |
| Natural Language to Mongosh | Coding | 83 | 0.78 | 2026-05-06 |
| Natural Language to Mongosh | Coding | 87 | 0.77 | 2026-05-06 |
| Natural Language to Mongosh | Coding | 88 | 0.77 | 2026-05-06 |
| Natural Language to Mongosh | Coding | 90 | 0.77 | 2026-05-06 |
| Natural Language to Mongosh | Coding | 93 | 0.75 | 2026-05-06 |
| Natural Language to Mongosh | Coding | 94 | 0.75 | 2026-05-06 |
| Natural Language to Mongosh | Coding | 96 | 0.74 | 2026-05-06 |
| Natural Language to Mongosh | Coding | 97 | 0.74 | 2026-05-06 |
| Natural Language to Mongosh | Coding | 100 | 0.73 | 2026-05-06 |
| SciCode | Coding | 345 | 22.9% | 2026-05-11 |
| MMTU | Data | 21 | 0.40 | 2026-05-06 |
| GSMA Open Telco Leaderboard | Domain | 42 | 51.25 | 2026-05-06 |
| RoboBench | Embodied | 11 | 34.40 | 2026-05-27 |
| FinEval | Finance | 24 | 66.2 | 2026-05-27 |
| Open FinLLM Leaderboard | Finance | 11 | 28.32187% | 2026-05-27 |
| SECQUE | Finance | 3 | 0.64 | 2026-05-28 |
| MageBench Season 1 | Game | 28 | 1546 rating / 4 games | 2026-05-28 |
| BenchLM | General Knowledge | 63 | 50 | 2026-05-06 |
| MixEval Chat | General Knowledge | 18 | 51.60 | 2026-05-06 |
| AgentHarm | Generalization | 28 | 62.5% | 2026-05-27 |
| AgentHarm | Generalization | 30 | 68.4% | 2026-05-27 |
| AgentHarm | Generalization | 32 | 68.8% | 2026-05-27 |
| HELM AIR-Bench | Generalization | 62 | 0.562610 | 2026-05-28 |
| HELM Safety | Generalization | 24 | 0.930425 | 2026-05-28 |
| LongBench v2 | Generalization | 33 | 32.4% | 2026-05-27 |
| WildBench | Generalization | 3 | 7.86328125 | 2026-05-27 |
| CHOICE | Geospatial | 12 | 0.6133 | 2026-05-27 |
| GeoCode Leaderboard | Geospatial | 18 | 55.02% pass@1 | 2026-05-28 |
| HealthBench Hard | Healthcare | 31 | 0.33 | 2026-05-27 |
| HELM MedQA | Healthcare | 13 | 0.749503 | 2026-05-28 |
| MedAgentBench | Healthcare | 5 | 56.33% | 2026-05-27 |
| Artificial Analysis Intelligence Index | Intelligence | 377 | 12.65 | 2026-05-11 |
| HELM Lite | Intelligence | 14 | 0.756818 | 2026-05-28 |
| Humanity's Last Exam | Intelligence | 412 | 4% | 2026-05-11 |
| MMLU-Pro | Intelligence | 253 | 64.8% | 2026-05-11 |
| SimpleQA | Intelligence | 21 | 9.5% | 2026-05-27 |
| OpenHuEval | Language | 6 | 49.33 | 2026-05-06 |
| LEXam | Legal | 23 | 42.55% open / 40.96% MCQ | 2026-05-28 |
| AIME 2025 | Math | 219 | 14.7% | 2026-05-11 |
| IneqMath | Math | 46 | 2 | 2026-05-06 |
| MedHELM | Medical | 7 | 0.39285714285714285 | 2026-05-27 |
| MEDIC Benchmark | Medical | 95 | 19 average normalized public table score | 2026-05-27 |
| MedSafe-Dx | Medical | 4 | 90.4 | 2026-05-27 |
| LanguageBench | Multilingual | 13 | 0.55 | 2026-05-06 |
| MMMU-Pro | Multimodal | 50 | 37.60 | 2026-05-06 |
| Video-MME | Multimodal | 24 | 68.90 | 2026-05-06 |
| GPQA Diamond | Reasoning | 382 | 42.6% | 2026-05-11 |
| AgentLeak | Safety | 2 | 76.30 | 2026-05-06 |
| Halluverse-M3 | Safety | 6 | 73.39% | 2026-05-28 |
| ChemBench | Science | 28 | 0.50 | 2026-05-06 |
| SWE-PRBench | Software Engineering | 6 | 0.108 | 2026-05-27 |
| SWT-Bench | Software Engineering | 28 | 9.8% | 2026-05-27 |
| JSONSchemaBench | Structured Output | 2 | 95.8% schema compliance | 2026-05-28 |
| JSONSchemaBench | Structured Output | 13 | 86.2% schema compliance | 2026-05-28 |
| JSONSchemaBench | Structured Output | 24 | 68.5% schema compliance | 2026-05-28 |
| StructEval | Structured Output | 4 | 73.19% | 2026-05-28 |
| Generate README Eval | Summarization | 11 | 32.16 | 2026-05-06 |
| VNTL Leaderboard | Translation | 6 | 72.23 | 2026-05-06 |