GPT-4

GPT / OpenAI

141 scores
81 benchmarks
$30 / $60 per 1M tokens (cost in/out)
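
As a rough illustration of the arithmetic behind the listed price, the sketch below estimates the cost of a single request. It assumes the $30 figure applies to input tokens and the $60 figure to output tokens, each per 1M tokens, as the "cost in/out" label suggests. The function name `estimate_cost_usd` and the token counts in the usage line are hypothetical, chosen only for illustration.

```python
from decimal import Decimal

# Listed prices in USD per 1M tokens (assumed: first figure = input, second = output).
INPUT_USD_PER_1M = Decimal("30")
OUTPUT_USD_PER_1M = Decimal("60")
TOKENS_PER_UNIT = Decimal(1_000_000)


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> Decimal:
    """Return the estimated USD cost of one request (hypothetical helper)."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = (
        Decimal(input_tokens) * INPUT_USD_PER_1M
        + Decimal(output_tokens) * OUTPUT_USD_PER_1M
    )
    return total / TOKENS_PER_UNIT


if __name__ == "__main__":
    # Example: 1,000 input tokens and 500 output tokens.
    print(estimate_cost_usd(1_000, 500))
```

Using `Decimal` rather than floats keeps small per-request amounts exact when many of them are summed.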

Metadata

GPT Closed/API

Aliases: gpt-4, openai-gpt-4, openai/gpt-4

Benchmark Results

| Benchmark | Category | Rank | Score | Sampled |
| --- | --- | --- | --- | --- |
| Clembench Multimodal v1.6.5 | Agentic | 3 | 73.55 | 2026-05-06 |
| MLAgentBench | Agentic | 4 | 19.2% | 2026-05-27 |
| Nexus Function Calling | Agentic | 2 | 54.18 | 2026-05-06 |
| OmniACT | Agentic | 2 | 17.02 | 2026-05-27 |
| OmniACT | Agentic | 3 | 11.6 | 2026-05-27 |
| ScreenSpot | Agentic | 4 | 16.2% | 2026-05-27 |
| ToolSandbox | Agentic | 4 | 64.3 | 2026-05-27 |
| RewardBench | Alignment | 51 | 84.34 | 2026-05-06 |
| TextClass Benchmark | Classification | 10 | 1747.59 | 2026-05-06 |
| Aider Refactoring Benchmark | Coding | 6 | 50.60 | 2026-05-06 |
| Aider Refactoring Benchmark | Coding | 11 | 33.70 | 2026-05-06 |
| BigCodeBench | Coding | 22 | 46 | 2026-05-06 |
| ClassEval | Coding | 1 | 37.6 | 2026-05-27 |
| ClassEval | Coding | 3 | 29.6 | 2026-05-27 |
| ClassEval | Coding | 4 | 26.2 | 2026-05-27 |
| CodeEditorBench | Coding | 1 | 0.882 | 2026-05-27 |
| CodeEditorBench | Coding | 2 | 0.868 | 2026-05-27 |
| CodeEditorBench | Coding | 3 | 0.855 | 2026-05-27 |
| CodeEditorBench | Coding | 5 | 0.85 | 2026-05-27 |
| CodeEditorBench | Coding | 6 | 0.816 | 2026-05-27 |
| CodeEditorBench | Coding | 10 | 0.8 | 2026-05-27 |
| CRUXEval | Coding | 3 | 76.30 | 2026-05-05 |
| CRUXEval | Coding | 5 | 69.25 | 2026-05-05 |
| DS-1000 | Coding | 2 | 0.51 | 2026-05-27 |
| ENAMEL | Coding | 4 | 0.45 | 2026-05-06 |
| HumanEval+ | Coding | 14 | 79.30 | 2026-05-05 |
| Spider | Data | 2 | 86.60 | 2026-05-06 |
| Spider | Data | 3 | 86.20 | 2026-05-06 |
| Spider | Data | 4 | 85.60 | 2026-05-06 |
| Spider | Data | 6 | 83.90 | 2026-05-06 |
| Spider | Data | 8 | 80.80 | 2026-05-06 |
| MMDocBench | Document Understanding | 8 | 61.93% | 2026-05-27 |
| GSMA Open Telco Leaderboard | Domain | 48 | 48.58 | 2026-05-06 |
| FinanceBench | Finance | 1 | 89.33 | 2026-05-06 |
| FinanceBench | Finance | 2 | 85.33 | 2026-05-06 |
| FinanceBench | Finance | 3 | 84 | 2026-05-06 |
| FinanceBench | Finance | 4 | 78.67 | 2026-05-06 |
| FinanceBench | Finance | 5 | 78.67 | 2026-05-06 |
| FinanceBench | Finance | 7 | 50 | 2026-05-06 |
| FinanceBench | Finance | 8 | 42 | 2026-05-06 |
| FinanceBench | Finance | 11 | 24.67 | 2026-05-06 |
| FinanceBench | Finance | 12 | 19.33 | 2026-05-06 |
| FinanceBench | Finance | 14 | 16.67 | 2026-05-06 |
| FinanceBench | Finance | 15 | 9.33 | 2026-05-06 |
| FinanceBench | Finance | 16 | 4.67 | 2026-05-06 |
| FinBen | Finance | 1 | 28.19% | 2026-05-27 |
| INVESTORBENCH | Finance | 2 | 43.696% | 2026-05-27 |
| Open FinLLM Leaderboard | Finance | 2 | 48.337138% | 2026-05-27 |
| AlpacaEval | Generalization | 7 | 89.85849210429464 | 2026-05-27 |
| AlpacaEval | Generalization | 16 | 86.51018625518144 | 2026-05-27 |
| AlpacaEval | Generalization | 19 | 85.334647371383 | 2026-05-27 |
| AlpacaEval | Generalization | 31 | 81.38159399734118 | 2026-05-27 |
| AlpacaEval | Generalization | 90 | 44.09937888 | 2026-05-27 |
| CyberBench | Generalization | 1 | 69.6% | 2026-05-28 |
| CyberSecEval | Generalization | 2 | 19.87% | 2026-05-27 |
| EQ-Bench | Generalization | 3 | 84.79 | 2026-05-06 |
| FreshQA | Generalization | 1 | 46.4% | 2026-05-27 |
| HELM AIR-Bench | Generalization | 49 | 0.641728 | 2026-05-28 |
| InfiniteBench | Generalization | 1 | 46.099167% | 2026-05-27 |
| L-Eval | Generalization | 1 | 73.111667% | 2026-05-27 |
| MoralChoice | Generalization | 4 | 1 | 2026-05-27 |
| MT-Bench | Generalization | 1 | 8.990625 | 2026-05-27 |
| MT-Bench | Generalization | 22 | 5.4125 | 2026-05-27 |
| WildBench | Generalization | 11 | 7.6640625 | 2026-05-27 |
| AgentClinic | Healthcare | 2 | 51.6% | 2026-05-27 |
| MMLU Medical Genetics | Healthcare | 2 | 91.0% | 2026-05-27 |
| MMLU Professional Medicine | Healthcare | 2 | 93.01% | 2026-05-27 |
| MultiMedQA | Healthcare | 2 | 81.134167% | 2026-05-27 |
| HREF | Instruction Following | 26 | 6.12 | 2026-05-06 |
| RubricEval | Instruction Following | 1 | 3.18 | 2026-05-06 |
| URIAL Bench | Instruction Following | 1 | 8.99 | 2026-05-06 |
| AIR-Bench | Intelligence | 3 | 53.5889 | 2026-05-27 |
| Artificial Analysis Intelligence Index | Intelligence | 371 | 12.75 | 2026-05-11 |
| C-Eval | Intelligence | 67 | 68.7% | 2026-05-27 |
| ChartBench | Intelligence | 2 | 54.39 | 2026-05-06 |
| Gaokao-Bench | Intelligence | 1 | 72.2% | 2026-05-27 |
| Gaokao-Bench | Intelligence | 2 | 71.6% | 2026-05-27 |
| HELM Instruct | Intelligence | 3 | 0.611111 | 2026-05-28 |
| HELM Lite | Intelligence | 3 | 0.908908 | 2026-05-28 |
| MathVision | Intelligence | 124 | 23.98 | 2026-05-06 |
| MathVision | Intelligence | 128 | 22.76 | 2026-05-06 |
| MathVision | Intelligence | 152 | 13.10 | 2026-05-06 |
| MathVista | Intelligence | 31 | 58.10 | 2026-05-06 |
| MathVista | Intelligence | 38 | 49.90 | 2026-05-06 |
| MathVista | Intelligence | 61 | 33.90 | 2026-05-06 |
| MathVista | Intelligence | 64 | 33.20 | 2026-05-06 |
| MMBench-CN | Intelligence | 3 | 73.3 | 2026-05-27 |
| MMStar | Intelligence | 1 | 57.10 | 2026-05-06 |
| MMStar | Intelligence | 4 | 46.10 | 2026-05-06 |
| MVBench | Intelligence | 3 | 43.5 | 2026-05-27 |
| OCRBench | Intelligence | 12 | 645 | 2026-05-06 |
| SEED-Bench | Intelligence | 4 | 67.30 | 2026-05-06 |
| SEED-Bench-2 | Intelligence | 4 | 69.80 | 2026-05-06 |
| VCR | Intelligence | 2 | 81.6% | 2026-05-27 |
| Open Ko-LLM Leaderboard | Language | 296 | 40.27 | 2026-05-06 |
| Open Ko-LLM Leaderboard | Language | 344 | 39.38 | 2026-05-06 |
| LawBench | Legal | 2 | 53.8453 | 2026-05-27 |
| LawBench | Legal | 3 | 52.3521 | 2026-05-27 |
| JEEBench | Math | 1 | 0.389 | 2026-05-27 |
| JEEBench | Math | 2 | 0.350 | 2026-05-27 |
| JEEBench | Math | 3 | 0.339 | 2026-05-27 |
| JEEBench | Math | 4 | 0.309 | 2026-05-27 |
| LeanDojo Benchmark | Math | 3 | 7.4% | 2026-05-27 |
| OlympiadBench | Math | 2 | 17.97 | 2026-05-06 |
| OlympiadBench | Math | 2 | 29.93 | 2026-05-06 |
| OlympiadBench | Math | 3 | 29.07 | 2026-05-06 |
| Open Medical-LLM Leaderboard | Medical | 4 | 82.97 | 2026-05-06 |
| ReXrank | Medical | 115 | 0.708 | 2026-05-27 |
| ReXrank | Medical | 123 | 0.683 | 2026-05-27 |
| ReXrank | Medical | 136 | 0.629 | 2026-05-27 |
| ReXrank | Medical | 142 | 0.605 | 2026-05-27 |
| ReXrank | Medical | 148 | 0.568 | 2026-05-27 |
| ReXrank | Medical | 149 | 0.558 | 2026-05-27 |
| ReXrank | Medical | 152 | 0.549 | 2026-05-27 |
| ReXrank | Medical | 164 | 0.431 | 2026-05-27 |
| BenchBench | Meta | 27 | 0.76 | 2026-05-06 |
| AutoEval-Video | Multimodal | 1 | 22.20 | 2026-05-06 |
| MMAU | Multimodal | 21 | 51.03 | 2026-05-06 |
| ScienceQA | Multimodal | 8 | 92.53 | 2026-05-06 |
| ScienceQA | Multimodal | 26 | 86.54 | 2026-05-06 |
| ScienceQA | Multimodal | 33 | 83.99 | 2026-05-06 |
| Video-MME | Multimodal | 31 | 63.30 | 2026-05-06 |
| DROP | Reasoning | 10 | 0.81 | 2026-05-06 |
| YALL Nous Leaderboard | Reasoning | 144 | 45.66 | 2026-05-06 |
| ChatRAG Bench | Retrieval | 5 | 53.90 | 2026-05-06 |
| ChemBench | Science | 47 | 0.41 | 2026-05-06 |
| SWT-Bench | Software Engineering | 20 | 18.5% | 2026-05-27 |
| SWT-Bench | Software Engineering | 23 | 15.9% | 2026-05-27 |
| SWT-Bench | Software Engineering | 25 | 14.1% | 2026-05-27 |
| SWT-Bench | Software Engineering | 26 | 12.7% | 2026-05-27 |
| SWT-Bench | Software Engineering | 29 | 9.4% | 2026-05-27 |
| SWT-Bench | Software Engineering | 30 | 9.1% | 2026-05-27 |
| SWT-Bench | Software Engineering | 31 | 3.6% | 2026-05-27 |
| AudioMC | Speech | 11 | 14.82 | 2026-05-07 |
| AudioMC | Speech | 14 | 13.05 | 2026-05-07 |
| AudioMC - Audio Output | Speech | 5 | 13.05 | 2026-05-07 |
| AudioMC - Text Output | Speech | 9 | 14.82 | 2026-05-06 |
| VoiceBench | Speech | 7 | 82.84 | 2026-05-27 |
| SheetCopilot Benchmark | Spreadsheets | 5 | 65.0% | 2026-05-27 |
| VNTL Leaderboard | Translation | 12 | 69.28 | 2026-05-06 |
| CG-Bench | Video | 10 | 24.9% open-ended acc. / 32.6% MCQ long acc. | 2026-05-28 |