GPT-4o (2024-08-06)

GPT / OpenAI

40 scores
40 benchmarks
Cost (in/out): $2.5 / $10 per 1M tokens

Metadata

GPT Closed/API

Aliases: gpt-4o-2024-08-06, openai-gpt-4o-2024-08-06, openai/gpt-4o-2024-08-06

Benchmark Results

| Benchmark | Category | Rank | Score | Sampled |
| --- | --- | --- | --- | --- |
| Clembench Multimodal v1.6.5 | Agentic | 2 | 80.04 | 2026-05-06 |
| MCP-Universe | Agentic | 27 | 15.58 | 2026-05-06 |
| Tau2 Airline | Agentic | 18 | 0.46 | 2026-05-06 |
| Tau2-Bench Telecom | Agentic | 248 | 28.9% | 2026-05-11 |
| Terminal-Bench Hard | Agentic | 219 | 8.3% | 2026-05-11 |
| RewardBench | Alignment | 44 | 86.73 | 2026-05-06 |
| LiveCodeBench | Coding | 24 | 29.50 | 2026-05-06 |
| SciCode | Coding | 220 | 33.1% | 2026-05-11 |
| Vectara HHEM Hallucination Leaderboard | Factuality | 52 | 90.40 | 2026-05-06 |
| CorpFin v2 | Finance | 101 | 39.433% | 2026-05-28 |
| Finance Agent v1.1 | Finance | 47 | 8.064% | 2026-05-04 |
| MortgageTax | Finance | 42 | 60.97% | 2026-05-28 |
| TaxEval v2 | Finance | 59 | 71.136% | 2026-05-28 |
| MedQA | Healthcare | 56 | 88.161% | 2026-04-16 |
| Multi-IF | Instruction Following | 19 | 0.61 | 2026-05-06 |
| Artificial Analysis Intelligence Index | Intelligence | 262 | 18.64 | 2026-05-11 |
| Humanity's Last Exam | Intelligence | 474 | 2.9% | 2026-05-11 |
| MMLU Pro | Intelligence | 86 | 74.13% | 2026-05-28 |
| MMMU Pro | Intelligence | 59 | 64.009% | 2026-05-28 |
| HindiGen v1 | Language | 6 | 74.45 | 2026-05-06 |
| LegalBench | Legal | 59 | 80.12% | 2026-05-28 |
| OpenAI-MRCR: 2 needle 128k | Long Context | 8 | 0.32 | 2026-05-06 |
| AIME | Math | 85 | 13.958% | 2026-04-16 |
| MATH 500 | Math | 45 | 75.2% | 2026-01-09 |
| MGSM | Math | 38 | 90.691% | 2026-01-09 |
| BenchBench | Meta | 3 | 0.97 | 2026-05-06 |
| ChartQA | Multimodal | 12 | 0.86 | 2026-05-06 |
| CharXiv-D | Multimodal | 9 | 0.85 | 2026-05-06 |
| CharXiv-R | Multimodal | 24 | 0.59 | 2026-05-06 |
| VideoMMMU | Multimodal | 23 | 0.61 | 2026-05-06 |
| ERQA | Reasoning | 19 | 0.35 | 2026-05-06 |
| GPQA Diamond | Reasoning | 329 | 52.1% | 2026-05-11 |
| Graphwalks BFS <128k | Reasoning | 10 | 0.42 | 2026-05-06 |
| Graphwalks parents <128k | Reasoning | 10 | 0.35 | 2026-05-06 |
| ZebraLogic | Reasoning | 15 | 31.70 | 2026-05-06 |
| X-Risks Leaderboard | Safety | 4 | 18.92 | 2026-05-06 |
| CritPt | Science | 216 | 0% | 2026-05-11 |
| MaCBench | Science | 6 | 0.54 | 2026-05-06 |
| ComplexFuncBench | Tool Use | 1 | 0.67 | 2026-05-06 |
| COLLIE | Writing | 7 | 0.61 | 2026-05-06 |