GPT-4o (2024-08-06)
GPT / OpenAI
40 scores
40 benchmarks
Cost in/out: $2.5 / $10 per 1M tokens
Metadata
GPT Closed/API
Aliases: gpt-4o-2024-08-06, openai-gpt-4o-2024-08-06, openai/gpt-4o-2024-08-06
| Benchmark | Category | Rank | Score | Sampled |
|---|---|---|---|---|
| Clembench Multimodal v1.6.5 | Agentic | 2 | 80.04 | 2026-05-06 |
| MCP-Universe | Agentic | 27 | 15.58 | 2026-05-06 |
| Tau2 Airline | Agentic | 18 | 0.46 | 2026-05-06 |
| Tau2-Bench Telecom | Agentic | 248 | 28.9% | 2026-05-11 |
| Terminal-Bench Hard | Agentic | 219 | 8.3% | 2026-05-11 |
| RewardBench | Alignment | 44 | 86.73 | 2026-05-06 |
| LiveCodeBench | Coding | 24 | 29.50 | 2026-05-06 |
| SciCode | Coding | 220 | 33.1% | 2026-05-11 |
| Vectara HHEM Hallucination Leaderboard | Factuality | 52 | 90.40 | 2026-05-06 |
| CorpFin v2 | Finance | 101 | 39.433% | 2026-05-28 |
| Finance Agent v1.1 | Finance | 47 | 8.064% | 2026-05-04 |
| MortgageTax | Finance | 42 | 60.97% | 2026-05-28 |
| TaxEval v2 | Finance | 59 | 71.136% | 2026-05-28 |
| MedQA | Healthcare | 56 | 88.161% | 2026-04-16 |
| Multi-IF | Instruction Following | 19 | 0.61 | 2026-05-06 |
| Artificial Analysis Intelligence Index | Intelligence | 262 | 18.64 | 2026-05-11 |
| Humanity's Last Exam | Intelligence | 474 | 2.9% | 2026-05-11 |
| MMLU Pro | Intelligence | 86 | 74.13% | 2026-05-28 |
| MMMU Pro | Intelligence | 59 | 64.009% | 2026-05-28 |
| HindiGen v1 | Language | 6 | 74.45 | 2026-05-06 |
| LegalBench | Legal | 59 | 80.12% | 2026-05-28 |
| OpenAI-MRCR: 2 needle 128k | Long Context | 8 | 0.32 | 2026-05-06 |
| AIME | Math | 85 | 13.958% | 2026-04-16 |
| MATH 500 | Math | 45 | 75.2% | 2026-01-09 |
| MGSM | Math | 38 | 90.691% | 2026-01-09 |
| BenchBench | Meta | 3 | 0.97 | 2026-05-06 |
| ChartQA | Multimodal | 12 | 0.86 | 2026-05-06 |
| CharXiv-D | Multimodal | 9 | 0.85 | 2026-05-06 |
| CharXiv-R | Multimodal | 24 | 0.59 | 2026-05-06 |
| VideoMMMU | Multimodal | 23 | 0.61 | 2026-05-06 |
| ERQA | Reasoning | 19 | 0.35 | 2026-05-06 |
| GPQA Diamond | Reasoning | 329 | 52.1% | 2026-05-11 |
| Graphwalks BFS <128k | Reasoning | 10 | 0.42 | 2026-05-06 |
| Graphwalks parents <128k | Reasoning | 10 | 0.35 | 2026-05-06 |
| ZebraLogic | Reasoning | 15 | 31.70 | 2026-05-06 |
| X-Risks Leaderboard | Safety | 4 | 18.92 | 2026-05-06 |
| CritPt | Science | 216 | 0% | 2026-05-11 |
| MaCBench | Science | 6 | 0.54 | 2026-05-06 |
| ComplexFuncBench | Tool Use | 1 | 0.67 | 2026-05-06 |
| COLLIE | Writing | 7 | 0.61 | 2026-05-06 |
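The listed pricing ($2.5 input / $10 output per 1M tokens) can be turned into a rough per-request cost estimate. The sketch below is illustrative only: the function name `estimate_cost_usd` and the example token counts are hypothetical, and it assumes flat per-token billing with no caching, batch, or tiered discounts.

```python
# Prices copied from the card above, in USD per 1M tokens.
INPUT_USD_PER_MTOK = 2.5
OUTPUT_USD_PER_MTOK = 10.0


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request, assuming flat per-token pricing."""
    total = input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical request: 12,000 prompt tokens and 800 completion tokens.
    print(f"${estimate_cost_usd(12_000, 800):.4f}")  # prints $0.0380
```

Because output tokens cost four times as much as input tokens at these rates, completion length tends to dominate the bill for generation-heavy workloads.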