GPT-4 Turbo
GPT / OpenAI
48 scores
43 benchmarks
Cost in/out: $10 / $30 per 1M tokens
Metadata
GPT Closed/API
Aliases: gpt-4-turbo, openai-gpt-4-turbo, openai/gpt-4-turbo
| Benchmark | Category | Rank | Score | Sampled |
|---|---|---|---|---|
| LLM Game Benchmark | Agentic | 5 | 0.35 | 2026-05-06 |
| MLAgentBench | Agentic | 3 | 26.0% | 2026-05-27 |
| RewardBench | Alignment | 58 | 83.95 | 2026-05-06 |
| TextClass Benchmark | Classification | 5 | 1781.47 | 2026-05-06 |
| Aider Refactoring Benchmark | Coding | 10 | 34.10 | 2026-05-06 |
| Aider Refactoring Benchmark | Coding | 14 | 21.40 | 2026-05-06 |
| BigCodeBench | Coding | 11 | 48.20 | 2026-05-06 |
| BigCodeBench-Hard | Coding | 17 | 29.10 | 2026-05-05 |
| CRUXEval | Coding | 1 | 78.85 | 2026-05-05 |
| CRUXEval | Coding | 6 | 68.10 | 2026-05-05 |
| DS-1000 | Coding | 1 | 0.539 | 2026-05-27 |
| ENAMEL | Coding | 2 | 0.47 | 2026-05-06 |
| EvalPlus | Coding | 10 | 77.50 | 2026-05-05 |
| HumanEval+ | Coding | 6 | 86.60 | 2026-05-05 |
| HumanEval+ | Coding | 11 | 81.70 | 2026-05-05 |
| LiveCodeBench | Coding | 25 | 28.70 | 2026-05-06 |
| MBPP+ | Coding | 8 | 73.30 | 2026-05-05 |
| McEval | Coding | 2 | 63.4% | 2026-05-27 |
| SciCode | Coding | 231 | 31.9% | 2026-05-11 |
| K-12EduBench | Education | 18 | 55.94 | 2026-05-27 |
| BenchLM | General Knowledge | 98 | 26 | 2026-05-06 |
| MixEval Chat | General Knowledge | 6 | 62.60 | 2026-05-06 |
| HELM AIR-Bench | Generalization | 34 | 0.718739 | 2026-05-28 |
| HELM Safety | Generalization | 13 | 0.960619 | 2026-05-28 |
| WeirdML | Generalization | 27 | 18.01 | 2026-05-06 |
| WildBench | Generalization | 4 | 7.804496578690127 | 2026-05-27 |
| MedQA | Healthcare | 69 | 81.986% | 2026-04-16 |
| RubricEval | Instruction Following | 2 | 3.10 | 2026-05-06 |
| AIIQ Composite IQ | Intelligence | 46 | 76 | 2026-05-12 |
| Artificial Analysis Intelligence Index | Intelligence | 355 | 13.72 | 2026-05-11 |
| HELM Lite | Intelligence | 4 | 0.898402 | 2026-05-28 |
| HELM Lite | Intelligence | 16 | 0.745371 | 2026-05-28 |
| Humanity's Last Exam | Intelligence | 467 | 3.3% | 2026-05-11 |
| MathVision | Intelligence | 101 | 30.26 | 2026-05-06 |
| MMLU-Pro | Intelligence | 227 | 69.4% | 2026-05-11 |
| SimpleQA | Intelligence | 12 | 24.2% | 2026-05-27 |
| TableBench | Intelligence | 15 | 51.5% | 2026-05-27 |
| LegalBench | Legal | 53 | 80.462% | 2026-05-28 |
| OTIS Mock AIME 2024-2025 | Mathematics | 32 | 6.67 | 2026-05-06 |
| BenchBench | Meta | 7 | 0.91 | 2026-05-06 |
| DROP | Reasoning | 4 | 0.86 | 2026-05-06 |
| NPHardEval | Reasoning | 1 | 0.38 | 2026-05-06 |
| SimpleBench | Reasoning | 19 | 25.10 | 2026-05-06 |
| ZebraLogic | Reasoning | 20 | 28.40 | 2026-05-06 |
| ChatRAG Bench | Retrieval | 4 | 54.03 | 2026-05-06 |
| SciKnowEval | Science | 4 | 4 | 2026-05-27 |
| DevBench | Software Engineering | 1 | 56.1636 | 2026-05-27 |
| DevBench | Software Engineering | 2 | 54.1273 | 2026-05-27 |