GPT-4 Turbo

GPT / OpenAI

48 scores
43 benchmarks
Cost (input / output): $10 / $30 per 1M tokens
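The sketch below roughly illustrates the listed pricing by estimating the cost of a single request. It assumes linear per-token billing at the listed rates and ignores any discounts. The helper name `estimate_cost_usd` is hypothetical.

```python
# Listed GPT-4 Turbo rates in USD per 1M tokens (assumed linear billing, no discounts).
INPUT_USD_PER_M = 10.0
OUTPUT_USD_PER_M = 30.0


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Hypothetical helper: estimate request cost at the listed per-1M-token rates."""
    return (input_tokens * INPUT_USD_PER_M + output_tokens * OUTPUT_USD_PER_M) / 1_000_000


# Example: 2,000 prompt tokens and 500 completion tokens.
print(f"${estimate_cost_usd(2_000, 500):.4f}")  # $0.0350
```

Because output tokens cost three times as much as input tokens here, long completions dominate the bill even when prompts are larger.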

Metadata

Tags: GPT, Closed/API

Aliases: gpt-4-turbo, openai-gpt-4-turbo, openai/gpt-4-turbo

Benchmark Results

Benchmark                               Category               Rank  Score              Sampled
LLM Game Benchmark                      Agentic                5     0.35               2026-05-06
MLAgentBench                            Agentic                3     26.0%              2026-05-27
RewardBench                             Alignment              58    83.95              2026-05-06
TextClass Benchmark                     Classification         5     1781.47            2026-05-06
Aider Refactoring Benchmark             Coding                 10    34.10              2026-05-06
Aider Refactoring Benchmark             Coding                 14    21.40              2026-05-06
BigCodeBench                            Coding                 11    48.20              2026-05-06
BigCodeBench-Hard                       Coding                 17    29.10              2026-05-05
CRUXEval                                Coding                 1     78.85              2026-05-05
CRUXEval                                Coding                 6     68.10              2026-05-05
DS-1000                                 Coding                 1     0.539              2026-05-27
ENAMEL                                  Coding                 2     0.47               2026-05-06
EvalPlus                                Coding                 10    77.50              2026-05-05
HumanEval+                              Coding                 6     86.60              2026-05-05
HumanEval+                              Coding                 11    81.70              2026-05-05
LiveCodeBench                           Coding                 25    28.70              2026-05-06
MBPP+                                   Coding                 8     73.30              2026-05-05
McEval                                  Coding                 2     63.4%              2026-05-27
SciCode                                 Coding                 231   31.9%              2026-05-11
K-12EduBench                            Education              18    55.94              2026-05-27
BenchLM                                 General Knowledge      98    26                 2026-05-06
MixEval Chat                            General Knowledge      6     62.60              2026-05-06
HELM AIR-Bench                          Generalization         34    0.718739           2026-05-28
HELM Safety                             Generalization         13    0.960619           2026-05-28
WeirdML                                 Generalization         27    18.01              2026-05-06
WildBench                               Generalization         4     7.804496578690127  2026-05-27
MedQA                                   Healthcare             69    81.986%            2026-04-16
RubricEval                              Instruction Following  2     3.10               2026-05-06
AIIQ Composite IQ                       Intelligence           46    76                 2026-05-12
Artificial Analysis Intelligence Index  Intelligence           355   13.72              2026-05-11
HELM Lite                               Intelligence           4     0.898402           2026-05-28
HELM Lite                               Intelligence           16    0.745371           2026-05-28
Humanity's Last Exam                    Intelligence           467   3.3%               2026-05-11
MathVision                              Intelligence           101   30.26              2026-05-06
MMLU-Pro                                Intelligence           227   69.4%              2026-05-11
SimpleQA                                Intelligence           12    24.2%              2026-05-27
TableBench                              Intelligence           15    51.5%              2026-05-27
LegalBench                              Legal                  53    80.462%            2026-05-28
OTIS Mock AIME 2024-2025                Mathematics            32    6.67               2026-05-06
BenchBench                              Meta                   7     0.91               2026-05-06
DROP                                    Reasoning              4     0.86               2026-05-06
NPHardEval                              Reasoning              1     0.38               2026-05-06
SimpleBench                             Reasoning              19    25.10              2026-05-06
ZebraLogic                              Reasoning              20    28.40              2026-05-06
ChatRAG Bench                           Retrieval              4     54.03              2026-05-06
SciKnowEval                             Science                4     4                  2026-05-27
DevBench                                Software Engineering   1     56.1636            2026-05-27
DevBench                                Software Engineering   2     54.1273            2026-05-27