
Grok 4

Grok / xAI

70 scores
69 benchmarks
Cost (input / output): $3 / $15 per 1M tokens
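The per-token prices translate into per-request cost with simple arithmetic. The sketch below uses the $3 (input) and $15 (output) per 1M token list prices above. It assumes flat per-token billing, with no cached-input discounts, batch pricing, or long-context price tiers. It also assumes that any hidden reasoning tokens are already counted in the output token figure you pass in. The function name `request_cost` is hypothetical and is not part of any xAI SDK.

```python
# List prices from the page above, in USD per 1M tokens.
INPUT_PRICE_PER_M = 3.00
OUTPUT_PRICE_PER_M = 15.00


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one request at list price.

    Hypothetical helper: assumes flat per-token pricing (no caching,
    batch, or tiered discounts) and that reasoning tokens, if any,
    are included in output_tokens.
    """
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000


# Example: a 10,000-token prompt that yields a 2,000-token completion.
print(f"${request_cost(10_000, 2_000):.4f}")  # prints "$0.0600"
```

In this example both halves cost the same: 10,000 input tokens at $3/M is $0.03, and 2,000 output tokens at $15/M is also $0.03. Output tokens are priced 5x higher than input tokens, so completion length tends to dominate cost for generation-heavy workloads.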

Metadata

Model family: Grok
Availability: Closed/API

Aliases: grok-4, grok-4-07-09, x-ai-grok-4, x-ai-grok-4-07-09, x-ai/grok-4, x-ai/grok-4-07-09

Benchmark Results

Benchmark | Category | Rank | Score | Date sampled
APEX-Agents | Agentic | 22 | 30.30 | 2026-05-06
ARC-AGI-1 | Agentic | 44 | 66.67 | 2026-05-05
ARC-AGI-2 | Agentic | 44 | 15.97 | 2026-05-05
Berkeley Function-Calling Leaderboard | Agentic | 9 | 62.97% | 2026-05-27
Berkeley Function-Calling Leaderboard | Agentic | 10 | 61.38% | 2026-05-27
Galileo Agent Leaderboard | Agentic | 11 | 0.42 | 2026-05-06
Gert Labs Rankings | Agentic | 31 | 0.48 | 2026-05-11
MCP-Universe | Agentic | 3 | 33.33 | 2026-05-06
MCPMark | Agentic | 10 | 0.32 | 2026-05-06
Tau2-Bench Telecom | Agentic | 120 | 74.9% | 2026-05-11
Terminal-Bench Hard | Agentic | 46 | 37.9% | 2026-05-11
OpenUGI | Alignment | 3 | 67.83 | 2026-05-06
SpatialBench | Biology | 15 | 31.87% | 2026-05-27
IOI | Coding | 10 | 26.167% | 2026-05-26
LiveCodeBench | Coding | 31 | 83.247% | 2026-05-28
SciCode | Coding | 39 | 45.7% | 2026-05-11
SWE-bench Verified | Coding | 44 | 57.8% | 2026-05-28
Terminal-Bench 2.0 | Coding | 45 | 28.09% | 2026-05-28
VibeCodingBench | Coding | 9 | 88 | 2026-05-06
IslamicLegalBench | Domain | 6 | 61.69 | 2026-05-06
SAGE | Education | 52 | 25.101% | 2026-05-28
CorpFin v2 | Finance | 12 | 66.045% | 2026-05-28
Finance Agent v1.1 | Finance | 19 | 53.506% | 2026-05-04
FinanceArena | Finance | 2 | 49.3 | 2026-05-27
MortgageTax | Finance | 66 | 44.475% | 2026-05-28
QuantSightBench | Finance | 2 | 0.7638 coverage | 2026-05-28
TaxEval v2 | Finance | 88 | 65.086% | 2026-05-28
React Native Evals | Frontend Development | 13 | 72.6277% overall | 2026-05-28
MageBench Season 1 | Game | 35 | 1459 rating / 13 games | 2026-05-28
Xent Games | Game | 2 | 63.22 overall | 2026-05-28
BenchLM | General Knowledge | 42 | 65 | 2026-05-06
HELM AIR-Bench | Generalization | 75 | 0.443800 | 2026-05-28
WeirdML | Generalization | 9 | 45.73 | 2026-05-06
MedCode | Healthcare | 35 | 38.078% | 2026-05-28
MedQA | Healthcare | 32 | 92.492% | 2026-04-16
MedScribe | Healthcare | 25 | 78.152% | 2026-05-28
HUMAINE | Human Preference | 5 | 3.72 | 2026-05-06
AIIQ Composite IQ | Intelligence | 20 | 113 | 2026-05-12
Artificial Analysis Intelligence Index | Intelligence | 66 | 41.52 | 2026-05-11
GPQA Diamond | Intelligence | 16 | 88.132% | 2026-05-28
Humanity's Last Exam | Intelligence | 52 | 23.9% | 2026-05-11
MMLU Pro | Intelligence | 34 | 85.304% | 2026-05-28
MMLU-Pro | Intelligence | 15 | 86.6% | 2026-05-11
MMMU Pro | Intelligence | 34 | 76.27% | 2026-05-28
CaseLaw v2 | Legal | 9 | 65.809% | 2026-05-04
LegalBench | Legal | 30 | 83.192% | 2026-05-28
ConStory-Bench | Long Context | 12 | CED 0.67 | 2026-05-28
Fiction.LiveBench | Long Context | 2 | 96.90 | 2026-05-06
AIME | Math | 27 | 90.556% | 2026-04-16
AIME 2024 | Math | 1 | 94.0 | 2026-05-27
AIME 2025 | Math | 16 | 92.7% | 2026-05-11
IneqMath | Math | 21 | 8 | 2026-05-06
MATH 500 | Math | 2 | 96.2% | 2026-01-09
MGSM | Math | 32 | 90.909% | 2026-01-09
FrontierMath 2025-02-28 Private | Mathematics | 7 | 19.66 | 2026-05-06
FrontierMath Tier 4 2025-07-01 Private | Mathematics | 8 | 2.08 | 2026-05-06
OTIS Mock AIME 2024-2025 | Mathematics | 8 | 84 | 2026-05-06
USAMO25 | Mathematics | 3 | 0.38 | 2026-05-06
Medmarks | Medical | 3 | 0.6342733786197539 | 2026-05-27
Design Arena | Multimodal | 101 | 1075 | 2026-05-06
Artificial Analysis Openness Index | Openness | 229 | 5.56 | 2026-05-11
ARC-AGI v2 | Reasoning | 12 | 0.16 | 2026-05-06
Balrog | Reasoning | 1 | 43.60 | 2026-05-06
CAIS Text Capabilities Index | Reasoning | 22 | 20.8 | 2026-05-27
GPQA Diamond | Reasoning | 28 | 87.7% | 2026-05-11
SimpleBench | Reasoning | 5 | 60.50 | 2026-05-06
CAIS Risk Index | Safety | 15 | 47.2 | 2026-05-27
CritPt | Science | 60 | 2% | 2026-05-11
CAIS Vision Capabilities Index | Vision | 16 | 49.7 | 2026-05-27
Lech Mazur Writing | Writing | 12 | 8.11 | 2026-05-06