Grok 4
Grok / xAI
70 scores
69 benchmarks
Cost (input/output): $3 / $15 per 1M tokens
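To turn the listed rates into a per-request figure, the sketch below multiplies each token count by its per-million price. It assumes the $3 input and $15 output rates apply uniformly, and it ignores any caching discounts, tiered pricing, or other billing rules the provider may use. The function name `request_cost` is hypothetical and for illustration only.

```python
# Listed Grok 4 prices in USD per 1M tokens (from the page header).
# Assumption: flat per-token billing with no discounts or tiers.
INPUT_PRICE_PER_M = 3.00
OUTPUT_PRICE_PER_M = 15.00


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request at the listed per-token rates (hypothetical helper)."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    # Scale each count by its per-million rate, then divide by one million.
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000


# Example: 10,000 input tokens and 2,000 output tokens.
print(f"${request_cost(10_000, 2_000):.4f}")  # prints "$0.0600"
```

Because output tokens cost five times as much as input tokens, long completions dominate the bill. In this example, 2,000 output tokens cost the same as 10,000 input tokens.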
Metadata
Grok · Closed/API
Aliases: grok-4, grok-4-07-09, x-ai-grok-4, x-ai-grok-4-07-09, x-ai/grok-4, x-ai/grok-4-07-09
| Benchmark | Category | Rank | Score | Sampled |
|---|---|---|---|---|
| APEX-Agents | Agentic | 22 | 30.30 | 2026-05-06 |
| ARC-AGI-1 | Agentic | 44 | 66.67 | 2026-05-05 |
| ARC-AGI-2 | Agentic | 44 | 15.97 | 2026-05-05 |
| Berkeley Function-Calling Leaderboard | Agentic | 9 | 62.97% | 2026-05-27 |
| Berkeley Function-Calling Leaderboard | Agentic | 10 | 61.38% | 2026-05-27 |
| Galileo Agent Leaderboard | Agentic | 11 | 0.42 | 2026-05-06 |
| Gert Labs Rankings | Agentic | 31 | 0.48 | 2026-05-11 |
| MCP-Universe | Agentic | 3 | 33.33 | 2026-05-06 |
| MCPMark | Agentic | 10 | 0.32 | 2026-05-06 |
| Tau2-Bench Telecom | Agentic | 120 | 74.9% | 2026-05-11 |
| Terminal-Bench Hard | Agentic | 46 | 37.9% | 2026-05-11 |
| OpenUGI | Alignment | 3 | 67.83 | 2026-05-06 |
| SpatialBench | Biology | 15 | 31.87% | 2026-05-27 |
| IOI | Coding | 10 | 26.167% | 2026-05-26 |
| LiveCodeBench | Coding | 31 | 83.247% | 2026-05-28 |
| SciCode | Coding | 39 | 45.7% | 2026-05-11 |
| SWE-bench Verified | Coding | 44 | 57.8% | 2026-05-28 |
| Terminal-Bench 2.0 | Coding | 45 | 28.09% | 2026-05-28 |
| VibeCodingBench | Coding | 9 | 88 | 2026-05-06 |
| IslamicLegalBench | Domain | 6 | 61.69 | 2026-05-06 |
| SAGE | Education | 52 | 25.101% | 2026-05-28 |
| CorpFin v2 | Finance | 12 | 66.045% | 2026-05-28 |
| Finance Agent v1.1 | Finance | 19 | 53.506% | 2026-05-04 |
| FinanceArena | Finance | 2 | 49.3 | 2026-05-27 |
| MortgageTax | Finance | 66 | 44.475% | 2026-05-28 |
| QuantSightBench | Finance | 2 | 0.7638 coverage | 2026-05-28 |
| TaxEval v2 | Finance | 88 | 65.086% | 2026-05-28 |
| React Native Evals | Frontend Development | 13 | 72.6277% overall | 2026-05-28 |
| MageBench Season 1 | Game | 35 | 1459 rating / 13 games | 2026-05-28 |
| Xent Games | Game | 2 | 63.22 overall | 2026-05-28 |
| BenchLM | General Knowledge | 42 | 65 | 2026-05-06 |
| HELM AIR-Bench | Generalization | 75 | 0.443800 | 2026-05-28 |
| WeirdML | Generalization | 9 | 45.73 | 2026-05-06 |
| MedCode | Healthcare | 35 | 38.078% | 2026-05-28 |
| MedQA | Healthcare | 32 | 92.492% | 2026-04-16 |
| MedScribe | Healthcare | 25 | 78.152% | 2026-05-28 |
| HUMAINE | Human Preference | 5 | 3.72 | 2026-05-06 |
| AIIQ Composite IQ | Intelligence | 20 | 113 | 2026-05-12 |
| Artificial Analysis Intelligence Index | Intelligence | 66 | 41.52 | 2026-05-11 |
| GPQA Diamond | Intelligence | 16 | 88.132% | 2026-05-28 |
| Humanity's Last Exam | Intelligence | 52 | 23.9% | 2026-05-11 |
| MMLU Pro | Intelligence | 34 | 85.304% | 2026-05-28 |
| MMLU-Pro | Intelligence | 15 | 86.6% | 2026-05-11 |
| MMMU Pro | Intelligence | 34 | 76.27% | 2026-05-28 |
| CaseLaw v2 | Legal | 9 | 65.809% | 2026-05-04 |
| LegalBench | Legal | 30 | 83.192% | 2026-05-28 |
| ConStory-Bench | Long Context | 12 | CED 0.67 | 2026-05-28 |
| Fiction.LiveBench | Long Context | 2 | 96.90 | 2026-05-06 |
| AIME | Math | 27 | 90.556% | 2026-04-16 |
| AIME 2024 | Math | 1 | 94.0 | 2026-05-27 |
| AIME 2025 | Math | 16 | 92.7% | 2026-05-11 |
| IneqMath | Math | 21 | 8 | 2026-05-06 |
| MATH 500 | Math | 2 | 96.2% | 2026-01-09 |
| MGSM | Math | 32 | 90.909% | 2026-01-09 |
| FrontierMath 2025-02-28 Private | Mathematics | 7 | 19.66 | 2026-05-06 |
| FrontierMath Tier 4 2025-07-01 Private | Mathematics | 8 | 2.08 | 2026-05-06 |
| OTIS Mock AIME 2024-2025 | Mathematics | 8 | 84 | 2026-05-06 |
| USAMO25 | Mathematics | 3 | 0.38 | 2026-05-06 |
| Medmarks | Medical | 3 | 0.6342733786197539 | 2026-05-27 |
| Design Arena | Multimodal | 101 | 1075 | 2026-05-06 |
| Artificial Analysis Openness Index | Openness | 229 | 5.56 | 2026-05-11 |
| ARC-AGI v2 | Reasoning | 12 | 0.16 | 2026-05-06 |
| Balrog | Reasoning | 1 | 43.60 | 2026-05-06 |
| CAIS Text Capabilities Index | Reasoning | 22 | 20.8 | 2026-05-27 |
| GPQA Diamond | Reasoning | 28 | 87.7% | 2026-05-11 |
| SimpleBench | Reasoning | 5 | 60.50 | 2026-05-06 |
| CAIS Risk Index | Safety | 15 | 47.2 | 2026-05-27 |
| CritPt | Science | 60 | 2% | 2026-05-11 |
| CAIS Vision Capabilities Index | Vision | 16 | 49.7 | 2026-05-27 |
| Lech Mazur Writing | Writing | 12 | 8.11 | 2026-05-06 |