Grok 4.20
Grok / xAI
90 scores
63 benchmarks
$1.25 / $2.5 per 1M tokens (cost in/out)
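As a rough illustration of what the listed per-token pricing means for a single request, the sketch below multiplies token counts by the input and output rates shown above. The helper name `estimate_cost_usd` and the example token counts are hypothetical, and the calculation assumes flat per-token billing with no caching discounts, tiers, or other provider-specific rules.

```python
# Prices taken from the listing above, in USD per 1M tokens.
INPUT_USD_PER_MTOK = 1.25
OUTPUT_USD_PER_MTOK = 2.50


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request in USD.

    Hypothetical helper: assumes flat per-token pricing and ignores
    caching, tiering, or any other billing rules a provider may apply.
    """
    total = input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK
    return total / 1_000_000


if __name__ == "__main__":
    # Example token counts are made up for illustration.
    print(f"${estimate_cost_usd(200_000, 50_000):.4f}")
```

Under these assumptions, output tokens cost twice as much as input tokens, so long generations dominate the bill more quickly than long prompts.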
Metadata
Grok Closed/API
Aliases: grok-4.20, grok-4.20-20260309, x-ai-grok-4.20, x-ai-grok-4.20-20260309, x-ai/grok-4.20, x-ai/grok-4.20-20260309
| Benchmark | Category | Rank | Score | Sampled |
|---|---|---|---|---|
| APEX-Agents-AA | Agentic | 12 | 14.2% | 2026-05-11 |
| ARC-AGI-1 | Agentic | 23 | 89.50 | 2026-05-05 |
| ARC-AGI-2 | Agentic | 20 | 65.14 | 2026-05-05 |
| ARC-AGI-3 | Agentic | 6 | 0.09 | 2026-05-05 |
| AutoBench | Agentic | 11 | 3 | 2026-05-06 |
| Gert Labs Rankings | Agentic | 39 | 0.44 | 2026-05-11 |
| HiL-Bench | Agentic | 8 | 8% | 2026-05-05 |
| LMArena Search Arena | Agentic | 11 | 1202.96 | 2026-05-06 |
| PinchBench | Agentic | 32 | 0.83 | 2026-05-06 |
| Tau2-Bench Telecom | Agentic | 10 | 96.5% | 2026-05-11 |
| Tau2-Bench Telecom | Agentic | 40 | 93% | 2026-05-11 |
| Tau2-Bench Telecom | Agentic | 137 | 69.6% | 2026-05-11 |
| Tau2-Bench Telecom | Agentic | 162 | 59.9% | 2026-05-11 |
| Terminal-Bench Hard | Agentic | 36 | 40.9% | 2026-05-11 |
| Terminal-Bench Hard | Agentic | 47 | 37.9% | 2026-05-11 |
| Terminal-Bench Hard | Agentic | 137 | 22% | 2026-05-11 |
| Terminal-Bench Hard | Agentic | 169 | 16.7% | 2026-05-11 |
| TERMS-Bench | Agentic | 9 | 60.1% SE+ | 2026-05-28 |
| Vending-Bench 2 | Agentic | 14 | 4662.85 | 2026-05-28 |
| WildClawBench | Agentic | 14 | 19.30 | 2026-05-06 |
| OpenUGI | Alignment | 9 | 64.23 | 2026-05-06 |
| OpenUGI | Alignment | 56 | 53.16 | 2026-05-06 |
| OpenUGI | Alignment | 76 | 51.77 | 2026-05-06 |
| OpenUGI | Alignment | 818 | 30.25 | 2026-05-06 |
| scBench | Biology | 11 | 44.44% | 2026-05-27 |
| SpatialBench | Biology | 9 | 45.91% | 2026-05-27 |
| ALE-Bench | Coding | 20 | 1150.28 | 2026-05-06 |
| Arena AI Code | Coding | 32 | 1399 | 2026-05-06 |
| BLXBench | Coding | 6 | 79.10 | 2026-05-06 |
| IOI | Coding | 9 | 30.166% | 2026-05-26 |
| LiveCodeBench | Coding | 23 | 84.265% | 2026-05-28 |
| SciCode | Coding | 40 | 45.6% | 2026-05-11 |
| SciCode | Coding | 44 | 44.7% | 2026-05-11 |
| SciCode | Coding | 226 | 32.8% | 2026-05-11 |
| SciCode | Coding | 229 | 32.2% | 2026-05-11 |
| SWE-bench Verified | Coding | 25 | 72.2% | 2026-05-28 |
| Terminal-Bench 2.0 | Coding | 31 | 40.449% | 2026-05-28 |
| Terminal-Bench 2.1 | Coding | 13 | 44.195% | 2026-05-28 |
| Vibe Code Bench v1.1 | Coding | 39 | 4.063% | 2026-05-28 |
| Arena AI Document | Document AI | 18 | 1426 | 2026-05-06 |
| SAGE | Education | 33 | 38.242% | 2026-05-28 |
| CorpFin v2 | Finance | 26 | 63.675% | 2026-05-28 |
| Finance Agent v1.1 | Finance | 24 | 52.295% | 2026-05-04 |
| Finance Agent v2 | Finance | 19 | 28.492% | 2026-05-28 |
| MortgageTax | Finance | 65 | 45.35% | 2026-05-28 |
| TaxEval v2 | Finance | 24 | 74.121% | 2026-05-28 |
| BenchLM | General Knowledge | 38 | 65 | 2026-05-06 |
| LMArena Text Arena | Generalization | 17 | 1455.12 | 2026-05-06 |
| LMArena Text Arena | Generalization | 24 | 1448.62 | 2026-05-06 |
| MedCode | Healthcare | 50 | 32.156% | 2026-05-28 |
| MedQA | Healthcare | 16 | 94.55% | 2026-04-16 |
| MedScribe | Healthcare | 57 | 63.412% | 2026-05-28 |
| PhysicianBench | Healthcare | 12 | 5.3 +/- 3.2 | 2026-05-27 |
| HUMAINE | Human Preference | 17 | 3.60 | 2026-05-06 |
| AIIQ Composite IQ | Intelligence | 9 | 123 | 2026-05-12 |
| Artificial Analysis Intelligence Index | Intelligence | 25 | 49.33 | 2026-05-11 |
| Artificial Analysis Intelligence Index | Intelligence | 30 | 48.48 | 2026-05-11 |
| Artificial Analysis Intelligence Index | Intelligence | 151 | 29.69 | 2026-05-11 |
| Artificial Analysis Intelligence Index | Intelligence | 156 | 28.99 | 2026-05-11 |
| GPQA Diamond | Intelligence | 15 | 88.636% | 2026-05-28 |
| Humanity's Last Exam | Intelligence | 20 | 32.2% | 2026-05-11 |
| Humanity's Last Exam | Intelligence | 25 | 30% | 2026-05-11 |
| Humanity's Last Exam | Intelligence | 51 | 24.2% | 2026-05-11 |
| Humanity's Last Exam | Intelligence | 58 | 22.5% | 2026-05-11 |
| LiveBench | Intelligence | 34 | 68.99 | 2026-05-05 |
| MMLU Pro | Intelligence | 25 | 86.254% | 2026-05-28 |
| MMMU Pro | Intelligence | 16 | 83.468% | 2026-05-28 |
| Vals Index | Intelligence | 18 | 39.11% | 2026-05-28 |
| Vals Multimodal Index | Intelligence | 15 | 38.704% | 2026-05-28 |
| CaseLaw v2 | Legal | 38 | 54.448% | 2026-05-04 |
| LegalBench | Legal | 74 | 77.738% | 2026-05-28 |
| AIME | Math | 6 | 96.458% | 2026-04-16 |
| ProofBench | Math | 21 | 14% | 2026-05-28 |
| Blueprint-Bench 2 | Multimodal | 13 | 0.170 +/- 0.011 | 2026-05-28 |
| Design Arena | Multimodal | 26 | 1271 | 2026-05-06 |
| Design Arena | Multimodal | 30 | 1248 | 2026-05-06 |
| LMArena Vision Arena | Multimodal | 21 | 1255.16 | 2026-05-06 |
| CAIS Text Capabilities Index | Reasoning | 12 | 32.5 | 2026-05-27 |
| Context Arena | Reasoning | 49 | 28.75 | 2026-05-06 |
| Context Arena | Reasoning | 68 | 14.49 | 2026-05-06 |
| GPQA Diamond | Reasoning | 8 | 91.1% | 2026-05-11 |
| GPQA Diamond | Reasoning | 24 | 88.5% | 2026-05-11 |
| GPQA Diamond | Reasoning | 116 | 78.5% | 2026-05-11 |
| GPQA Diamond | Reasoning | 127 | 77.6% | 2026-05-11 |
| CAIS Risk Index | Safety | 6 | 38.8 | 2026-05-27 |
| CritPt | Science | 28 | 6.6% | 2026-05-11 |
| CritPt | Science | 29 | 6% | 2026-05-11 |
| CritPt | Science | 133 | 0.3% | 2026-05-11 |
| CritPt | Science | 244 | 0% | 2026-05-11 |
| CAIS Vision Capabilities Index | Vision | 10 | 54.6 | 2026-05-27 |