ARC-AGI-3
Interactive ARC-AGI benchmark variant that evaluates agents adapting to novel grid-based environments, with an official public leaderboard.
Rows: 6
Primary metric: Score
Sampled: 2026-05-05
Metadata
Metrics: Score, Cost/task (lower is better), Total cost (lower is better)
| Rank | Subject | Score | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | Anthropic Opus 4.6 (Max) | 0.51 | — | Imported | 2026-05-05 |
| 2 | GPT-5.5 (High) | 0.43 | GPT-5.5 (`openai-gpt-5.5`) | Imported | 2026-05-05 |
| 3 | Gemini 3.1 Pro (Preview) | 0.42 | Gemini 3.1 Pro Preview (`google-gemini-3.1-pro-preview`) | Imported | 2026-05-05 |
| 4 | GPT-5.4 (High) | 0.21 | GPT-5.4 (`openai-gpt-5.4`) | Imported | 2026-05-05 |
| 5 | Opus 4.7 (High) | 0.18 | — | Imported | 2026-05-05 |
| 6 | Grok 4.20 (Beta Reasoning) | 0.09 | Grok 4.20 (`x-ai-grok-4.20`) | Imported | 2026-05-05 |