ARC-AGI v2
ARC-AGI-2 is an upgraded benchmark for measuring abstract reasoning and problem-solving abilities in AI systems through visual grid transformation tasks. It evaluates fluid intelligence via input-output grid pairs (1x1 to 30x30) whose cells each hold a color value from 0 to 9; a model must infer the underlying transformation rule from the demonstration examples and apply it to the test cases. The benchmark is designed to be easy for humans but challenging for AI, and it focuses on core cognitive abilities such as spatial reasoning, pattern recognition, and compositional generalization.
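The task structure described above can be sketched in a few lines of Python. This is a minimal illustration, not the benchmark's official tooling. The `train`/`test` dictionary layout, the helper names, the toy task, and the exact-match scoring are all assumptions made for the example. The page does not describe the official scoring protocol, so the fraction computed here should not be read as the leaderboard's Score.

```python
from typing import Callable, Dict, List

Grid = List[List[int]]
Task = Dict[str, List[Dict[str, Grid]]]


def is_valid_grid(grid: Grid) -> bool:
    """Check the stated constraints: rectangular, 1x1 to 30x30, cell values 0-9."""
    if not (1 <= len(grid) <= 30):
        return False
    width = len(grid[0])
    if not (1 <= width <= 30):
        return False
    return all(
        len(row) == width
        and all(isinstance(cell, int) and 0 <= cell <= 9 for cell in row)
        for row in grid
    )


def flip_horizontal(grid: Grid) -> Grid:
    """Hypothetical candidate rule: mirror every row left to right."""
    return [list(reversed(row)) for row in grid]


def rule_fits_demos(rule: Callable[[Grid], Grid], task: Task) -> bool:
    """A candidate rule is plausible only if it reproduces every demonstration pair."""
    return all(rule(pair["input"]) == pair["output"] for pair in task["train"])


def exact_match_score(rule: Callable[[Grid], Grid], task: Task) -> float:
    """Fraction of test inputs whose predicted grid equals the expected grid exactly."""
    tests = task["test"]
    hits = sum(rule(pair["input"]) == pair["output"] for pair in tests)
    return hits / len(tests)


# Toy task invented for illustration (assumed layout: "train" and "test" lists
# of {"input": grid, "output": grid} pairs).
toy_task: Task = {
    "train": [
        {"input": [[1, 0], [2, 0]], "output": [[0, 1], [0, 2]]},
        {"input": [[3, 3, 0]], "output": [[0, 3, 3]]},
    ],
    "test": [
        {"input": [[0, 5], [7, 0]], "output": [[5, 0], [0, 7]]},
    ],
}

if __name__ == "__main__":
    print(rule_fits_demos(flip_horizontal, toy_task))
    print(exact_match_score(flip_horizontal, toy_task))
```

Real tasks are harder because the rule is not known in advance. A solver has to search for or synthesize a transformation that fits every demonstration pair before applying it to the test input.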
Rows: 15
Primary metric: Score
Sampled: 2026-05-06
Metrics: Score, Normalized Score
| Rank | Subject | Score | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | GPT-5.5 | 0.85 | GPT-5.5 (`openai-gpt-5.5`) | Self-reported | 2026-05-06 |
| 2 | Gemini 3.1 Pro | 0.77 | Gemini 3.1 Pro Preview (`google-gemini-3.1-pro-preview`) | Self-reported | 2026-05-06 |
| 3 | GPT-5.4 | 0.73 | GPT-5.4 (`openai-gpt-5.4`) | Self-reported | 2026-05-06 |
| 4 | Claude Opus 4.6 | 0.69 | Claude Opus 4.6 (`anthropic-claude-opus-4.6`) | Self-reported | 2026-05-06 |
| 5 | Claude Sonnet 4.6 | 0.58 | Claude Sonnet 4.6 (`anthropic-claude-sonnet-4.6`) | Self-reported | 2026-05-06 |
| 6 | GPT-5.2 Pro | 0.54 | GPT-5.2 Pro (`openai-gpt-5.2-pro`) | Self-reported | 2026-05-06 |
| 7 | GPT-5.2 | 0.53 | GPT-5.2 (`openai-gpt-5.2`) | Self-reported | 2026-05-06 |
| 8 | Muse Spark | 0.42 | — | Self-reported | 2026-05-06 |
| 9 | Claude Opus 4.5 | 0.38 | Claude Opus 4.5 (`anthropic-claude-opus-4.5`) | Self-reported | 2026-05-06 |
| 10 | Gemini 3 Flash | 0.34 | Gemini 3 Flash Preview (`google-gemini-3-flash-preview`) | Self-reported | 2026-05-06 |
| 11 | Gemini 3 Pro | 0.31 | Gemini 3 (`google-gemini-3`) | Self-reported | 2026-05-06 |
| 12 | Grok-4 | 0.16 | Grok 4 (`x-ai-grok-4`) | Self-reported | 2026-05-06 |
| 13 | Claude Opus 4 | 0.09 | Claude Opus 4 (`anthropic-claude-opus-4`) | Imported | 2026-05-06 |
| 14 | o3 | 0.07 | o3 (`openai-o3`) | Imported | 2026-05-06 |
| 15 | Gemini 2.5 Pro | 0.05 | Gemini 2.5 Pro (`google-gemini-2.5-pro`) | Imported | 2026-05-06 |