ARC-AGI-1
ARC Prize benchmark for few-shot abstract reasoning over grid transformations, using the original ARC-AGI-1 task distribution and scored on the semi-private evaluation set.
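The few-shot grid format can be sketched in Python. This sketch assumes the public ARC JSON layout (a task holds `train` and `test` lists of `input`/`output` grids, each cell an integer colour from 0 to 9). The exact-match rule and the two-attempt limit are assumptions modelled on the ARC Prize convention, and `score_task` is a hypothetical helper, not part of any official harness.

```python
from typing import List, TypedDict

Grid = List[List[int]]  # each cell is a colour index from 0 to 9


class Pair(TypedDict):
    input: Grid
    output: Grid


class Task(TypedDict):
    train: List[Pair]  # few-shot demonstration pairs shown to the solver
    test: List[Pair]   # held-out pairs whose outputs the solver must produce


def score_task(task: Task, attempts: List[List[Grid]], max_attempts: int = 2) -> float:
    """Hypothetical scorer: fraction of test outputs matched exactly by one of
    the first `max_attempts` guesses (the 2-attempt default is an assumption)."""
    if len(attempts) != len(task["test"]):
        raise ValueError("need one list of attempts per test input")
    solved = 0
    for pair, guesses in zip(task["test"], attempts):
        # A guess counts only if every row and cell match, so shape must match too.
        if any(guess == pair["output"] for guess in guesses[:max_attempts]):
            solved += 1
    return solved / len(task["test"])


# Toy "mirror each row" task: one demonstration pair, one test input.
task: Task = {
    "train": [{"input": [[1, 0], [2, 3]], "output": [[0, 1], [3, 2]]}],
    "test": [{"input": [[4, 0, 5]], "output": [[5, 0, 4]]}],
}
print(score_task(task, [[[[4, 0, 5]], [[5, 0, 4]]]]))  # second attempt is correct -> 1.0
```

A correct grid offered as a third guess would not count under the assumed two-attempt limit.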
148 rows
Primary metric: Score
Sampled: 2026-05-05
Metrics: Score, Cost/task (lower is better), Total cost (lower is better).
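How the three metrics could relate is sketched below, assuming each task yields a score between 0 and 1 plus a known inference cost in US dollars. `TaskResult`, `summarize`, and the plain-mean aggregation are illustrative assumptions, not the leaderboard's documented method.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class TaskResult:
    task_id: str
    score: float     # assumed 0.0 to 1.0, e.g. fraction of test outputs solved
    cost_usd: float  # assumed inference spend for this one task


@dataclass
class Summary:
    score_pct: float      # higher is better
    cost_per_task: float  # lower is better
    total_cost: float     # lower is better


def summarize(results: Sequence[TaskResult]) -> Summary:
    """Hypothetical aggregation: mean task score as a percentage, plus costs."""
    if not results:
        raise ValueError("no task results")
    n = len(results)
    total = sum(r.cost_usd for r in results)
    return Summary(
        score_pct=100.0 * sum(r.score for r in results) / n,
        cost_per_task=total / n,  # Cost/task is Total cost spread over n tasks
        total_cost=total,
    )


results = [
    TaskResult("task-a", 1.0, 0.50),
    TaskResult("task-b", 0.5, 1.50),
    TaskResult("task-c", 0.0, 1.00),
]
print(summarize(results))
# Summary(score_pct=50.0, cost_per_task=1.0, total_cost=3.0)
```

Under this assumption, two runs with the same score can still rank differently once cost is considered, which is why both cost columns are marked lower-is-better.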
Showing the 2 latest source slices: imported results (sampled 2026-05-05) and launch-post results (sampled 2026-04-23).
| Rank | Subject | Score (%) | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | Human Panel | 98 | — | Imported | 2026-05-05 |
| 2 | Stem Grad | 98 | — | Imported | 2026-05-05 |
| 3 | Gemini 3.1 Pro (Preview) | 98 | Gemini 3.1 Pro Preview google-gemini-3.1-pro-preview | Imported | 2026-05-05 |
| 4 | GPT-5.5 Pro (High) | 96.50 | GPT-5.5 Pro openai-gpt-5.5-pro | Imported | 2026-05-05 |
| 5 | Gemini 3 Deep Think (2/26) | 96 | — | Imported | 2026-05-05 |
| 6 | GPT-5.5 (xHigh) | 95 | GPT-5.5 openai-gpt-5.5 | Imported | 2026-05-05 |
| 7 | GPT-5.5 Pro (xHigh) | 95 | GPT-5.5 Pro openai-gpt-5.5-pro | Imported | 2026-05-05 |
| 8 | GPT-5.2 (Refine.) | 94.50 | — | Imported | 2026-05-05 |
| 9 | GPT-5.4 Pro (xHigh) | 94.50 | GPT-5.4 Pro openai-gpt-5.4-pro | Imported | 2026-05-05 |
| 10 | GPT-5.5 (High) | 94.50 | GPT-5.5 openai-gpt-5.5 | Imported | 2026-05-05 |
| 11 | Claude Opus 4.6 (120K, High) | 94 | Claude Opus 4.6 anthropic-claude-opus-4.6 | Imported | 2026-05-05 |
| 12 | GPT-5.4 (xHigh) | 93.67 | GPT-5.4 openai-gpt-5.4 | Imported | 2026-05-05 |
| 13 | Claude 4.7 (High) | 93.50 | — | Imported | 2026-05-05 |
| 14 | Claude Opus 4.6 (120K, Max) | 93 | Claude Opus 4.6 anthropic-claude-opus-4.6 | Imported | 2026-05-05 |
| 15 | GPT-5.4 (High) | 92.67 | GPT-5.4 openai-gpt-5.4 | Imported | 2026-05-05 |
| 16 | Gemini 3.5 Flash (High) | 92.50 | Gemini 3.5 Flash google-gemini-3.5-flash | Imported | 2026-05-05 |
| 17 | GPT-5.5 (Medium) | 92.17 | GPT-5.5 openai-gpt-5.5 | Imported | 2026-05-05 |
| 18 | Claude Opus 4.6 (120K, Medium) | 92 | Claude Opus 4.6 anthropic-claude-opus-4.6 | Imported | 2026-05-05 |
| 19 | Claude 4.7 (Max) | 92 | — | Imported | 2026-05-05 |
| 20 | Claude 4.7 (Low) | 91 | — | Imported | 2026-05-05 |
| 21 | Claude 4.7 (Medium) | 91 | — | Imported | 2026-05-05 |
| 22 | GPT-5.2 Pro (xHigh) | 90.50 | GPT-5.2 Pro openai-gpt-5.2-pro | Imported | 2026-05-05 |
| 23 | Grok 4.20 (Reasoning) | 89.50 | Grok 4.20 x-ai-grok-4.20 | Imported | 2026-05-05 |
| 24 | Gemini 3 Deep Think (Preview) | 87.50 | — | Imported | 2026-05-05 |
| 25 | Claude Sonnet 4.6 (High) | 86.50 | Claude Sonnet 4.6 anthropic-claude-sonnet-4.6 | Imported | 2026-05-05 |
| 26 | GPT-5.2 (xHigh) | 86.17 | GPT-5.2 openai-gpt-5.2 | Imported | 2026-05-05 |
| 27 | GPT-5.4 (Medium) | 86.17 | GPT-5.4 openai-gpt-5.4 | Imported | 2026-05-05 |
| 28 | Claude Opus 4.6 (120K, Low) | 86 | Claude Opus 4.6 anthropic-claude-opus-4.6 | Imported | 2026-05-05 |
| 29 | Claude Sonnet 4.6 (Max) | 86 | Claude Sonnet 4.6 anthropic-claude-sonnet-4.6 | Imported | 2026-05-05 |
| 30 | GPT-5.2 Pro (High) | 85.67 | GPT-5.2 Pro openai-gpt-5.2-pro | Imported | 2026-05-05 |
| 31 | Gemini 3 Flash Preview (High) | 84.67 | Gemini 3 Flash Preview google-gemini-3-flash-preview | Imported | 2026-05-05 |
| 32 | GPT-5.2 Pro (Medium) | 81.17 | GPT-5.2 Pro openai-gpt-5.2-pro | Imported | 2026-05-05 |
| 33 | Opus 4.5 (Thinking, 64K) | 80 | — | Imported | 2026-05-05 |
| 34 | Grok 4 (Refine.) | 79.60 | — | Imported | 2026-05-05 |
| 35 | GPT-5.2 (High) | 78.67 | GPT-5.2 openai-gpt-5.2 | Imported | 2026-05-05 |
| 36 | GPT-5.5 (Low) | 76.17 | GPT-5.5 openai-gpt-5.5 | Imported | 2026-05-05 |
| 37 | Opus 4.5 (Thinking, 32K) | 75.83 | — | Imported | 2026-05-05 |
| 38 | Gemini 3 Pro | 75 | Gemini 3 google-gemini-3 | Imported | 2026-05-05 |
| 39 | GPT-5.1 (Thinking, High) | 72.83 | GPT-5.1 openai-gpt-5.1 | Imported | 2026-05-05 |
| 40 | GPT-5.2 (Medium) | 72.67 | GPT-5.2 openai-gpt-5.2 | Imported | 2026-05-05 |
| 41 | Opus 4.5 (Thinking, 16K) | 72 | — | Imported | 2026-05-05 |
| 42 | GPT-5 Pro | 70.17 | GPT-5 Pro openai-gpt-5-pro | Imported | 2026-05-05 |
| 43 | GPT-5.4 (Low) | 68.17 | GPT-5.4 openai-gpt-5.4 | Imported | 2026-05-05 |
| 44 | Grok 4 (Thinking) | 66.67 | Grok 4 x-ai-grok-4 | Imported | 2026-05-05 |
| 45 | GPT-5 (High) | 65.67 | GPT-5 openai-gpt-5 | Imported | 2026-05-05 |
| 46 | Kimi K2.5 | 65.33 | MoonshotAI: Kimi K2.5 moonshotai-kimi-k2.5 | Imported | 2026-05-05 |
| 47 | Claude Sonnet 4.5 (Thinking 32K) | 63.67 | Claude Sonnet 4.5 anthropic-claude-sonnet-4.5 | Imported | 2026-05-05 |
| 48 | Minimax M2.5 | 63.67 | MiniMax M2.5 minimax-minimax-m2.5 | Imported | 2026-05-05 |
| 49 | GPT-5.4 Mini (xHigh) | 63.67 | GPT-5.4 Mini openai-gpt-5.4-mini | Imported | 2026-05-05 |
| 50 | o3 (High) | 60.83 | o3 openai-o3 | Imported | 2026-05-05 |
| 51 | o3-Pro (High) | 59.33 | — | Imported | 2026-05-05 |
| 52 | o4-mini (High) | 58.67 | o4 Mini openai-o4-mini | Imported | 2026-05-05 |
| 53 | Opus 4.5 (Thinking, 8K) | 58.67 | — | Imported | 2026-05-05 |
| 54 | GPT-5.4 Mini (High) | 58 | GPT-5.4 Mini openai-gpt-5.4-mini | Imported | 2026-05-05 |
| 55 | GPT-5.1 (Thinking, Medium) | 57.67 | GPT-5.1 openai-gpt-5.1 | Imported | 2026-05-05 |
| 56 | Gemini 3 Flash Preview (Medium) | 57.67 | Gemini 3 Flash Preview google-gemini-3-flash-preview | Imported | 2026-05-05 |
| 57 | o3-Pro (Medium) | 57 | — | Imported | 2026-05-05 |
| 58 | Deepseek V3.2 | 57 | DeepSeek V3.2 deepseek-deepseek-v3.2 | Imported | 2026-05-05 |
| 59 | GPT-5 (Medium) | 56.17 | GPT-5 openai-gpt-5 | Imported | 2026-05-05 |
| 60 | ARChitects | 56 | — | Imported | 2026-05-05 |
| 61 | GPT-5.2 (Low) | 55.67 | GPT-5.2 openai-gpt-5.2 | Imported | 2026-05-05 |
| 62 | GPT-5 Mini (High) | 54.33 | GPT-5 Mini openai-gpt-5-mini | Imported | 2026-05-05 |
| 63 | o3 (Medium) | 53.83 | o3 openai-o3 | Imported | 2026-05-05 |
| 64 | GPT-5.4 Nano (xHigh) | 51.50 | GPT-5.4 Nano openai-gpt-5.4-nano | Imported | 2026-05-05 |
| 65 | Gemini 3.5 Flash (Minimal) | 48.83 | Gemini 3.5 Flash google-gemini-3.5-flash | Imported | 2026-05-05 |
| 66 | Grok 4 (Fast Reasoning) | 48.50 | Grok 4 Fast x-ai-grok-4-fast | Imported | 2026-05-05 |
| 67 | Claude Sonnet 4.5 (Thinking 16K) | 48.33 | Claude Sonnet 4.5 anthropic-claude-sonnet-4.5 | Imported | 2026-05-05 |
| 68 | Claude Haiku 4.5 (Thinking 32K) | 47.67 | Claude Haiku 4.5 anthropic-claude-haiku-4.5 | Imported | 2026-05-05 |
| 69 | Claude Sonnet 4.5 (Thinking 8K) | 46.50 | Claude Sonnet 4.5 anthropic-claude-sonnet-4.5 | Imported | 2026-05-05 |
| 70 | GLM-5 | 44.67 | GLM 5 z-ai-glm-5 | Imported | 2026-05-05 |
| 71 | o3-Pro (Low) | 44.33 | — | Imported | 2026-05-05 |
| 72 | GPT-5 (Low) | 44 | GPT-5 openai-gpt-5 | Imported | 2026-05-05 |
| 73 | o4-mini (Medium) | 41.83 | o4 Mini openai-o4-mini | Imported | 2026-05-05 |
| 74 | o3 (Low) | 41.50 | o3 openai-o3 | Imported | 2026-05-05 |
| 75 | Gemini 2.5 Pro (Thinking 16K) | 41 | Gemini 2.5 Pro google-gemini-2.5-pro | Imported | 2026-05-05 |
| 76 | GPT-5.4 Mini (Medium) | 40.83 | GPT-5.4 Mini openai-gpt-5.4-mini | Imported | 2026-05-05 |
| 77 | Claude Sonnet 4 (Thinking 16K) | 40 | Claude Sonnet 4 anthropic-claude-sonnet-4 | Imported | 2026-05-05 |
| 78 | Tiny Recursion Model (TRM) | 40 | — | Imported | 2026-05-05 |
| 79 | Opus 4.5 (Thinking, None) | 40 | — | Imported | 2026-05-05 |
| 80 | GPT-5.4 Nano (High) | 38.17 | GPT-5.4 Nano openai-gpt-5.4-nano | Imported | 2026-05-05 |
| 81 | GPT-5 Mini (Medium) | 37.33 | GPT-5 Mini openai-gpt-5-mini | Imported | 2026-05-05 |
| 82 | Claude Haiku 4.5 (Thinking 16K) | 37.33 | Claude Haiku 4.5 anthropic-claude-haiku-4.5 | Imported | 2026-05-05 |
| 83 | Gemini 2.5 Pro (Thinking 32K) | 37 | Gemini 2.5 Pro google-gemini-2.5-pro | Imported | 2026-05-05 |
| 84 | Claude Opus 4 (Thinking 16K) | 35.67 | Claude Opus 4 anthropic-claude-opus-4 | Imported | 2026-05-05 |
| 85 | o3-mini (High) | 34.50 | o3 Mini High openai-o3-mini-high | Imported | 2026-05-05 |
| 86 | Gemini 2.5 Flash (Preview) | 33.33 | Gemini 2.5 Flash google-gemini-2.5-flash | Imported | 2026-05-05 |
| 87 | Gemini 2.5 Flash (Preview) (Thinking 16K) | 33.33 | Gemini 2.5 Flash google-gemini-2.5-flash | Imported | 2026-05-05 |
| 88 | GPT-5.1 (Thinking, Low) | 33.17 | GPT-5.1 openai-gpt-5.1 | Imported | 2026-05-05 |
| 89 | GPT-5.4 Nano (Medium) | 33 | GPT-5.4 Nano openai-gpt-5.4-nano | Imported | 2026-05-05 |
| 90 | Gemini 2.5 Flash (Preview) (Thinking 24K) | 32.33 | Gemini 2.5 Flash google-gemini-2.5-flash | Imported | 2026-05-05 |
| 91 | Claude Sonnet 4.5 (Thinking 1K) | 31 | Claude Sonnet 4.5 anthropic-claude-sonnet-4.5 | Imported | 2026-05-05 |
| 92 | Claude Opus 4 (Thinking 8K) | 30.67 | Claude Opus 4 anthropic-claude-opus-4 | Imported | 2026-05-05 |
| 93 | Gemini 2.5 Pro (Thinking 8K) | 29.50 | Gemini 2.5 Pro google-gemini-2.5-pro | Imported | 2026-05-05 |
| 94 | Claude Sonnet 4 (Thinking 8K) | 29 | Claude Sonnet 4 anthropic-claude-sonnet-4 | Imported | 2026-05-05 |
| 95 | Gemini 3 Flash Preview (Low) | 29 | Gemini 3 Flash Preview google-gemini-3-flash-preview | Imported | 2026-05-05 |
| 96 | Claude 3.7 (16K) | 28.60 | — | Imported | 2026-05-05 |
| 97 | Claude Sonnet 4 (Thinking 1K) | 28 | Claude Sonnet 4 anthropic-claude-sonnet-4 | Imported | 2026-05-05 |
| 98 | Codex Mini (Latest) | 27.33 | — | Imported | 2026-05-05 |
| 99 | Claude Opus 4 (Thinking 1K) | 27 | Claude Opus 4 anthropic-claude-opus-4 | Imported | 2026-05-05 |
| 100 | GPT-5 Mini (Low) | 26.33 | GPT-5 Mini openai-gpt-5-mini | Imported | 2026-05-05 |
| 101 | Gemini 2.5 Flash (Preview) (Thinking 8K) | 25.83 | Gemini 2.5 Flash google-gemini-2.5-flash | Imported | 2026-05-05 |
| 102 | Claude Sonnet 4.5 | 25.50 | Claude Sonnet 4.5 anthropic-claude-sonnet-4.5 | Imported | 2026-05-05 |
| 103 | Claude Haiku 4.5 (Thinking 8K) | 25.50 | Claude Haiku 4.5 anthropic-claude-haiku-4.5 | Imported | 2026-05-05 |
| 104 | Claude Sonnet 4 | 23.83 | Claude Sonnet 4 anthropic-claude-sonnet-4 | Imported | 2026-05-05 |
| 105 | Claude Opus 4 | 22.50 | Claude Opus 4 anthropic-claude-opus-4 | Imported | 2026-05-05 |
| 106 | o3-mini (Medium) | 22.33 | o3-mini openai-o3-mini | Imported | 2026-05-05 |
| 107 | Gemini 3 Flash Preview (Minimal) | 21.50 | Gemini 3 Flash Preview google-gemini-3-flash-preview | Imported | 2026-05-05 |
| 108 | o4-mini (Low) | 21.33 | o4 Mini openai-o4-mini | Imported | 2026-05-05 |
| 109 | Deepseek R1 (05/28) | 21.21 | R1 0528 deepseek-deepseek-r1-0528 | Imported | 2026-05-05 |
| 110 | Claude 3.7 (8K) | 21.20 | — | Imported | 2026-05-05 |
| 111 | GPT-5 Nano (Medium) | 20.71 | GPT-5 Nano openai-gpt-5-nano | Imported | 2026-05-05 |
| 112 | GPT-5.4 Nano (Low) | 18.33 | GPT-5.4 Nano openai-gpt-5.4-nano | Imported | 2026-05-05 |
| 113 | Icecuber | 17 | — | Imported | 2026-05-05 |
| 114 | Claude Haiku 4.5 (Thinking 1K) | 16.83 | Claude Haiku 4.5 anthropic-claude-haiku-4.5 | Imported | 2026-05-05 |
| 115 | GPT-5 Nano (High) | 16.67 | GPT-5 Nano openai-gpt-5-nano | Imported | 2026-05-05 |
| 116 | Grok 3 Mini (Low) | 16.50 | Grok 3 Mini x-ai-grok-3-mini | Imported | 2026-05-05 |
| 117 | Gemini 2.5 Flash (Preview) (Thinking 1K) | 16 | Gemini 2.5 Flash google-gemini-2.5-flash | Imported | 2026-05-05 |
| 118 | Gemini 2.5 Pro (Thinking 1K) | 16 | Gemini 2.5 Pro google-gemini-2.5-pro | Imported | 2026-05-05 |
| 119 | Deepseek R1 | 15.80 | R1 deepseek-r1 | Imported | 2026-05-05 |
| 120 | o3-mini (Low) | 14.50 | o3-mini openai-o3-mini | Imported | 2026-05-05 |
| 121 | Claude Haiku 4.5 | 14.33 | Claude Haiku 4.5 anthropic-claude-haiku-4.5 | Imported | 2026-05-05 |
| 122 | o1-mini | 14 | — | Imported | 2026-05-05 |
| 123 | Claude 3.7 | 13.60 | — | Imported | 2026-05-05 |
| 124 | GPT-5.4 Mini (Low) | 13 | GPT-5.4 Mini openai-gpt-5.4-mini | Imported | 2026-05-05 |
| 125 | GPT-5.2 | 12.33 | GPT-5.2 openai-gpt-5.2 | Imported | 2026-05-05 |
| 126 | Claude 3.7 (1K) | 11.60 | — | Imported | 2026-05-05 |
| 127 | Qwen3-235b-a22b Instruct (25/07) | 11 | Qwen3 235B A22B Instruct 2507 qwen-qwen3-235b-a22b-2507 | Imported | 2026-05-05 |
| 128 | GPT-4.5 | 10.30 | GPT-4.5 openai-gpt-4.5-preview | Imported | 2026-05-05 |
| 129 | Magistral Medium (Thinking) | 6.12 | — | Imported | 2026-05-05 |
| 130 | GPT-5 (Minimal) | 6 | GPT-5 openai-gpt-5 | Imported | 2026-05-05 |
| 131 | Magistral Medium | 5.91 | — | Imported | 2026-05-05 |
| 132 | GPT-5.1 (Thinking, None) | 5.83 | GPT-5.1 openai-gpt-5.1 | Imported | 2026-05-05 |
| 133 | GPT-4.1 | 5.50 | GPT-4.1 openai-gpt-4.1 | Imported | 2026-05-05 |
| 134 | Grok 3 | 5.50 | Grok 3 xaigrok-3 | Imported | 2026-05-05 |
| 135 | GPT-5 Mini (Minimal) | 5.33 | GPT-5 Mini openai-gpt-5-mini | Imported | 2026-05-05 |
| 136 | Magistral Small | 5 | — | Imported | 2026-05-05 |
| 137 | GPT-4o | 4.50 | GPT-4o (2024-11-20) openai-gpt-4o-2024-11-20 | Imported | 2026-05-05 |
| 138 | Llama 4 Maverick | 4.38 | Llama 4 Maverick meta-llama-4-maverick | Imported | 2026-05-05 |
| 139 | GPT-5 Nano (Low) | 4.04 | GPT-5 Nano openai-gpt-5-nano | Imported | 2026-05-05 |
| 140 | GPT-4.1-Mini | 3.50 | GPT-4.1 Mini openai-gpt-4.1-mini | Imported | 2026-05-05 |
| 141 | GPT-5 Nano (Minimal) | 1.50 | GPT-5 Nano openai-gpt-5-nano | Imported | 2026-05-05 |
| 142 | Llama 4 Scout | 0.50 | Llama 4 Scout meta-llama-llama-4-scout | Imported | 2026-05-05 |
| 143 | GPT-4.1-Nano | 0 | GPT-4.1 Nano openai-gpt-4.1-nano | Imported | 2026-05-05 |

Launch-post results (sampled 2026-04-23):

| Rank | Subject | Score (%) | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | Gemini 3.1 Pro Preview | 98 | Gemini 3.1 Pro Preview google-gemini-3.1-pro-preview | Launch post | 2026-04-23 |
| 2 | GPT-5.5 | 95 | GPT-5.5 openai-gpt-5.5 | Launch post | 2026-04-23 |
| 3 | GPT-5.4 Pro | 94.5 | GPT-5.4 Pro openai-gpt-5.4-pro | Launch post | 2026-04-23 |
| 4 | GPT-5.4 | 93.7 | GPT-5.4 openai-gpt-5.4 | Launch post | 2026-04-23 |
| 5 | Claude Opus 4.7 | 93.5 | Claude Opus 4.7 anthropic-claude-opus-4.7 | Launch post | 2026-04-23 |