ARC-AGI-1

ARC Prize benchmark for few-shot abstract reasoning over grid transformations, using the first ARC-AGI task distribution and semi-private leaderboard evaluation.

Rows: 148
Primary metric: score
Sampled: 2026-05-05

Metadata

Metrics

Score, Cost/task (lower is better), Total cost (lower is better)
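
Because the metrics above do not all point the same way, any ranking over them needs to know each metric's direction. The following is a minimal Python sketch of that idea, not the site's actual implementation: the key names `score`, `cost_per_task`, and `total_cost` and the function `rank_rows` are hypothetical, and only the "lower is better" notes come from the page.

```python
from typing import Dict, List

# Direction of each metric. The "lower is better" notes come from the page;
# the key names themselves are hypothetical.
HIGHER_IS_BETTER: Dict[str, bool] = {
    "score": True,
    "cost_per_task": False,
    "total_cost": False,
}


def rank_rows(rows: List[dict], metric: str) -> List[dict]:
    """Return the rows ordered best-first for the given metric."""
    return sorted(
        rows,
        key=lambda row: row[metric],
        reverse=HIGHER_IS_BETTER[metric],  # descending only when higher is better
    )
```

Calling `rank_rows(rows, "score")` would put the highest score first, while `rank_rows(rows, "cost_per_task")` would put the cheapest entry first.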

Showing 2 latest source slices.

Latest Results

Scores are stored as percentages. Rows preserve ARC Prize display names because the leaderboard includes base models, reasoning configurations, custom competition systems, and agent systems.
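
Since scores are percentages and display names bundle a model name with its configuration, a consumer of these rows may want to normalize both. Here is a minimal Python sketch under stated assumptions: `parse_score` and `split_display_name` are hypothetical helpers, not part of any ARC Prize tooling, and treating the final parenthesised group as the configuration is a heuristic that will not fit every row (for example, a name ending in a footnote marker is returned unchanged).

```python
import re
from typing import Optional, Tuple


def parse_score(raw: str) -> float:
    """Convert a score cell such as '96.50' or '93.7%' to a float percentage."""
    return float(raw.strip().rstrip("%"))


# Heuristic (an assumption): the last parenthesised group at the very end
# of the display name is the configuration; everything before it is the base.
_CONFIG_RE = re.compile(r"^(?P<base>.*?)\s*\((?P<config>[^()]*)\)\s*$")


def split_display_name(name: str) -> Tuple[str, Optional[str]]:
    """Split 'GPT-5.5 Pro (High)' into ('GPT-5.5 Pro', 'High')."""
    match = _CONFIG_RE.match(name.strip())
    if match is None:
        # No trailing parenthesised group, e.g. 'Human Panel'.
        return name.strip(), None
    return match.group("base"), match.group("config")


print(parse_score("96.50"))                     # 96.5
print(split_display_name("GPT-5.5 Pro (High)")) # ('GPT-5.5 Pro', 'High')
print(split_display_name("Human Panel"))        # ('Human Panel', None)
```

The split is deliberately conservative: when a name has two parenthesised groups, only the last one is peeled off, so the rest of the display name stays intact.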

Rank | Subject | Score | Model Match [slug] | Provenance | Sampled
1 | Human Panel | 98 | - | Imported | 2026-05-05
2 | Stem Grad | 98 | - | Imported | 2026-05-05
3 | Gemini 3.1 Pro (Preview) | 98 | Gemini 3.1 Pro Preview [google-gemini-3.1-pro-preview] | Imported | 2026-05-05
4 | GPT-5.5 Pro (High) | 96.50 | GPT-5.5 Pro [openai-gpt-5.5-pro] | Imported | 2026-05-05
5 | Gemini 3 Deep Think (2/26) | 96 | - | Imported | 2026-05-05
6 | GPT-5.5 (xHigh) | 95 | GPT-5.5 [openai-gpt-5.5] | Imported | 2026-05-05
7 | GPT-5.5 Pro (xHigh) | 95 | GPT-5.5 Pro [openai-gpt-5.5-pro] | Imported | 2026-05-05
8 | GPT-5.2 (Refine.) | 94.50 | - | Imported | 2026-05-05
9 | GPT-5.4 Pro (xHigh) | 94.50 | GPT-5.4 Pro [openai-gpt-5.4-pro] | Imported | 2026-05-05
10 | GPT-5.5 (High) | 94.50 | GPT-5.5 [openai-gpt-5.5] | Imported | 2026-05-05
11 | Claude Opus 4.6 (120K, High) | 94 | Claude Opus 4.6 [anthropic-claude-opus-4.6] | Imported | 2026-05-05
12 | GPT-5.4 (xHigh) | 93.67 | GPT-5.4 [openai-gpt-5.4] | Imported | 2026-05-05
13 | Claude 4.7 (High) | 93.50 | - | Imported | 2026-05-05
14 | Claude Opus 4.6 (120K, Max) | 93 | Claude Opus 4.6 [anthropic-claude-opus-4.6] | Imported | 2026-05-05
15 | GPT-5.4 (High) | 92.67 | GPT-5.4 [openai-gpt-5.4] | Imported | 2026-05-05
16 | Gemini 3.5 Flash (High) | 92.50 | Gemini 3.5 Flash [google-gemini-3.5-flash] | Imported | 2026-05-05
17 | GPT-5.5 (Medium) | 92.17 | GPT-5.5 [openai-gpt-5.5] | Imported | 2026-05-05
18 | Claude Opus 4.6 (120K, Medium) | 92 | Claude Opus 4.6 [anthropic-claude-opus-4.6] | Imported | 2026-05-05
19 | Claude 4.7 (Max) | 92 | - | Imported | 2026-05-05
20 | Claude 4.7 (Low) | 91 | - | Imported | 2026-05-05
21 | Claude 4.7 (Medium) | 91 | - | Imported | 2026-05-05
22 | GPT-5.2 Pro (xHigh) | 90.50 | GPT-5.2 Pro [openai-gpt-5.2-pro] | Imported | 2026-05-05
23 | Grok 4.20 (Reasoning) | 89.50 | Grok 4.20 [x-ai-grok-4.20] | Imported | 2026-05-05
24 | Gemini 3 Deep Think (Preview) ² | 87.50 | - | Imported | 2026-05-05
25 | Claude Sonnet 4.6 (High) | 86.50 | Claude Sonnet 4.6 [anthropic-claude-sonnet-4.6] | Imported | 2026-05-05
26 | GPT-5.2 (xHigh) | 86.17 | GPT-5.2 [openai-gpt-5.2] | Imported | 2026-05-05
27 | GPT-5.4 (Medium) | 86.17 | GPT-5.4 [openai-gpt-5.4] | Imported | 2026-05-05
28 | Claude Opus 4.6 (120K, Low) | 86 | Claude Opus 4.6 [anthropic-claude-opus-4.6] | Imported | 2026-05-05
29 | Claude Sonnet 4.6 (Max) | 86 | Claude Sonnet 4.6 [anthropic-claude-sonnet-4.6] | Imported | 2026-05-05
30 | GPT-5.2 Pro (High) | 85.67 | GPT-5.2 Pro [openai-gpt-5.2-pro] | Imported | 2026-05-05
31 | Gemini 3 Flash Preview (High) | 84.67 | Gemini 3 Flash Preview [google-gemini-3-flash-preview] | Imported | 2026-05-05
32 | GPT-5.2 Pro (Medium) | 81.17 | GPT-5.2 Pro [openai-gpt-5.2-pro] | Imported | 2026-05-05
33 | Opus 4.5 (Thinking, 64K) | 80 | - | Imported | 2026-05-05
34 | Grok 4 (Refine.) | 79.60 | - | Imported | 2026-05-05
35 | GPT-5.2 (High) | 78.67 | GPT-5.2 [openai-gpt-5.2] | Imported | 2026-05-05
36 | GPT-5.5 (Low) | 76.17 | GPT-5.5 [openai-gpt-5.5] | Imported | 2026-05-05
37 | Opus 4.5 (Thinking, 32K) | 75.83 | - | Imported | 2026-05-05
38 | Gemini 3 Pro | 75 | Gemini 3 [google-gemini-3] | Imported | 2026-05-05
39 | GPT-5.1 (Thinking, High) | 72.83 | GPT-5.1 [openai-gpt-5.1] | Imported | 2026-05-05
40 | GPT-5.2 (Medium) | 72.67 | GPT-5.2 [openai-gpt-5.2] | Imported | 2026-05-05
41 | Opus 4.5 (Thinking, 16K) | 72 | - | Imported | 2026-05-05
42 | GPT-5 Pro | 70.17 | GPT-5 Pro [openai-gpt-5-pro] | Imported | 2026-05-05
43 | GPT-5.4 (Low) | 68.17 | GPT-5.4 [openai-gpt-5.4] | Imported | 2026-05-05
44 | Grok 4 (Thinking) | 66.67 | Grok 4 [x-ai-grok-4] | Imported | 2026-05-05
45 | GPT-5 (High) | 65.67 | GPT-5 [openai-gpt-5] | Imported | 2026-05-05
46 | Kimi K2.5 | 65.33 | MoonshotAI: Kimi K2.5 [moonshotai-kimi-k2.5] | Imported | 2026-05-05
47 | Claude Sonnet 4.5 (Thinking 32K) | 63.67 | Claude Sonnet 4.5 [anthropic-claude-sonnet-4.5] | Imported | 2026-05-05
48 | Minimax M2.5 | 63.67 | MiniMax M2.5 [minimax-minimax-m2.5] | Imported | 2026-05-05
49 | GPT-5.4 Mini (xHigh) | 63.67 | GPT-5.4 Mini [openai-gpt-5.4-mini] | Imported | 2026-05-05
50 | o3 (High) | 60.83 | o3 [openai-o3] | Imported | 2026-05-05
51 | o3-Pro (High) | 59.33 | - | Imported | 2026-05-05
52 | o4-mini (High) | 58.67 | o4 Mini [openai-o4-mini] | Imported | 2026-05-05
53 | Opus 4.5 (Thinking, 8K) | 58.67 | - | Imported | 2026-05-05
54 | GPT-5.4 Mini (High) | 58 | GPT-5.4 Mini [openai-gpt-5.4-mini] | Imported | 2026-05-05
55 | GPT-5.1 (Thinking, Medium) | 57.67 | GPT-5.1 [openai-gpt-5.1] | Imported | 2026-05-05
56 | Gemini 3 Flash Preview (Medium) | 57.67 | Gemini 3 Flash Preview [google-gemini-3-flash-preview] | Imported | 2026-05-05
57 | o3-Pro (Medium) | 57 | - | Imported | 2026-05-05
58 | Deepseek V3.2 | 57 | DeepSeek V3.2 [deepseek-deepseek-v3.2] | Imported | 2026-05-05
59 | GPT-5 (Medium) | 56.17 | GPT-5 [openai-gpt-5] | Imported | 2026-05-05
60 | ARChitects | 56 | - | Imported | 2026-05-05
61 | GPT-5.2 (Low) | 55.67 | GPT-5.2 [openai-gpt-5.2] | Imported | 2026-05-05
62 | GPT-5 Mini (High) | 54.33 | GPT-5 Mini [openai-gpt-5-mini] | Imported | 2026-05-05
63 | o3 (Medium) | 53.83 | o3 [openai-o3] | Imported | 2026-05-05
64 | GPT-5.4 Nano (xHigh) | 51.50 | GPT-5.4 Nano [openai-gpt-5.4-nano] | Imported | 2026-05-05
65 | Gemini 3.5 Flash (Minimal) | 48.83 | Gemini 3.5 Flash [google-gemini-3.5-flash] | Imported | 2026-05-05
66 | Grok 4 (Fast Reasoning) | 48.50 | Grok 4 Fast [x-ai-grok-4-fast] | Imported | 2026-05-05
67 | Claude Sonnet 4.5 (Thinking 16K) | 48.33 | Claude Sonnet 4.5 [anthropic-claude-sonnet-4.5] | Imported | 2026-05-05
68 | Claude Haiku 4.5 (Thinking 32K) | 47.67 | Claude Haiku 4.5 [anthropic-claude-haiku-4.5] | Imported | 2026-05-05
69 | Claude Sonnet 4.5 (Thinking 8K) | 46.50 | Claude Sonnet 4.5 [anthropic-claude-sonnet-4.5] | Imported | 2026-05-05
70 | GLM-5 | 44.67 | GLM 5 [z-ai-glm-5] | Imported | 2026-05-05
71 | o3-Pro (Low) | 44.33 | - | Imported | 2026-05-05
72 | GPT-5 (Low) | 44 | GPT-5 [openai-gpt-5] | Imported | 2026-05-05
73 | o4-mini (Medium) | 41.83 | o4 Mini [openai-o4-mini] | Imported | 2026-05-05
74 | o3 (Low) | 41.50 | o3 [openai-o3] | Imported | 2026-05-05
75 | Gemini 2.5 Pro (Thinking 16K) | 41 | Gemini 2.5 Pro [google-gemini-2.5-pro] | Imported | 2026-05-05
76 | GPT-5.4 Mini (Medium) | 40.83 | GPT-5.4 Mini [openai-gpt-5.4-mini] | Imported | 2026-05-05
77 | Claude Sonnet 4 (Thinking 16K) | 40 | Claude Sonnet 4 [anthropic-claude-sonnet-4] | Imported | 2026-05-05
78 | Tiny Recursion Model (TRM) | 40 | - | Imported | 2026-05-05
79 | Opus 4.5 (Thinking, None) | 40 | - | Imported | 2026-05-05
80 | GPT-5.4 Nano (High) | 38.17 | GPT-5.4 Nano [openai-gpt-5.4-nano] | Imported | 2026-05-05
81 | GPT-5 Mini (Medium) | 37.33 | GPT-5 Mini [openai-gpt-5-mini] | Imported | 2026-05-05
82 | Claude Haiku 4.5 (Thinking 16K) | 37.33 | Claude Haiku 4.5 [anthropic-claude-haiku-4.5] | Imported | 2026-05-05
83 | Gemini 2.5 Pro (Thinking 32K) | 37 | Gemini 2.5 Pro [google-gemini-2.5-pro] | Imported | 2026-05-05
84 | Claude Opus 4 (Thinking 16K) | 35.67 | Claude Opus 4 [anthropic-claude-opus-4] | Imported | 2026-05-05
85 | o3-mini (High) | 34.50 | o3 Mini High [openai-o3-mini-high] | Imported | 2026-05-05
86 | Gemini 2.5 Flash (Preview) | 33.33 | Gemini 2.5 Flash [google-gemini-2.5-flash] | Imported | 2026-05-05
87 | Gemini 2.5 Flash (Preview) (Thinking 16K) | 33.33 | Gemini 2.5 Flash [google-gemini-2.5-flash] | Imported | 2026-05-05
88 | GPT-5.1 (Thinking, Low) | 33.17 | GPT-5.1 [openai-gpt-5.1] | Imported | 2026-05-05
89 | GPT-5.4 Nano (Medium) | 33 | GPT-5.4 Nano [openai-gpt-5.4-nano] | Imported | 2026-05-05
90 | Gemini 2.5 Flash (Preview) (Thinking 24K) | 32.33 | Gemini 2.5 Flash [google-gemini-2.5-flash] | Imported | 2026-05-05
91 | Claude Sonnet 4.5 (Thinking 1K) | 31 | Claude Sonnet 4.5 [anthropic-claude-sonnet-4.5] | Imported | 2026-05-05
92 | Claude Opus 4 (Thinking 8K) | 30.67 | Claude Opus 4 [anthropic-claude-opus-4] | Imported | 2026-05-05
93 | Gemini 2.5 Pro (Thinking 8K) | 29.50 | Gemini 2.5 Pro [google-gemini-2.5-pro] | Imported | 2026-05-05
94 | Claude Sonnet 4 (Thinking 8K) | 29 | Claude Sonnet 4 [anthropic-claude-sonnet-4] | Imported | 2026-05-05
95 | Gemini 3 Flash Preview (Low) | 29 | Gemini 3 Flash Preview [google-gemini-3-flash-preview] | Imported | 2026-05-05
96 | Claude 3.7 (16K) | 28.60 | - | Imported | 2026-05-05
97 | Claude Sonnet 4 (Thinking 1K) | 28 | Claude Sonnet 4 [anthropic-claude-sonnet-4] | Imported | 2026-05-05
98 | Codex Mini (Latest) | 27.33 | - | Imported | 2026-05-05
99 | Claude Opus 4 (Thinking 1K) | 27 | Claude Opus 4 [anthropic-claude-opus-4] | Imported | 2026-05-05
100 | GPT-5 Mini (Low) | 26.33 | GPT-5 Mini [openai-gpt-5-mini] | Imported | 2026-05-05
101 | Gemini 2.5 Flash (Preview) (Thinking 8K) | 25.83 | Gemini 2.5 Flash [google-gemini-2.5-flash] | Imported | 2026-05-05
102 | Claude Sonnet 4.5 | 25.50 | Claude Sonnet 4.5 [anthropic-claude-sonnet-4.5] | Imported | 2026-05-05
103 | Claude Haiku 4.5 (Thinking 8K) | 25.50 | Claude Haiku 4.5 [anthropic-claude-haiku-4.5] | Imported | 2026-05-05
104 | Claude Sonnet 4 | 23.83 | Claude Sonnet 4 [anthropic-claude-sonnet-4] | Imported | 2026-05-05
105 | Claude Opus 4 | 22.50 | Claude Opus 4 [anthropic-claude-opus-4] | Imported | 2026-05-05
106 | o3-mini (Medium) | 22.33 | o3-mini [openai-o3-mini] | Imported | 2026-05-05
107 | Gemini 3 Flash Preview (Minimal) | 21.50 | Gemini 3 Flash Preview [google-gemini-3-flash-preview] | Imported | 2026-05-05
108 | o4-mini (Low) | 21.33 | o4 Mini [openai-o4-mini] | Imported | 2026-05-05
109 | Deepseek R1 (05/28) | 21.21 | R1 0528 [deepseek-deepseek-r1-0528] | Imported | 2026-05-05
110 | Claude 3.7 (8K) | 21.20 | - | Imported | 2026-05-05
111 | GPT-5 Nano (Medium) | 20.71 | GPT-5 Nano [openai-gpt-5-nano] | Imported | 2026-05-05
112 | GPT-5.4 Nano (Low) | 18.33 | GPT-5.4 Nano [openai-gpt-5.4-nano] | Imported | 2026-05-05
113 | Icecuber | 17 | - | Imported | 2026-05-05
114 | Claude Haiku 4.5 (Thinking 1K) | 16.83 | Claude Haiku 4.5 [anthropic-claude-haiku-4.5] | Imported | 2026-05-05
115 | GPT-5 Nano (High) | 16.67 | GPT-5 Nano [openai-gpt-5-nano] | Imported | 2026-05-05
116 | Grok 3 Mini (Low) | 16.50 | Grok 3 Mini [x-ai-grok-3-mini] | Imported | 2026-05-05
117 | Gemini 2.5 Flash (Preview) (Thinking 1K) | 16 | Gemini 2.5 Flash [google-gemini-2.5-flash] | Imported | 2026-05-05
118 | Gemini 2.5 Pro (Thinking 1K) | 16 | Gemini 2.5 Pro [google-gemini-2.5-pro] | Imported | 2026-05-05
119 | Deepseek R1 | 15.80 | R1 [deepseek-r1] | Imported | 2026-05-05
120 | o3-mini (Low) | 14.50 | o3-mini [openai-o3-mini] | Imported | 2026-05-05
121 | Claude Haiku 4.5 | 14.33 | Claude Haiku 4.5 [anthropic-claude-haiku-4.5] | Imported | 2026-05-05
122 | o1-mini | 14 | - | Imported | 2026-05-05
123 | Claude 3.7 | 13.60 | - | Imported | 2026-05-05
124 | GPT-5.4 Mini (Low) | 13 | GPT-5.4 Mini [openai-gpt-5.4-mini] | Imported | 2026-05-05
125 | GPT-5.2 | 12.33 | GPT-5.2 [openai-gpt-5.2] | Imported | 2026-05-05
126 | Claude 3.7 (1K) | 11.60 | - | Imported | 2026-05-05
127 | Qwen3-235b-a22b Instruct (25/07) | 11 | Qwen3 235B A22B Instruct 2507 [qwen-qwen3-235b-a22b-2507] | Imported | 2026-05-05
128 | GPT-4.5 | 10.30 | GPT-4.5 [openai-gpt-4.5-preview] | Imported | 2026-05-05
129 | Magistral Medium (Thinking) | 6.12 | - | Imported | 2026-05-05
130 | GPT-5 (Minimal) | 6 | GPT-5 [openai-gpt-5] | Imported | 2026-05-05
131 | Magistral Medium | 5.91 | - | Imported | 2026-05-05
132 | GPT-5.1 (Thinking, None) | 5.83 | GPT-5.1 [openai-gpt-5.1] | Imported | 2026-05-05
133 | GPT-4.1 | 5.50 | GPT-4.1 [openai-gpt-4.1] | Imported | 2026-05-05
134 | Grok 3 | 5.50 | Grok 3 [xaigrok-3] | Imported | 2026-05-05
135 | GPT-5 Mini (Minimal) | 5.33 | GPT-5 Mini [openai-gpt-5-mini] | Imported | 2026-05-05
136 | Magistral Small | 5 | - | Imported | 2026-05-05
137 | GPT-4o | 4.50 | GPT-4o (2024-11-20) [openai-gpt-4o-2024-11-20] | Imported | 2026-05-05
138 | Llama 4 Maverick | 4.38 | Llama 4 Maverick [meta-llama-4-maverick] | Imported | 2026-05-05
139 | GPT-5 Nano (Low) | 4.04 | GPT-5 Nano [openai-gpt-5-nano] | Imported | 2026-05-05
140 | GPT-4.1-Mini | 3.50 | GPT-4.1 Mini [openai-gpt-4.1-mini] | Imported | 2026-05-05
141 | GPT-5 Nano (Minimal) | 1.50 | GPT-5 Nano [openai-gpt-5-nano] | Imported | 2026-05-05
142 | Llama 4 Scout | 0.50 | Llama 4 Scout [meta-llama-llama-4-scout] | Imported | 2026-05-05
143 | GPT-4.1-Nano | 0 | GPT-4.1 Nano [openai-gpt-4.1-nano] | Imported | 2026-05-05

1 | Gemini 3.1 Pro Preview | 98% | Gemini 3.1 Pro Preview [google-gemini-3.1-pro-preview] | Launch post | 2026-04-23
2 | GPT-5.5 | 95% | GPT-5.5 [openai-gpt-5.5] | Launch post | 2026-04-23
3 | GPT-5.4 Pro | 94.5% | GPT-5.4 Pro [openai-gpt-5.4-pro] | Launch post | 2026-04-23
4 | GPT-5.4 | 93.7% | GPT-5.4 [openai-gpt-5.4] | Launch post | 2026-04-23
5 | Claude Opus 4.7 | 93.5% | Claude Opus 4.7 [anthropic-claude-opus-4.7] | Launch post | 2026-04-23