ARC-AGI-2

Second ARC-AGI benchmark variant with a harder grid-reasoning task distribution and semi-private leaderboard evaluation.

151 rows
Primary metric: score
Sampled: 2026-05-05

Metadata

Metrics

Score (higher is better), Cost/task (lower is better), Total cost (lower is better)

Showing 2 latest source slices.

Latest Results

Scores are stored as percentages. Rows preserve ARC Prize display names because the leaderboard includes base models, reasoning configurations, custom competition systems, and agent systems.
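Because display names carry the reasoning configuration in a trailing parenthetical and scores appear both as bare numbers and with a percent sign, a consumer of these rows may want to normalize them before comparing entries. The following is a minimal sketch under those assumptions; `split_display_name` and `parse_score` are hypothetical helper names, not part of any published ARC Prize tooling.

```python
import re
from typing import Optional, Tuple

# Matches a display name that ends in one parenthesized group,
# e.g. "GPT-5.5 (xHigh)" or "Claude Opus 4.6 (120K, High)".
_TRAILING_PAREN = re.compile(r"^(?P<base>.*\S)\s*\((?P<config>[^()]*)\)\s*$")


def split_display_name(name: str) -> Tuple[str, Optional[str]]:
    """Split a display name into (base name, last parenthesized config).

    Only the final parenthetical is treated as the configuration, so
    "Gemini 2.5 Flash (Preview) (Thinking 24K)" keeps "(Preview)" in the base.
    Names without a trailing parenthetical are returned unchanged with None.
    """
    cleaned = name.strip()
    match = _TRAILING_PAREN.match(cleaned)
    if match is None:
        return cleaned, None
    return match.group("base"), match.group("config")


def parse_score(text: str) -> float:
    """Parse a percentage score such as "84.58" or "83.3%" into a float."""
    return float(text.strip().rstrip("%"))


if __name__ == "__main__":
    for subject, score in [("GPT-5.5 (xHigh)", "85"), ("GPT-5.4 Pro", "83.3%")]:
        print(split_display_name(subject), parse_score(score))
```

Names with characters after the closing parenthesis, such as a footnote marker, are deliberately left unsplit rather than guessed at.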

Rank Subject Score Model Match Provenance Sampled
1 Human Panel 100 Imported 2026-05-05
2 GPT-5.5 (xHigh) 85 GPT-5.5
openai-gpt-5.5
Imported 2026-05-05
3 Gemini 3 Deep Think (2/26) 84.58 Imported 2026-05-05
4 GPT-5.5 Pro (High) 84.58 GPT-5.5 Pro
openai-gpt-5.5-pro
Imported 2026-05-05
5 GPT-5.5 Pro (xHigh) 84.16 GPT-5.5 Pro
openai-gpt-5.5-pro
Imported 2026-05-05
6 GPT-5.4 Pro (xHigh) 83.33 GPT-5.4 Pro
openai-gpt-5.4-pro
Imported 2026-05-05
7 GPT-5.5 (High) 83.33 GPT-5.5
openai-gpt-5.5
Imported 2026-05-05
8 Gemini 3.1 Pro (Preview) 77.08 Gemini 3.1 Pro Preview
google-gemini-3.1-pro-preview
Imported 2026-05-05
9 Claude 4.7 (Max) 75.83 Imported 2026-05-05
10 GPT-5.4 (xHigh) 73.95 GPT-5.4
openai-gpt-5.4
Imported 2026-05-05
11 GPT-5.2 (Refine.) 72.90 Imported 2026-05-05
12 Gemini 3.5 Flash (High) 72.08 Gemini 3.5 Flash
google-gemini-3.5-flash
Imported 2026-05-05
13 GPT-5.5 (Medium) 70.42 GPT-5.5
openai-gpt-5.5
Imported 2026-05-05
14 Claude Opus 4.6 (120K, High) 69.17 Claude Opus 4.6
anthropic-claude-opus-4.6
Imported 2026-05-05
15 Claude Opus 4.6 (120K, Max) 68.75 Claude Opus 4.6
anthropic-claude-opus-4.6
Imported 2026-05-05
16 Claude 4.7 (High) 68.33 Imported 2026-05-05
17 GPT-5.4 (High) 67.50 GPT-5.4
openai-gpt-5.4
Imported 2026-05-05
18 Claude 4.7 (Medium) 67.50 Imported 2026-05-05
19 Claude Opus 4.6 (120K, Medium) 66.25 Claude Opus 4.6
anthropic-claude-opus-4.6
Imported 2026-05-05
20 Grok 4.20 (Reasoning) 65.14 Grok 4.20
x-ai-grok-4.20
Imported 2026-05-05
21 Claude Opus 4.6 (120K, Low) 64.58 Claude Opus 4.6
anthropic-claude-opus-4.6
Imported 2026-05-05
22 Claude 4.7 (Low) 62.08 Imported 2026-05-05
23 Claude Sonnet 4.6 (High) 60.42 Claude Sonnet 4.6
anthropic-claude-sonnet-4.6
Imported 2026-05-05
24 Claude Sonnet 4.6 (Max) 58.33 Claude Sonnet 4.6
anthropic-claude-sonnet-4.6
Imported 2026-05-05
25 GPT-5.4 (Medium) 55.42 GPT-5.4
openai-gpt-5.4
Imported 2026-05-05
26 GPT-5.2 Pro (High) 54.16 GPT-5.2 Pro
openai-gpt-5.2-pro
Imported 2026-05-05
27 Gemini 3 Pro (Refine.) 54 Imported 2026-05-05
28 GPT-5.2 (xHigh) 52.91 GPT-5.2
openai-gpt-5.2
Imported 2026-05-05
29 Gemini 3 Deep Think (Preview) ² 45.14 Imported 2026-05-05
30 GPT-5.2 (High) 43.33 GPT-5.2
openai-gpt-5.2
Imported 2026-05-05
31 GPT-5.2 Pro (Medium) 38.47 GPT-5.2 Pro
openai-gpt-5.2-pro
Imported 2026-05-05
32 Opus 4.5 (Thinking, 64K) 37.64 Imported 2026-05-05
33 Gemini 3 Flash Preview (High) 33.61 Gemini 3 Flash Preview
google-gemini-3-flash-preview
Imported 2026-05-05
34 GPT-5.5 (Low) 33.33 GPT-5.5
openai-gpt-5.5
Imported 2026-05-05
35 Gemini 3 Pro 31.11 Gemini 3
google-gemini-3
Imported 2026-05-05
36 Grok 4 (Refine.) 29.44 Imported 2026-05-05
37 GPT-5.4 (Low) 29.17 GPT-5.4
openai-gpt-5.4
Imported 2026-05-05
38 NVARC 27.64 Imported 2026-05-05
39 GPT-5.2 (Medium) 26.67 GPT-5.2
openai-gpt-5.2
Imported 2026-05-05
40 Opus 4.5 (Thinking, 16K) 22.78 Imported 2026-05-05
41 GPT-5.4 Mini (xHigh) 18.90 GPT-5.4 Mini
openai-gpt-5.4-mini
Imported 2026-05-05
42 GPT-5 Pro 18.33 GPT-5 Pro
openai-gpt-5-pro
Imported 2026-05-05
43 GPT-5.1 (Thinking, High) 17.64 GPT-5.1
openai-gpt-5.1
Imported 2026-05-05
44 Grok 4 (Thinking) 15.97 Grok 4
x-ai-grok-4
Imported 2026-05-05
45 Opus 4.5 (Thinking, 8K) 13.89 Imported 2026-05-05
46 Claude Sonnet 4.5 (Thinking 32K) 13.61 Claude Sonnet 4.5
anthropic-claude-sonnet-4.5
Imported 2026-05-05
47 GPT-5.4 Mini (High) 13.19 GPT-5.4 Mini
openai-gpt-5.4-mini
Imported 2026-05-05
48 Gemini 3 Flash Preview (Medium) 12.78 Gemini 3 Flash Preview
google-gemini-3-flash-preview
Imported 2026-05-05
49 Kimi K2.5 11.81 MoonshotAI: Kimi K2.5
moonshotai-kimi-k2.5
Imported 2026-05-05
50 GPT-5 (High) 9.86 GPT-5
openai-gpt-5
Imported 2026-05-05
51 GPT-5.2 (Low) 9.72 GPT-5.2
openai-gpt-5.2
Imported 2026-05-05
52 Gemini 3.5 Flash (Minimal) 8.89 Gemini 3.5 Flash
google-gemini-3.5-flash
Imported 2026-05-05
53 Claude Opus 4 (Thinking 16K) 8.61 Claude Opus 4
anthropic-claude-opus-4
Imported 2026-05-05
54 Opus 4.5 (Thinking, None) 7.78 Imported 2026-05-05
55 GPT-5 (Medium) 7.49 GPT-5
openai-gpt-5
Imported 2026-05-05
56 Claude Sonnet 4.5 (Thinking 8K) 6.94 Claude Sonnet 4.5
anthropic-claude-sonnet-4.5
Imported 2026-05-05
57 Claude Sonnet 4.5 (Thinking 16K) 6.94 Claude Sonnet 4.5
anthropic-claude-sonnet-4.5
Imported 2026-05-05
58 o3 (High) 6.53 o3
openai-o3
Imported 2026-05-05
59 GPT-5.1 (Thinking, Medium) 6.53 GPT-5.1
openai-gpt-5.1
Imported 2026-05-05
60 Tiny Recursion Model (TRM) 6.25 Imported 2026-05-05
61 o4-mini (High) 6.11 o4 Mini
openai-o4-mini
Imported 2026-05-05
62 Claude Sonnet 4 (Thinking 16K) 5.93 Claude Sonnet 4
anthropic-claude-sonnet-4
Imported 2026-05-05
63 Claude Sonnet 4.5 (Thinking 1K) 5.83 Claude Sonnet 4.5
anthropic-claude-sonnet-4.5
Imported 2026-05-05
64 GPT-5.4 Nano (xHigh) 5.69 GPT-5.4 Nano
openai-gpt-5.4-nano
Imported 2026-05-05
65 Grok 4 (Fast Reasoning) 5.28 Grok 4 Fast
x-ai-grok-4-fast
Imported 2026-05-05
66 o3-Pro (High) 4.86 Imported 2026-05-05
67 Gemini 2.5 Pro (Thinking 32K) 4.86 Gemini 2.5 Pro
google-gemini-2.5-pro
Imported 2026-05-05
68 Minimax M2.5 4.86 MiniMax M2.5
minimax-minimax-m2.5
Imported 2026-05-05
69 GLM-5 4.86 GLM 5
z-ai-glm-5
Imported 2026-05-05
70 Claude Opus 4 (Thinking 8K) 4.52 Claude Opus 4
anthropic-claude-opus-4
Imported 2026-05-05
71 GPT-5 Mini (High) 4.44 GPT-5 Mini
openai-gpt-5-mini
Imported 2026-05-05
72 GPT-5.4 Mini (Medium) 4.44 GPT-5.4 Mini
openai-gpt-5.4-mini
Imported 2026-05-05
73 Gemini 2.5 Pro (Thinking 16K) 4.03 Gemini 2.5 Pro
google-gemini-2.5-pro
Imported 2026-05-05
74 GPT-5 Mini (Medium) 4.03 GPT-5 Mini
openai-gpt-5-mini
Imported 2026-05-05
75 Claude Haiku 4.5 (Thinking 32K) 4.03 Claude Haiku 4.5
anthropic-claude-haiku-4.5
Imported 2026-05-05
76 Deepseek V3.2 4.03 DeepSeek V3.2
deepseek-deepseek-v3.2
Imported 2026-05-05
77 Claude Sonnet 4.5 3.75 Claude Sonnet 4.5
anthropic-claude-sonnet-4.5
Imported 2026-05-05
78 GPT-5.4 Nano (High) 3.61 GPT-5.4 Nano
openai-gpt-5.4-nano
Imported 2026-05-05
79 Gemini 3 Flash Preview (Minimal) 3.33 Gemini 3 Flash Preview
google-gemini-3-flash-preview
Imported 2026-05-05
80 o3-mini (High) 2.99 o3 Mini High
openai-o3-mini-high
Imported 2026-05-05
81 o3 (Medium) 2.98 o3
openai-o3
Imported 2026-05-05
82 Gemini 2.5 Pro (Thinking 8K) 2.92 Gemini 2.5 Pro
google-gemini-2.5-pro
Imported 2026-05-05
83 Claude Haiku 4.5 (Thinking 16K) 2.78 Claude Haiku 4.5
anthropic-claude-haiku-4.5
Imported 2026-05-05
84 GPT-5 Nano (High) 2.61 GPT-5 Nano
openai-gpt-5-nano
Imported 2026-05-05
85 Gemini 2.5 Flash (Preview) (Thinking 24K) 2.54 Gemini 2.5 Flash
google-gemini-2.5-flash
Imported 2026-05-05
86 ARChitects 2.50 Imported 2026-05-05
87 o4-mini (Medium) 2.36 o4 Mini
openai-o4-mini
Imported 2026-05-05
88 Gemini 2.5 Flash (Preview) (Thinking 1K) 2.16 Gemini 2.5 Flash
google-gemini-2.5-flash
Imported 2026-05-05
89 Gemini 2.5 Flash (Preview) (Thinking 8K) 2.12 Gemini 2.5 Flash
google-gemini-2.5-flash
Imported 2026-05-05
90 Claude Sonnet 4 (Thinking 8K) 2.12 Claude Sonnet 4
anthropic-claude-sonnet-4
Imported 2026-05-05
91 o3-mini (Medium) 2.08 o3-mini
openai-o3-mini
Imported 2026-05-05
92 o3-Pro (Low) 2.05 Imported 2026-05-05
93 o3 (Low) 1.99 o3
openai-o3
Imported 2026-05-05
94 Gemini 2.5 Flash (Preview) (Thinking 16K) 1.98 Gemini 2.5 Flash
google-gemini-2.5-flash
Imported 2026-05-05
95 o3-Pro (Medium) 1.94 Imported 2026-05-05
96 GPT-5 (Low) 1.94 GPT-5
openai-gpt-5
Imported 2026-05-05
97 GPT-5 (Low) 1.94 GPT-5
openai-gpt-5
Imported 2026-05-05
98 GPT-5.1 (Thinking, Low) 1.94 GPT-5.1
openai-gpt-5.1
Imported 2026-05-05
99 GPT-5.4 Nano (Medium) 1.94 GPT-5.4 Nano
openai-gpt-5.4-nano
Imported 2026-05-05
100 Gemini 2.5 Flash (Preview) 1.69 Gemini 2.5 Flash
google-gemini-2.5-flash
Imported 2026-05-05
101 o4-mini (Low) 1.67 o4 Mini
openai-o4-mini
Imported 2026-05-05
102 GPT-5 Mini (Minimal) 1.67 GPT-5 Mini
openai-gpt-5-mini
Imported 2026-05-05
103 Claude Haiku 4.5 (Thinking 8K) 1.67 Claude Haiku 4.5
anthropic-claude-haiku-4.5
Imported 2026-05-05
104 Icecuber 1.60 Imported 2026-05-05
105 GPT-5.4 Nano (Low) 1.53 GPT-5.4 Nano
openai-gpt-5.4-nano
Imported 2026-05-05
106 Gemini 2.0 Flash 1.30 Gemini 2.0 Flash
google-gemini-2.0-flash
Imported 2026-05-05
107 Deepseek R1 1.30 R1
deepseek-r1
Imported 2026-05-05
108 Codex Mini (Latest) 1.27 Imported 2026-05-05
109 Claude Sonnet 4 1.27 Claude Sonnet 4
anthropic-claude-sonnet-4
Imported 2026-05-05
110 Claude Opus 4 1.27 Claude Opus 4
anthropic-claude-opus-4
Imported 2026-05-05
111 Qwen3-235b-a22b Instruct (25/07) 1.25 Qwen3 235B A22B Instruct 2507
qwen-qwen3-235b-a22b-2507
Imported 2026-05-05
112 Claude Haiku 4.5 1.25 Claude Haiku 4.5
anthropic-claude-haiku-4.5
Imported 2026-05-05
113 Claude Haiku 4.5 (Thinking 1K) 1.25 Claude Haiku 4.5
anthropic-claude-haiku-4.5
Imported 2026-05-05
114 Gemini 3 Flash Preview (Low) 1.25 Gemini 3 Flash Preview
google-gemini-3-flash-preview
Imported 2026-05-05
115 Deepseek R1 (05/28) 1.12 R1 0528
deepseek-deepseek-r1-0528
Imported 2026-05-05
116 GPT-5.4 Mini (Low) 1.11 GPT-5.4 Mini
openai-gpt-5.4-mini
Imported 2026-05-05
117 Claude 3.7 (8K) 0.90 Imported 2026-05-05
118 GPT-5 Nano (Medium) 0.88 GPT-5 Nano
openai-gpt-5-nano
Imported 2026-05-05
119 Claude Sonnet 4 (Thinking 1K) 0.85 Claude Sonnet 4
anthropic-claude-sonnet-4
Imported 2026-05-05
120 o1-mini 0.83 Imported 2026-05-05
121 GPT-5 Mini (Low) 0.83 GPT-5 Mini
openai-gpt-5-mini
Imported 2026-05-05
122 GPT-5.2 0.83 GPT-5.2
openai-gpt-5.2
Imported 2026-05-05
123 Gemini 1.5 Pro 0.80 Imported 2026-05-05
124 GPT-4.5 0.80 GPT-4.5
openai-gpt-4.5-preview
Imported 2026-05-05
125 Claude 3.7 (16K) 0.70 Imported 2026-05-05
126 GPT-4.1 0.42 GPT-4.1
openai-gpt-4.1
Imported 2026-05-05
127 Grok 3 Mini (Low) 0.42 Grok 3 Mini
x-ai-grok-3-mini
Imported 2026-05-05
128 GPT-5.1 (Thinking, None) 0.42 GPT-5.1
openai-gpt-5.1
Imported 2026-05-05
129 Claude 3.7 (1K) 0.40 Imported 2026-05-05
130 Claude 3.7 0 Imported 2026-05-05
131 GPT-4o 0 GPT-4o (2024-11-20)
openai-gpt-4o-2024-11-20
Imported 2026-05-05
132 GPT-4o-mini 0 GPT-4o-mini (2024-07-18)
openai-gpt-4o-mini-2024-07-18
Imported 2026-05-05
133 Llama 4 Maverick 0 Llama 4 Maverick
meta-llama-4-maverick
Imported 2026-05-05
134 Llama 4 Scout 0 Llama 4 Scout
meta-llama-llama-4-scout
Imported 2026-05-05
135 GPT-4.1-Nano 0 GPT-4.1 Nano
openai-gpt-4.1-nano
Imported 2026-05-05
136 GPT-4.1-Mini 0 GPT-4.1 Mini
openai-gpt-4.1-mini
Imported 2026-05-05
137 o3-mini (Low) 0 o3-mini
openai-o3-mini
Imported 2026-05-05
138 Claude Opus 4 (Thinking 1K) 0 Claude Opus 4
anthropic-claude-opus-4
Imported 2026-05-05
139 Grok 3 0 Grok 3
xaigrok-3
Imported 2026-05-05
140 Magistral Small 0 Imported 2026-05-05
141 Magistral Medium 0 Imported 2026-05-05
142 Magistral Medium (Thinking) 0 Imported 2026-05-05
143 Gemini 2.5 Pro (Thinking 1K) 0 Gemini 2.5 Pro
google-gemini-2.5-pro
Imported 2026-05-05
144 GPT-5 (Minimal) 0 GPT-5
openai-gpt-5
Imported 2026-05-05
145 GPT-5 Nano (Low) 0 GPT-5 Nano
openai-gpt-5-nano
Imported 2026-05-05
146 GPT-5 Nano (Minimal) 0 GPT-5 Nano
openai-gpt-5-nano
Imported 2026-05-05

Rank Subject Score Model Match Provenance Sampled
1 GPT-5.5 85% GPT-5.5
openai-gpt-5.5
Launch post 2026-04-23
2 GPT-5.4 Pro 83.3% GPT-5.4 Pro
openai-gpt-5.4-pro
Launch post 2026-04-23
3 Gemini 3.1 Pro Preview 77.1% Gemini 3.1 Pro Preview
google-gemini-3.1-pro-preview
Launch post 2026-04-23
4 Claude Opus 4.7 75.8% Claude Opus 4.7
anthropic-claude-opus-4.7
Launch post 2026-04-23
5 GPT-5.4 73.3% GPT-5.4
openai-gpt-5.4
Launch post 2026-04-23