ARC-AGI-2
Second ARC-AGI benchmark variant with a harder grid-reasoning task distribution and semi-private leaderboard evaluation.
151 rows
Primary metric: Score
Sampled: 2026-05-05
Metrics: Score; Cost/task (lower is better); Total cost (lower is better)
Showing the 2 latest source slices.

Imported results (sampled 2026-05-05):

| Rank | Subject | Score (%) | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | Human Panel | 100 | — | Imported | 2026-05-05 |
| 2 | GPT-5.5 (xHigh) | 85 | GPT-5.5 openai-gpt-5.5 | Imported | 2026-05-05 |
| 3 | Gemini 3 Deep Think (2/26) | 84.58 | — | Imported | 2026-05-05 |
| 4 | GPT-5.5 Pro (High) | 84.58 | GPT-5.5 Pro openai-gpt-5.5-pro | Imported | 2026-05-05 |
| 5 | GPT-5.5 Pro (xHigh) | 84.16 | GPT-5.5 Pro openai-gpt-5.5-pro | Imported | 2026-05-05 |
| 6 | GPT-5.4 Pro (xHigh) | 83.33 | GPT-5.4 Pro openai-gpt-5.4-pro | Imported | 2026-05-05 |
| 7 | GPT-5.5 (High) | 83.33 | GPT-5.5 openai-gpt-5.5 | Imported | 2026-05-05 |
| 8 | Gemini 3.1 Pro (Preview) | 77.08 | Gemini 3.1 Pro Preview google-gemini-3.1-pro-preview | Imported | 2026-05-05 |
| 9 | Claude 4.7 (Max) | 75.83 | — | Imported | 2026-05-05 |
| 10 | GPT-5.4 (xHigh) | 73.95 | GPT-5.4 openai-gpt-5.4 | Imported | 2026-05-05 |
| 11 | GPT-5.2 (Refine.) | 72.90 | — | Imported | 2026-05-05 |
| 12 | Gemini 3.5 Flash (High) | 72.08 | Gemini 3.5 Flash google-gemini-3.5-flash | Imported | 2026-05-05 |
| 13 | GPT-5.5 (Medium) | 70.42 | GPT-5.5 openai-gpt-5.5 | Imported | 2026-05-05 |
| 14 | Claude Opus 4.6 (120K, High) | 69.17 | Claude Opus 4.6 anthropic-claude-opus-4.6 | Imported | 2026-05-05 |
| 15 | Claude Opus 4.6 (120K, Max) | 68.75 | Claude Opus 4.6 anthropic-claude-opus-4.6 | Imported | 2026-05-05 |
| 16 | Claude 4.7 (High) | 68.33 | — | Imported | 2026-05-05 |
| 17 | GPT-5.4 (High) | 67.50 | GPT-5.4 openai-gpt-5.4 | Imported | 2026-05-05 |
| 18 | Claude 4.7 (Medium) | 67.50 | — | Imported | 2026-05-05 |
| 19 | Claude Opus 4.6 (120K, Medium) | 66.25 | Claude Opus 4.6 anthropic-claude-opus-4.6 | Imported | 2026-05-05 |
| 20 | Grok 4.20 (Reasoning) | 65.14 | Grok 4.20 x-ai-grok-4.20 | Imported | 2026-05-05 |
| 21 | Claude Opus 4.6 (120K, Low) | 64.58 | Claude Opus 4.6 anthropic-claude-opus-4.6 | Imported | 2026-05-05 |
| 22 | Claude 4.7 (Low) | 62.08 | — | Imported | 2026-05-05 |
| 23 | Claude Sonnet 4.6 (High) | 60.42 | Claude Sonnet 4.6 anthropic-claude-sonnet-4.6 | Imported | 2026-05-05 |
| 24 | Claude Sonnet 4.6 (Max) | 58.33 | Claude Sonnet 4.6 anthropic-claude-sonnet-4.6 | Imported | 2026-05-05 |
| 25 | GPT-5.4 (Medium) | 55.42 | GPT-5.4 openai-gpt-5.4 | Imported | 2026-05-05 |
| 26 | GPT-5.2 Pro (High) | 54.16 | GPT-5.2 Pro openai-gpt-5.2-pro | Imported | 2026-05-05 |
| 27 | Gemini 3 Pro (Refine.) | 54 | — | Imported | 2026-05-05 |
| 28 | GPT-5.2 (xHigh) | 52.91 | GPT-5.2 openai-gpt-5.2 | Imported | 2026-05-05 |
| 29 | Gemini 3 Deep Think (Preview) | 45.14 | — | Imported | 2026-05-05 |
| 30 | GPT-5.2 (High) | 43.33 | GPT-5.2 openai-gpt-5.2 | Imported | 2026-05-05 |
| 31 | GPT-5.2 Pro (Medium) | 38.47 | GPT-5.2 Pro openai-gpt-5.2-pro | Imported | 2026-05-05 |
| 32 | Opus 4.5 (Thinking, 64K) | 37.64 | — | Imported | 2026-05-05 |
| 33 | Gemini 3 Flash Preview (High) | 33.61 | Gemini 3 Flash Preview google-gemini-3-flash-preview | Imported | 2026-05-05 |
| 34 | GPT-5.5 (Low) | 33.33 | GPT-5.5 openai-gpt-5.5 | Imported | 2026-05-05 |
| 35 | Gemini 3 Pro | 31.11 | Gemini 3 google-gemini-3 | Imported | 2026-05-05 |
| 36 | Grok 4 (Refine.) | 29.44 | — | Imported | 2026-05-05 |
| 37 | GPT-5.4 (Low) | 29.17 | GPT-5.4 openai-gpt-5.4 | Imported | 2026-05-05 |
| 38 | NVARC | 27.64 | — | Imported | 2026-05-05 |
| 39 | GPT-5.2 (Medium) | 26.67 | GPT-5.2 openai-gpt-5.2 | Imported | 2026-05-05 |
| 40 | Opus 4.5 (Thinking, 16K) | 22.78 | — | Imported | 2026-05-05 |
| 41 | GPT-5.4 Mini (xHigh) | 18.90 | GPT-5.4 Mini openai-gpt-5.4-mini | Imported | 2026-05-05 |
| 42 | GPT-5 Pro | 18.33 | GPT-5 Pro openai-gpt-5-pro | Imported | 2026-05-05 |
| 43 | GPT-5.1 (Thinking, High) | 17.64 | GPT-5.1 openai-gpt-5.1 | Imported | 2026-05-05 |
| 44 | Grok 4 (Thinking) | 15.97 | Grok 4 x-ai-grok-4 | Imported | 2026-05-05 |
| 45 | Opus 4.5 (Thinking, 8K) | 13.89 | — | Imported | 2026-05-05 |
| 46 | Claude Sonnet 4.5 (Thinking 32K) | 13.61 | Claude Sonnet 4.5 anthropic-claude-sonnet-4.5 | Imported | 2026-05-05 |
| 47 | GPT-5.4 Mini (High) | 13.19 | GPT-5.4 Mini openai-gpt-5.4-mini | Imported | 2026-05-05 |
| 48 | Gemini 3 Flash Preview (Medium) | 12.78 | Gemini 3 Flash Preview google-gemini-3-flash-preview | Imported | 2026-05-05 |
| 49 | Kimi K2.5 | 11.81 | MoonshotAI: Kimi K2.5 moonshotai-kimi-k2.5 | Imported | 2026-05-05 |
| 50 | GPT-5 (High) | 9.86 | GPT-5 openai-gpt-5 | Imported | 2026-05-05 |
| 51 | GPT-5.2 (Low) | 9.72 | GPT-5.2 openai-gpt-5.2 | Imported | 2026-05-05 |
| 52 | Gemini 3.5 Flash (Minimal) | 8.89 | Gemini 3.5 Flash google-gemini-3.5-flash | Imported | 2026-05-05 |
| 53 | Claude Opus 4 (Thinking 16K) | 8.61 | Claude Opus 4 anthropic-claude-opus-4 | Imported | 2026-05-05 |
| 54 | Opus 4.5 (Thinking, None) | 7.78 | — | Imported | 2026-05-05 |
| 55 | GPT-5 (Medium) | 7.49 | GPT-5 openai-gpt-5 | Imported | 2026-05-05 |
| 56 | Claude Sonnet 4.5 (Thinking 8K) | 6.94 | Claude Sonnet 4.5 anthropic-claude-sonnet-4.5 | Imported | 2026-05-05 |
| 57 | Claude Sonnet 4.5 (Thinking 16K) | 6.94 | Claude Sonnet 4.5 anthropic-claude-sonnet-4.5 | Imported | 2026-05-05 |
| 58 | o3 (High) | 6.53 | o3 openai-o3 | Imported | 2026-05-05 |
| 59 | GPT-5.1 (Thinking, Medium) | 6.53 | GPT-5.1 openai-gpt-5.1 | Imported | 2026-05-05 |
| 60 | Tiny Recursion Model (TRM) | 6.25 | — | Imported | 2026-05-05 |
| 61 | o4-mini (High) | 6.11 | o4 Mini openai-o4-mini | Imported | 2026-05-05 |
| 62 | Claude Sonnet 4 (Thinking 16K) | 5.93 | Claude Sonnet 4 anthropic-claude-sonnet-4 | Imported | 2026-05-05 |
| 63 | Claude Sonnet 4.5 (Thinking 1K) | 5.83 | Claude Sonnet 4.5 anthropic-claude-sonnet-4.5 | Imported | 2026-05-05 |
| 64 | GPT-5.4 Nano (xHigh) | 5.69 | GPT-5.4 Nano openai-gpt-5.4-nano | Imported | 2026-05-05 |
| 65 | Grok 4 (Fast Reasoning) | 5.28 | Grok 4 Fast x-ai-grok-4-fast | Imported | 2026-05-05 |
| 66 | o3-Pro (High) | 4.86 | — | Imported | 2026-05-05 |
| 67 | Gemini 2.5 Pro (Thinking 32K) | 4.86 | Gemini 2.5 Pro google-gemini-2.5-pro | Imported | 2026-05-05 |
| 68 | Minimax M2.5 | 4.86 | MiniMax M2.5 minimax-minimax-m2.5 | Imported | 2026-05-05 |
| 69 | GLM-5 | 4.86 | GLM 5 z-ai-glm-5 | Imported | 2026-05-05 |
| 70 | Claude Opus 4 (Thinking 8K) | 4.52 | Claude Opus 4 anthropic-claude-opus-4 | Imported | 2026-05-05 |
| 71 | GPT-5 Mini (High) | 4.44 | GPT-5 Mini openai-gpt-5-mini | Imported | 2026-05-05 |
| 72 | GPT-5.4 Mini (Medium) | 4.44 | GPT-5.4 Mini openai-gpt-5.4-mini | Imported | 2026-05-05 |
| 73 | Gemini 2.5 Pro (Thinking 16K) | 4.03 | Gemini 2.5 Pro google-gemini-2.5-pro | Imported | 2026-05-05 |
| 74 | GPT-5 Mini (Medium) | 4.03 | GPT-5 Mini openai-gpt-5-mini | Imported | 2026-05-05 |
| 75 | Claude Haiku 4.5 (Thinking 32K) | 4.03 | Claude Haiku 4.5 anthropic-claude-haiku-4.5 | Imported | 2026-05-05 |
| 76 | Deepseek V3.2 | 4.03 | DeepSeek V3.2 deepseek-deepseek-v3.2 | Imported | 2026-05-05 |
| 77 | Claude Sonnet 4.5 | 3.75 | Claude Sonnet 4.5 anthropic-claude-sonnet-4.5 | Imported | 2026-05-05 |
| 78 | GPT-5.4 Nano (High) | 3.61 | GPT-5.4 Nano openai-gpt-5.4-nano | Imported | 2026-05-05 |
| 79 | Gemini 3 Flash Preview (Minimal) | 3.33 | Gemini 3 Flash Preview google-gemini-3-flash-preview | Imported | 2026-05-05 |
| 80 | o3-mini (High) | 2.99 | o3 Mini High openai-o3-mini-high | Imported | 2026-05-05 |
| 81 | o3 (Medium) | 2.98 | o3 openai-o3 | Imported | 2026-05-05 |
| 82 | Gemini 2.5 Pro (Thinking 8K) | 2.92 | Gemini 2.5 Pro google-gemini-2.5-pro | Imported | 2026-05-05 |
| 83 | Claude Haiku 4.5 (Thinking 16K) | 2.78 | Claude Haiku 4.5 anthropic-claude-haiku-4.5 | Imported | 2026-05-05 |
| 84 | GPT-5 Nano (High) | 2.61 | GPT-5 Nano openai-gpt-5-nano | Imported | 2026-05-05 |
| 85 | Gemini 2.5 Flash (Preview) (Thinking 24K) | 2.54 | Gemini 2.5 Flash google-gemini-2.5-flash | Imported | 2026-05-05 |
| 86 | ARChitects | 2.50 | — | Imported | 2026-05-05 |
| 87 | o4-mini (Medium) | 2.36 | o4 Mini openai-o4-mini | Imported | 2026-05-05 |
| 88 | Gemini 2.5 Flash (Preview) (Thinking 1K) | 2.16 | Gemini 2.5 Flash google-gemini-2.5-flash | Imported | 2026-05-05 |
| 89 | Gemini 2.5 Flash (Preview) (Thinking 8K) | 2.12 | Gemini 2.5 Flash google-gemini-2.5-flash | Imported | 2026-05-05 |
| 90 | Claude Sonnet 4 (Thinking 8K) | 2.12 | Claude Sonnet 4 anthropic-claude-sonnet-4 | Imported | 2026-05-05 |
| 91 | o3-mini (Medium) | 2.08 | o3-mini openai-o3-mini | Imported | 2026-05-05 |
| 92 | o3-Pro (Low) | 2.05 | — | Imported | 2026-05-05 |
| 93 | o3 (Low) | 1.99 | o3 openai-o3 | Imported | 2026-05-05 |
| 94 | Gemini 2.5 Flash (Preview) (Thinking 16K) | 1.98 | Gemini 2.5 Flash google-gemini-2.5-flash | Imported | 2026-05-05 |
| 95 | o3-Pro (Medium) | 1.94 | — | Imported | 2026-05-05 |
| 96 | GPT-5 (Low) | 1.94 | GPT-5 openai-gpt-5 | Imported | 2026-05-05 |
| 97 | GPT-5 (Low) | 1.94 | GPT-5 openai-gpt-5 | Imported | 2026-05-05 |
| 98 | GPT-5.1 (Thinking, Low) | 1.94 | GPT-5.1 openai-gpt-5.1 | Imported | 2026-05-05 |
| 99 | GPT-5.4 Nano (Medium) | 1.94 | GPT-5.4 Nano openai-gpt-5.4-nano | Imported | 2026-05-05 |
| 100 | Gemini 2.5 Flash (Preview) | 1.69 | Gemini 2.5 Flash google-gemini-2.5-flash | Imported | 2026-05-05 |
| 101 | o4-mini (Low) | 1.67 | o4 Mini openai-o4-mini | Imported | 2026-05-05 |
| 102 | GPT-5 Mini (Minimal) | 1.67 | GPT-5 Mini openai-gpt-5-mini | Imported | 2026-05-05 |
| 103 | Claude Haiku 4.5 (Thinking 8K) | 1.67 | Claude Haiku 4.5 anthropic-claude-haiku-4.5 | Imported | 2026-05-05 |
| 104 | Icecuber | 1.60 | — | Imported | 2026-05-05 |
| 105 | GPT-5.4 Nano (Low) | 1.53 | GPT-5.4 Nano openai-gpt-5.4-nano | Imported | 2026-05-05 |
| 106 | Gemini 2.0 Flash | 1.30 | Gemini 2.0 Flash google-gemini-2.0-flash | Imported | 2026-05-05 |
| 107 | Deepseek R1 | 1.30 | R1 deepseek-r1 | Imported | 2026-05-05 |
| 108 | Codex Mini (Latest) | 1.27 | — | Imported | 2026-05-05 |
| 109 | Claude Sonnet 4 | 1.27 | Claude Sonnet 4 anthropic-claude-sonnet-4 | Imported | 2026-05-05 |
| 110 | Claude Opus 4 | 1.27 | Claude Opus 4 anthropic-claude-opus-4 | Imported | 2026-05-05 |
| 111 | Qwen3-235b-a22b Instruct (25/07) | 1.25 | Qwen3 235B A22B Instruct 2507 qwen-qwen3-235b-a22b-2507 | Imported | 2026-05-05 |
| 112 | Claude Haiku 4.5 | 1.25 | Claude Haiku 4.5 anthropic-claude-haiku-4.5 | Imported | 2026-05-05 |
| 113 | Claude Haiku 4.5 (Thinking 1K) | 1.25 | Claude Haiku 4.5 anthropic-claude-haiku-4.5 | Imported | 2026-05-05 |
| 114 | Gemini 3 Flash Preview (Low) | 1.25 | Gemini 3 Flash Preview google-gemini-3-flash-preview | Imported | 2026-05-05 |
| 115 | Deepseek R1 (05/28) | 1.12 | R1 0528 deepseek-deepseek-r1-0528 | Imported | 2026-05-05 |
| 116 | GPT-5.4 Mini (Low) | 1.11 | GPT-5.4 Mini openai-gpt-5.4-mini | Imported | 2026-05-05 |
| 117 | Claude 3.7 (8K) | 0.90 | — | Imported | 2026-05-05 |
| 118 | GPT-5 Nano (Medium) | 0.88 | GPT-5 Nano openai-gpt-5-nano | Imported | 2026-05-05 |
| 119 | Claude Sonnet 4 (Thinking 1K) | 0.85 | Claude Sonnet 4 anthropic-claude-sonnet-4 | Imported | 2026-05-05 |
| 120 | o1-mini | 0.83 | — | Imported | 2026-05-05 |
| 121 | GPT-5 Mini (Low) | 0.83 | GPT-5 Mini openai-gpt-5-mini | Imported | 2026-05-05 |
| 122 | GPT-5.2 | 0.83 | GPT-5.2 openai-gpt-5.2 | Imported | 2026-05-05 |
| 123 | Gemini 1.5 Pro | 0.80 | — | Imported | 2026-05-05 |
| 124 | GPT-4.5 | 0.80 | GPT-4.5 openai-gpt-4.5-preview | Imported | 2026-05-05 |
| 125 | Claude 3.7 (16K) | 0.70 | — | Imported | 2026-05-05 |
| 126 | GPT-4.1 | 0.42 | GPT-4.1 openai-gpt-4.1 | Imported | 2026-05-05 |
| 127 | Grok 3 Mini (Low) | 0.42 | Grok 3 Mini x-ai-grok-3-mini | Imported | 2026-05-05 |
| 128 | GPT-5.1 (Thinking, None) | 0.42 | GPT-5.1 openai-gpt-5.1 | Imported | 2026-05-05 |
| 129 | Claude 3.7 (1K) | 0.40 | — | Imported | 2026-05-05 |
| 130 | Claude 3.7 | 0 | — | Imported | 2026-05-05 |
| 131 | GPT-4o | 0 | GPT-4o (2024-11-20) openai-gpt-4o-2024-11-20 | Imported | 2026-05-05 |
| 132 | GPT-4o-mini | 0 | GPT-4o-mini (2024-07-18) openai-gpt-4o-mini-2024-07-18 | Imported | 2026-05-05 |
| 133 | Llama 4 Maverick | 0 | Llama 4 Maverick meta-llama-4-maverick | Imported | 2026-05-05 |
| 134 | Llama 4 Scout | 0 | Llama 4 Scout meta-llama-llama-4-scout | Imported | 2026-05-05 |
| 135 | GPT-4.1-Nano | 0 | GPT-4.1 Nano openai-gpt-4.1-nano | Imported | 2026-05-05 |
| 136 | GPT-4.1-Mini | 0 | GPT-4.1 Mini openai-gpt-4.1-mini | Imported | 2026-05-05 |
| 137 | o3-mini (Low) | 0 | o3-mini openai-o3-mini | Imported | 2026-05-05 |
| 138 | Claude Opus 4 (Thinking 1K) | 0 | Claude Opus 4 anthropic-claude-opus-4 | Imported | 2026-05-05 |
| 139 | Grok 3 | 0 | Grok 3 x-ai-grok-3 | Imported | 2026-05-05 |
| 140 | Magistral Small | 0 | — | Imported | 2026-05-05 |
| 141 | Magistral Medium | 0 | — | Imported | 2026-05-05 |
| 142 | Magistral Medium (Thinking) | 0 | — | Imported | 2026-05-05 |
| 143 | Gemini 2.5 Pro (Thinking 1K) | 0 | Gemini 2.5 Pro google-gemini-2.5-pro | Imported | 2026-05-05 |
| 144 | GPT-5 (Minimal) | 0 | GPT-5 openai-gpt-5 | Imported | 2026-05-05 |
| 145 | GPT-5 Nano (Low) | 0 | GPT-5 Nano openai-gpt-5-nano | Imported | 2026-05-05 |
| 146 | GPT-5 Nano (Minimal) | 0 | GPT-5 Nano openai-gpt-5-nano | Imported | 2026-05-05 |

Launch-post figures (sampled 2026-04-23):

| Rank | Subject | Score (%) | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | GPT-5.5 | 85 | GPT-5.5 openai-gpt-5.5 | Launch post | 2026-04-23 |
| 2 | GPT-5.4 Pro | 83.3 | GPT-5.4 Pro openai-gpt-5.4-pro | Launch post | 2026-04-23 |
| 3 | Gemini 3.1 Pro Preview | 77.1 | Gemini 3.1 Pro Preview google-gemini-3.1-pro-preview | Launch post | 2026-04-23 |
| 4 | Claude Opus 4.7 | 75.8 | Claude Opus 4.7 anthropic-claude-opus-4.7 | Launch post | 2026-04-23 |
| 5 | GPT-5.4 | 73.3 | GPT-5.4 openai-gpt-5.4 | Launch post | 2026-04-23 |