ARC-AGI-3 | BenchmarkList

Metadata

Metrics: Score (higher is better), Cost/task (lower is better), Total cost (lower is better)

Rank	Subject	Score	Model Match	Provenance	Sampled
1	Anthropic Opus 4.6 (Max)	0.51	—	Imported	2026-05-05
2	GPT-5.5 (High)	0.43	GPT-5.5 openai-gpt-5.5	Imported	2026-05-05
3	Gemini 3.1 Pro (Preview)	0.42	Gemini 3.1 Pro Preview google-gemini-3.1-pro-preview	Imported	2026-05-05
4	GPT-5.4 (High)	0.21	GPT-5.4 openai-gpt-5.4	Imported	2026-05-05
5	Opus 4.7 (High)	0.18	—	Imported	2026-05-05
6	Grok 4.20 (Beta Reasoning)	0.09	Grok 4.20 x-ai-grok-4.20	Imported	2026-05-05