SimpleBench

Multiple-choice benchmark of simple-looking reasoning questions designed so unspecialized humans outperform current frontier models.

Rows: 27
Primary metric: score
Sampled: 2026-05-06

Metadata

Metrics

Score (higher is better), Standard error (lower is better)
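How the leaderboard computes its standard error is not stated here. As a minimal sketch, assuming each question is graded independently as right or wrong, the score is the percentage answered correctly and the standard error follows the usual binomial formula for a proportion. The function name and the sample data below are hypothetical.

```python
import math


def score_and_standard_error(correct_flags):
    """Return (score_percent, standard_error_percent) for graded answers.

    Assumption: each entry is True/False for one multiple-choice question,
    and questions are treated as independent Bernoulli trials.
    """
    n = len(correct_flags)
    if n == 0:
        raise ValueError("need at least one graded question")
    p = sum(correct_flags) / n            # fraction answered correctly
    se = math.sqrt(p * (1 - p) / n)       # binomial standard error of a proportion
    return 100 * p, 100 * se


# Hypothetical grading outcome: 7 of 10 questions correct.
flags = [True] * 7 + [False] * 3
score, se = score_and_standard_error(flags)
print(f"score = {score:.2f}, standard error = {se:.2f}")
# prints: score = 70.00, standard error = 14.49
```

With a small question set the standard error is large, so scores that differ by only a few points should be read as overlapping rather than as a strict ordering.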

Latest Results

Rows parsed from the public leaderboard table. Official public question set: https://raw.githubusercontent.com/simple-bench/SimpleBench/main/simple_bench_public.json.

| Rank | Subject | Score | Model Match | Provenance | Sampled |
|------|---------|-------|-------------|------------|---------|
| 1 | Gemini 3 Pro | 76.40 | Gemini 3 (google-gemini-3) | Imported | 2026-05-06 |
| 2 | Gemini 2.5 Pro (Jun 2025) | 62.40 | Gemini 2.5 Pro (google-gemini-2.5-pro) | Imported | 2026-05-06 |
| 3 | Claude Opus 4.5 | 62 | Claude Opus 4.5 (anthropic-claude-opus-4.5) | Imported | 2026-05-06 |
| 4 | GPT-5.2 | 61.60 | GPT-5.2 (openai-gpt-5.2) | Imported | 2026-05-06 |
| 5 | Grok 4 | 60.50 | Grok 4 (x-ai-grok-4) | Imported | 2026-05-06 |
| 6 | o3 | 53.10 | o3 (openai-o3) | Imported | 2026-05-06 |
| 7 | Claude 3.7 Sonnet | 46.40 | Claude 3.7 Sonnet (anthropic-claude-3.7-sonnet) | Imported | 2026-05-06 |
| 8 | o1 | 41.70 | o1 (openai-o1) | Imported | 2026-05-06 |
| 9 | DeepSeek V3 | 40.80 | DeepSeek V3 (deepseek-deepseek-chat) | Imported | 2026-05-06 |
| 10 | o4-mini (high) | 38.70 | o4 Mini High (openai-o4-mini-high) | Imported | 2026-05-06 |
| 11 | Grok-3 mini | 36.10 | Grok 3 Mini (x-ai-grok-3-mini) | Imported | 2026-05-06 |
| 12 | GPT-4.1 | 34.50 | GPT-4.1 (openai-gpt-4.1) | Imported | 2026-05-06 |
| 13 | Qwen3-235B-A22B | 31 | Qwen3 235B A22B (qwen-qwen3-235b-a22b) | Imported | 2026-05-06 |
| 14 | DeepSeek R1 | 30.90 | R1 (deepseek-r1) | Imported | 2026-05-06 |
| 15 | Gemini 2.0 Flash Thinking Exp | 30.70 | Gemini 2.0 Flash (google-gemini-2.0-flash) | Imported | 2026-05-06 |
| 16 | Llama-4-Maverick-17B-128E-Instruct | 27.70 | Llama 4 Maverick (meta-llama-4-maverick) | Imported | 2026-05-06 |
| 17 | Gemini 1.5 Flash | 27.10 | | Imported | 2026-05-06 |
| 18 | kimi-k2-thinking (official) | 26.30 | MoonshotAI: Kimi K2 Thinking (moonshotai-kimi-k2-thinking) | Imported | 2026-05-06 |
| 19 | GPT-4 Turbo | 25.10 | GPT-4 Turbo (openai-gpt-4-turbo) | Imported | 2026-05-06 |
| 20 | Claude 3 Opus | 23.50 | | Imported | 2026-05-06 |
| 21 | Llama 3.1 405B | 23 | | Imported | 2026-05-06 |
| 22 | Mistral Large | 22.50 | Mistral Large (mistralai-mistral-large) | Imported | 2026-05-06 |
| 23 | GPT-OSS 120B | 22.10 | gpt-oss-120b (openai-gpt-oss-120b) | Imported | 2026-05-06 |
| 24 | Llama 3.3 70B | 19.90 | Llama 3.3 70B Instruct (meta-llama-llama-3.3-70b-instruct) | Imported | 2026-05-06 |
| 25 | GPT-4o | 17.80 | GPT-4o (openai-gpt-4o) | Imported | 2026-05-06 |
| 26 | c4ai-command-a-03-2025 | 17.40 | | Imported | 2026-05-06 |
| 27 | gpt-4o-mini-2024-07-18 | 10.70 | GPT-4o-mini (2024-07-18) (openai-gpt-4o-mini-2024-07-18) | Imported | 2026-05-06 |