ZebraLogic

ZebraLogic evaluates models on grid-style zebra logic puzzles, reporting exact puzzle accuracy and cell-level accuracy across difficulty levels and puzzle sizes.

66 rows
puzzle_acc primary metric
2026-05-06 sampled

Metadata

Metrics

Puzzle Acc, Cell Acc, Easy Puzzle Acc, Hard Puzzle Acc, Small Puzzle Acc, Medium Puzzle Acc, Large Puzzle Acc, XL Puzzle Acc, No answer (lower is better), Reason Len (lower is better), Total Puzzles
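The two headline metrics can be illustrated with a minimal sketch. It assumes a puzzle solution is a mapping from house to attribute to value, that Puzzle Acc counts a puzzle as solved only when every cell matches, and that Cell Acc is pooled over all cells of all puzzles. The names `Grid`, `score_puzzle`, and `zebra_metrics` are hypothetical and are not taken from the ZebraLogic codebase; the official scorer may differ, for example in how it averages cells or handles missing answers.

```python
from typing import Dict, List, Optional, Tuple

# Assumed layout: house label -> attribute name -> value.
Grid = Dict[str, Dict[str, str]]


def score_puzzle(pred: Optional[Grid], gold: Grid) -> Tuple[int, int]:
    """Return (correct_cells, total_cells) for one puzzle.

    A missing prediction (None) scores zero correct cells.
    """
    pred = pred or {}
    correct = 0
    total = 0
    for house, attrs in gold.items():
        for attr, value in attrs.items():
            total += 1
            if pred.get(house, {}).get(attr) == value:
                correct += 1
    return correct, total


def zebra_metrics(preds: List[Optional[Grid]], golds: List[Grid]) -> Dict[str, float]:
    """Compute puzzle-level and cell-level accuracy as percentages."""
    solved = 0
    cells_ok = 0
    cells_total = 0
    for pred, gold in zip(preds, golds):
        correct, total = score_puzzle(pred, gold)
        cells_ok += correct
        cells_total += total
        if correct == total:
            solved += 1  # exact match: every cell is right
    return {
        "puzzle_acc": 100.0 * solved / len(golds),
        "cell_acc": 100.0 * cells_ok / cells_total,
    }
```

Under these assumptions a model can score well on Cell Acc while scoring poorly on Puzzle Acc, since a single wrong cell fails the whole puzzle.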

Latest Results

This snapshot mirrors the public ZebraLogic summary JSON, using the Space's default greedy decoding mode. Rows from sampling runs are omitted from this model-level snapshot to avoid mixing decoding regimes.
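As a rough sketch of that filtering step, the snippet below keeps only greedy rows and ranks them by `puzzle_acc`. The record layout is an assumption: the `model` and `mode` field names and the `greedy_leaderboard` helper are hypothetical, the real summary JSON may be structured differently, and the sampling row is a made-up placeholder used only to show the exclusion.

```python
from typing import Dict, List, Tuple


def greedy_leaderboard(rows: List[Dict], metric: str = "puzzle_acc") -> List[Tuple[int, str, float]]:
    """Drop non-greedy rows, then rank the rest by the metric, highest first."""
    greedy = [r for r in rows if r.get("mode") == "greedy"]  # "mode" is an assumed field
    ranked = sorted(greedy, key=lambda r: r[metric], reverse=True)
    return [(rank, r["model"], r[metric]) for rank, r in enumerate(ranked, start=1)]


# Example input: two greedy rows using scores from this snapshot,
# plus one placeholder sampling row that should be excluded.
rows = [
    {"model": "deepseek-R1", "mode": "greedy", "puzzle_acc": 78.7},
    {"model": "o1-2024-12-17", "mode": "greedy", "puzzle_acc": 81.0},
    {"model": "placeholder-model", "mode": "sampling", "puzzle_acc": 50.0},
]

for rank, model, acc in greedy_leaderboard(rows):
    print(rank, model, acc)
```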

| Rank | Subject | Puzzle Acc | Model Match | Provenance | Sampled |
| 1 | o3-mini-2025-01-31-high | 91.70 | o3 Mini High / openai-o3-mini-high | Imported | 2026-05-06 |
| 2 | o3-mini-2025-01-31-medium | 88.90 | o3-mini / openai-o3-mini | Imported | 2026-05-06 |
| 3 | o1-2024-12-17 | 81.00 | o1 / openai-o1 | Imported | 2026-05-06 |
| 4 | deepseek-R1 | 78.70 | R1 / deepseek-r1 | Imported | 2026-05-06 |
| 5 | o3-mini-2025-01-31-low | 74.80 | | Imported | 2026-05-06 |
| 6 | o1-preview-2024-09-12 | 71.40 | | Imported | 2026-05-06 |
| 7 | o1-preview-2024-09-12-v2 | 70.40 | | Imported | 2026-05-06 |
| 8 | o1-mini-2024-09-12-v3 | 59.70 | | Imported | 2026-05-06 |
| 9 | o1-mini-2024-09-12-v2 | 56.80 | | Imported | 2026-05-06 |
| 10 | o1-mini-2024-09-12 | 52.60 | | Imported | 2026-05-06 |
| 11 | deepseek-v3 | 42.10 | DeepSeek V3 / deepseek-deepseek-chat | Imported | 2026-05-06 |
| 12 | claude-3-5-sonnet-20241022 | 36.20 | Claude 3.5 Sonnet / anthropic-claude-3.5-sonnet | Imported | 2026-05-06 |
| 13 | claude-3-5-sonnet-20240620 | 33.40 | Claude 3.5 Sonnet / anthropic-claude-3.5-sonnet | Imported | 2026-05-06 |
| 14 | Llama-3.1-405B-Inst-fp8@together | 32.60 | | Imported | 2026-05-06 |
| 15 | gpt-4o-2024-08-06 | 31.70 | GPT-4o (2024-08-06) / openai-gpt-4o-2024-08-06 | Imported | 2026-05-06 |
| 16 | gemini-1.5-pro-exp-0827 | 30.50 | | Imported | 2026-05-06 |
| 17 | Llama-3.1-405B-Inst@sambanova | 30.10 | | Imported | 2026-05-06 |
| 18 | chatgpt-4o-latest-24-09-07 | 29.90 | | Imported | 2026-05-06 |
| 19 | Mistral-Large-2 | 29.00 | | Imported | 2026-05-06 |
| 20 | gpt-4-turbo-2024-04-09 | 28.40 | GPT-4 Turbo / openai-gpt-4-turbo | Imported | 2026-05-06 |
| 21 | gpt-4o-2024-05-13 | 28.20 | GPT-4o (2024-05-13) / openai-gpt-4o-2024-05-13 | Imported | 2026-05-06 |
| 22 | grok-2-1212 | 27.70 | | Imported | 2026-05-06 |
| 23 | gpt-4-0314 | 27.10 | GPT-4 (older v0314) / openai-gpt-4-0314 | Imported | 2026-05-06 |
| 24 | claude-3-opus-20240229 | 27.00 | | Imported | 2026-05-06 |
| 25 | Qwen2.5-72B-Instruct | 26.60 | Qwen2.5 72B Instruct / qwen-qwen-2.5-72b-instruct | Imported | 2026-05-06 |
| 26 | Qwen2.5-32B-Instruct | 26.10 | | Imported | 2026-05-06 |
| 27 | gemini-1.5-pro-exp-0801 | 25.20 | | Imported | 2026-05-06 |
| 28 | gemini-1.5-flash-exp-0827 | 25.00 | | Imported | 2026-05-06 |
| 29 | Llama-3.1-405B-Inst@hyperbolic | 25.00 | | Imported | 2026-05-06 |
| 30 | Meta-Llama-3.1-70B-Instruct | 24.90 | | Imported | 2026-05-06 |
| 31 | deepseek-v2-chat-0628 | 22.70 | | Imported | 2026-05-06 |
| 32 | deepseek-v2.5-0908 | 22.10 | | Imported | 2026-05-06 |
| 33 | Qwen2-72B-Instruct | 21.40 | | Imported | 2026-05-06 |
| 34 | deepseek-v2-coder-0614 | 21.10 | | Imported | 2026-05-06 |
| 35 | deepseek-v2-coder-0724 | 20.50 | | Imported | 2026-05-06 |
| 36 | gpt-4o-mini-2024-07-18 | 20.10 | GPT-4o-mini (2024-07-18) / openai-gpt-4o-mini-2024-07-18 | Imported | 2026-05-06 |
| 37 | gemini-1.5-flash | 19.40 | | Imported | 2026-05-06 |
| 38 | gemini-1.5-pro | 19.40 | | Imported | 2026-05-06 |
| 39 | yi-large-preview | 18.90 | | Imported | 2026-05-06 |
| 40 | yi-large | 18.80 | | Imported | 2026-05-06 |
| 41 | claude-3-5-haiku-20241022 | 18.70 | | Imported | 2026-05-06 |
| 42 | claude-3-sonnet-20240229 | 18.70 | | Imported | 2026-05-06 |
| 43 | Meta-Llama-3-70B-Instruct | 16.80 | | Imported | 2026-05-06 |
| 44 | Athene-70B | 16.70 | | Imported | 2026-05-06 |
| 45 | gemma-2-27b-it | 16.30 | Gemma 2 27B / google-gemma-2-27b-it | Imported | 2026-05-06 |
| 46 | claude-3-haiku-20240307 | 14.30 | Claude 3 Haiku / anthropic-claude-3-haiku | Imported | 2026-05-06 |
| 47 | command-r-plus | 13.90 | | Imported | 2026-05-06 |
| 48 | reka-core-20240501 | 13.00 | | Imported | 2026-05-06 |
| 49 | gemma-2-9b-it | 12.80 | | Imported | 2026-05-06 |
| 50 | Meta-Llama-3.1-8B-Instruct | 12.80 | | Imported | 2026-05-06 |
| 51 | Qwen2.5-7B-Instruct | 12.00 | Qwen2.5 7B Instruct / qwen-qwen-2.5-7b-instruct | Imported | 2026-05-06 |
| 52 | Meta-Llama-3-8B-Instruct | 11.90 | Llama 3 8B Instruct / meta-llama-llama-3-8b-instruct | Imported | 2026-05-06 |
| 53 | Mistral-Nemo-Instruct-2407 | 11.80 | Mistral: Mistral Nemo / mistralai-mistral-nemo | Imported | 2026-05-06 |
| 54 | Phi-3-mini-4k-instruct | 11.60 | | Imported | 2026-05-06 |
| 55 | Yi-1.5-34B-Chat | 11.50 | | Imported | 2026-05-06 |
| 56 | gpt-3.5-turbo-0125 | 10.10 | GPT-3.5 Turbo / openai-gpt-3.5-turbo | Imported | 2026-05-06 |
| 57 | command-r | 9.90 | C Command R (08-2024) / cohere-command-r-08-2024 | Imported | 2026-05-06 |
| 58 | reka-flash-20240226 | 9.30 | | Imported | 2026-05-06 |
| 59 | mathstral-7B-v0.1 | 9.00 | | Imported | 2026-05-06 |
| 60 | Mixtral-8x7B-Instruct-v0.1 | 8.70 | Mistral: Mixtral 8x7B Instruct / mistralai-mixtral-8x7b-instruct | Imported | 2026-05-06 |
| 61 | Qwen2-7B-Instruct | 8.40 | | Imported | 2026-05-06 |
| 62 | Llama-3.2-3B-Instruct@together | 7.40 | | Imported | 2026-05-06 |
| 63 | Phi-3.5-mini-instruct | 6.40 | | Imported | 2026-05-06 |
| 64 | Qwen2.5-3B-Instruct | 4.80 | | Imported | 2026-05-06 |
| 65 | gemma-2-2b-it | 4.20 | | Imported | 2026-05-06 |
| 66 | Yi-1.5-9B-Chat | 2.30 | | Imported | 2026-05-06 |