ZebraLogic
ZebraLogic evaluates models on grid-style zebra logic puzzles, reporting exact puzzle accuracy and cell-level accuracy, broken down by difficulty level and puzzle size.
Rows: 66
Primary metric: puzzle_acc
Sampled: 2026-05-06
Metadata
Metrics
Puzzle Acc, Cell Acc, Easy Puzzle Acc, Hard Puzzle Acc, Small Puzzle Acc, Medium Puzzle Acc, Large Puzzle Acc, XL Puzzle Acc, No answer (lower is better), Reason Len (lower is better), Total Puzzles
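To make the difference between the two headline metrics concrete, here is a minimal sketch of how puzzle-level and cell-level accuracy could be computed from predicted and reference grids. It is an illustration under stated assumptions, not the benchmark's official scorer: the `Grid` layout (house, then attribute, then value), the `score` function, and the treatment of a missing answer as zero correct cells are all hypothetical choices made for this example.

```python
from typing import Dict, List, Optional

# Assumed layout (hypothetical): house name -> attribute name -> value.
Grid = Dict[str, Dict[str, str]]


def score(predictions: List[Optional[Grid]], golds: List[Grid]) -> Dict[str, float]:
    """Return puzzle-level, cell-level, and no-answer rates as percentages.

    Assumes `golds` is non-empty and every gold grid has at least one cell.
    """
    solved = 0          # puzzles where every cell matches
    correct_cells = 0   # matching cells summed over all puzzles
    total_cells = 0     # all cells summed over all puzzles
    no_answer = 0       # puzzles with no parseable prediction

    for pred, gold in zip(predictions, golds):
        cells = [(house, attr) for house, attrs in gold.items() for attr in attrs]
        total_cells += len(cells)
        if pred is None:
            # Assumption: a missing answer earns no cell credit.
            no_answer += 1
            continue
        hits = sum(
            1 for house, attr in cells
            if pred.get(house, {}).get(attr) == gold[house][attr]
        )
        correct_cells += hits
        if hits == len(cells):
            solved += 1

    n = len(golds)
    return {
        "puzzle_acc": 100.0 * solved / n,
        "cell_acc": 100.0 * correct_cells / total_cells,
        "no_answer": 100.0 * no_answer / n,
    }


# Two toy 2x2 puzzles: the first is solved exactly, the second has its pets swapped.
gold = {
    "House 1": {"Name": "Alice", "Pet": "cat"},
    "House 2": {"Name": "Bob", "Pet": "dog"},
}
swapped = {
    "House 1": {"Name": "Alice", "Pet": "dog"},
    "House 2": {"Name": "Bob", "Pet": "cat"},
}
result = score([gold, swapped], [gold, gold])
print(result)  # {'puzzle_acc': 50.0, 'cell_acc': 75.0, 'no_answer': 0.0}
```

In the toy run, one of two puzzles is fully correct (50% puzzle accuracy) while six of eight cells match (75% cell accuracy), which shows why cell accuracy can sit well above the exact-match figure used for ranking.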
| Rank | Subject | Puzzle Acc | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | o3-mini-2025-01-31-high | 91.70 | o3 Mini High openai-o3-mini-high | Imported | 2026-05-06 |
| 2 | o3-mini-2025-01-31-medium | 88.90 | o3-mini openai-o3-mini | Imported | 2026-05-06 |
| 3 | o1-2024-12-17 | 81 | o1 openai-o1 | Imported | 2026-05-06 |
| 4 | deepseek-R1 | 78.70 | R1 deepseek-r1 | Imported | 2026-05-06 |
| 5 | o3-mini-2025-01-31-low | 74.80 | — | Imported | 2026-05-06 |
| 6 | o1-preview-2024-09-12 | 71.40 | — | Imported | 2026-05-06 |
| 7 | o1-preview-2024-09-12-v2 | 70.40 | — | Imported | 2026-05-06 |
| 8 | o1-mini-2024-09-12-v3 | 59.70 | — | Imported | 2026-05-06 |
| 9 | o1-mini-2024-09-12-v2 | 56.80 | — | Imported | 2026-05-06 |
| 10 | o1-mini-2024-09-12 | 52.60 | — | Imported | 2026-05-06 |
| 11 | deepseek-v3 | 42.10 | DeepSeek V3 deepseek-deepseek-chat | Imported | 2026-05-06 |
| 12 | claude-3-5-sonnet-20241022 | 36.20 | Claude 3.5 Sonnet anthropic-claude-3.5-sonnet | Imported | 2026-05-06 |
| 13 | claude-3-5-sonnet-20240620 | 33.40 | Claude 3.5 Sonnet anthropic-claude-3.5-sonnet | Imported | 2026-05-06 |
| 14 | Llama-3.1-405B-Inst-fp8@together | 32.60 | — | Imported | 2026-05-06 |
| 15 | gpt-4o-2024-08-06 | 31.70 | GPT-4o (2024-08-06) openai-gpt-4o-2024-08-06 | Imported | 2026-05-06 |
| 16 | gemini-1.5-pro-exp-0827 | 30.50 | — | Imported | 2026-05-06 |
| 17 | Llama-3.1-405B-Inst@sambanova | 30.10 | — | Imported | 2026-05-06 |
| 18 | chatgpt-4o-latest-24-09-07 | 29.90 | — | Imported | 2026-05-06 |
| 19 | Mistral-Large-2 | 29 | — | Imported | 2026-05-06 |
| 20 | gpt-4-turbo-2024-04-09 | 28.40 | GPT-4 Turbo openai-gpt-4-turbo | Imported | 2026-05-06 |
| 21 | gpt-4o-2024-05-13 | 28.20 | GPT-4o (2024-05-13) openai-gpt-4o-2024-05-13 | Imported | 2026-05-06 |
| 22 | grok-2-1212 | 27.70 | — | Imported | 2026-05-06 |
| 23 | gpt-4-0314 | 27.10 | GPT-4 (older v0314) openai-gpt-4-0314 | Imported | 2026-05-06 |
| 24 | claude-3-opus-20240229 | 27 | — | Imported | 2026-05-06 |
| 25 | Qwen2.5-72B-Instruct | 26.60 | Qwen2.5 72B Instruct qwen-qwen-2.5-72b-instruct | Imported | 2026-05-06 |
| 26 | Qwen2.5-32B-Instruct | 26.10 | — | Imported | 2026-05-06 |
| 27 | gemini-1.5-pro-exp-0801 | 25.20 | — | Imported | 2026-05-06 |
| 28 | gemini-1.5-flash-exp-0827 | 25 | — | Imported | 2026-05-06 |
| 29 | Llama-3.1-405B-Inst@hyperbolic | 25 | — | Imported | 2026-05-06 |
| 30 | Meta-Llama-3.1-70B-Instruct | 24.90 | — | Imported | 2026-05-06 |
| 31 | deepseek-v2-chat-0628 | 22.70 | — | Imported | 2026-05-06 |
| 32 | deepseek-v2.5-0908 | 22.10 | — | Imported | 2026-05-06 |
| 33 | Qwen2-72B-Instruct | 21.40 | — | Imported | 2026-05-06 |
| 34 | deepseek-v2-coder-0614 | 21.10 | — | Imported | 2026-05-06 |
| 35 | deepseek-v2-coder-0724 | 20.50 | — | Imported | 2026-05-06 |
| 36 | gpt-4o-mini-2024-07-18 | 20.10 | GPT-4o-mini (2024-07-18) openai-gpt-4o-mini-2024-07-18 | Imported | 2026-05-06 |
| 37 | gemini-1.5-flash | 19.40 | — | Imported | 2026-05-06 |
| 38 | gemini-1.5-pro | 19.40 | — | Imported | 2026-05-06 |
| 39 | yi-large-preview | 18.90 | — | Imported | 2026-05-06 |
| 40 | yi-large | 18.80 | — | Imported | 2026-05-06 |
| 41 | claude-3-5-haiku-20241022 | 18.70 | — | Imported | 2026-05-06 |
| 42 | claude-3-sonnet-20240229 | 18.70 | — | Imported | 2026-05-06 |
| 43 | Meta-Llama-3-70B-Instruct | 16.80 | — | Imported | 2026-05-06 |
| 44 | Athene-70B | 16.70 | — | Imported | 2026-05-06 |
| 45 | gemma-2-27b-it | 16.30 | Gemma 2 27B google-gemma-2-27b-it | Imported | 2026-05-06 |
| 46 | claude-3-haiku-20240307 | 14.30 | Claude 3 Haiku anthropic-claude-3-haiku | Imported | 2026-05-06 |
| 47 | command-r-plus | 13.90 | — | Imported | 2026-05-06 |
| 48 | reka-core-20240501 | 13 | — | Imported | 2026-05-06 |
| 49 | gemma-2-9b-it | 12.80 | — | Imported | 2026-05-06 |
| 50 | Meta-Llama-3.1-8B-Instruct | 12.80 | — | Imported | 2026-05-06 |
| 51 | Qwen2.5-7B-Instruct | 12 | Qwen2.5 7B Instruct qwen-qwen-2.5-7b-instruct | Imported | 2026-05-06 |
| 52 | Meta-Llama-3-8B-Instruct | 11.90 | Llama 3 8B Instruct meta-llama-llama-3-8b-instruct | Imported | 2026-05-06 |
| 53 | Mistral-Nemo-Instruct-2407 | 11.80 | Mistral: Mistral Nemo mistralai-mistral-nemo | Imported | 2026-05-06 |
| 54 | Phi-3-mini-4k-instruct | 11.60 | — | Imported | 2026-05-06 |
| 55 | Yi-1.5-34B-Chat | 11.50 | — | Imported | 2026-05-06 |
| 56 | gpt-3.5-turbo-0125 | 10.10 | GPT-3.5 Turbo openai-gpt-3.5-turbo | Imported | 2026-05-06 |
| 57 | command-r | 9.90 | Command R (08-2024) cohere-command-r-08-2024 | Imported | 2026-05-06 |
| 58 | reka-flash-20240226 | 9.30 | — | Imported | 2026-05-06 |
| 59 | mathstral-7B-v0.1 | 9 | — | Imported | 2026-05-06 |
| 60 | Mixtral-8x7B-Instruct-v0.1 | 8.70 | Mistral: Mixtral 8x7B Instruct mistralai-mixtral-8x7b-instruct | Imported | 2026-05-06 |
| 61 | Qwen2-7B-Instruct | 8.40 | — | Imported | 2026-05-06 |
| 62 | Llama-3.2-3B-Instruct@together | 7.40 | — | Imported | 2026-05-06 |
| 63 | Phi-3.5-mini-instruct | 6.40 | — | Imported | 2026-05-06 |
| 64 | Qwen2.5-3B-Instruct | 4.80 | — | Imported | 2026-05-06 |
| 65 | gemma-2-2b-it | 4.20 | — | Imported | 2026-05-06 |
| 66 | Yi-1.5-9B-Chat | 2.30 | — | Imported | 2026-05-06 |