EnigmaEval
EnigmaEval is a benchmark built from puzzle hunts. It tests AI models on complex reasoning, creative problem-solving, and cross-domain knowledge synthesis.
Rows: 47
Primary metric: score
Sampled: 2026-05-06
Metadata
Metrics: Score, Confidence Interval Upper, Max Score
| Rank | Subject | Score | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | gpt-5.4-pro-2026-03-05 | 23.82 | GPT-5.4 Pro openai-gpt-5.4-pro | Imported | 2026-05-06 |
| 1 | gemini-3.1-pro-preview | 19.76 | Gemini 3.1 Pro Preview google-gemini-3.1-pro-preview | Imported | 2026-05-06 |
| 2 | gpt-5-pro-2025-10-06 | 18.75 | GPT-5 Pro openai-gpt-5-pro | Imported | 2026-05-06 |
| 2 | gemini-3-pro-preview | 18.24 | Gemini 3 google-gemini-3 | Imported | 2026-05-06 |
| 2 | gpt-5.4-2026-03-05 (xhigh thinking) | 15.96 | GPT-5.4 openai-gpt-5.4 | Imported | 2026-05-06 |
| 5 | o3 (medium) (April 2025) | 13.09 | o3 openai-o3 | Imported | 2026-05-06 |
| 6 | o3 (high) (April 2025) | 11.91 | o3 openai-o3 | Imported | 2026-05-06 |
| 6 | claude-opus-4-5-20251101-thinking | 11.91 | Claude Opus 4.5 anthropic-claude-opus-4.5 | Imported | 2026-05-06 |
| 6 | gpt-5.1-thinking | 11.23 | GPT-5.1 openai-gpt-5.1 | Imported | 2026-05-06 |
| 6 | gpt-5-2025-08-07 | 10.47 | GPT-5 openai-gpt-5 | Imported | 2026-05-06 |
| 6 | gpt-5.2-2025-12-11 | 10.39 | GPT-5.2 openai-gpt-5.2 | Imported | 2026-05-06 |
| 7 | o4-mini (high) (April 2025) | 9.21 | o4 Mini openai-o4-mini | Imported | 2026-05-06 |
| 9 | gpt-5-mini-2025-08-07 | 8.19 | GPT-5 Mini openai-gpt-5-mini | Imported | 2026-05-06 |
| 10 | claude-opus-4-6-thinking-max | 7.60 | Claude Opus 4.6 anthropic-claude-opus-4.6 | Imported | 2026-05-06 |
| 12 | claude-opus-4-1-20250805-thinking | 7.18 | Claude Opus 4.1 anthropic-claude-opus-4.1 | Imported | 2026-05-06 |
| 12 | claude-opus-4-6 (Non-Thinking) | 6.84 | Claude Opus 4.6 anthropic-claude-opus-4.6 | Imported | 2026-05-06 |
| 12 | o4-mini (medium) (April 2025) | 6.81 | o4 Mini openai-o4-mini | Imported | 2026-05-06 |
| 13 | o1 Pro (March 2025) | 6.14 | o1-pro openai-o1-pro | Imported | 2026-05-06 |
| 13 | claude-sonnet-4-5-20250929-thinking | 6.00 | Claude Sonnet 4.5 anthropic-claude-sonnet-4.5 | Imported | 2026-05-06 |
| 13 | o1 (December 2024) | 5.65 | o1 openai-o1 | Imported | 2026-05-06 |
| 13 | Claude Opus 4 (Thinking) | 5.57 | Claude Opus 4 anthropic-claude-opus-4 | Imported | 2026-05-06 |
| 13 | gemini-2.5-pro-preview-06-05 | 5.57 | Gemini 2.5 Pro Preview 06-05 google-gemini-2.5-pro-preview | Imported | 2026-05-06 |
| 15 | claude-opus-4-1-20250805 | 4.81 | Claude Opus 4.1 anthropic-claude-opus-4.1 | Imported | 2026-05-06 |
| 16 | claude-opus-4-5-20251101 | 4.65 | Claude Opus 4.5 anthropic-claude-opus-4.5 | Imported | 2026-05-06 |
| 17 | Claude 3.7 Sonnet Thinking (Feb 2025) | 4.23 | Claude 3.7 Sonnet (thinking) anthropic-claude-3.7-sonnet-thinking | Imported | 2026-05-06 |
| 18 | Gemini 2.5 Pro Experimental (March 2025) | 4.14 | — | Imported | 2026-05-06 |
| 20 | claude-sonnet-4-5-20250929 | 3.38 | Claude Sonnet 4.5 anthropic-claude-sonnet-4.5 | Imported | 2026-05-06 |
| 20 | kimi-k2.5 | 3.38 | MoonshotAI: Kimi K2.5 moonshotai-kimi-k2.5 | Imported | 2026-05-06 |
| 23 | Claude Opus 4 | 3.21 | Claude Opus 4 anthropic-claude-opus-4 | Imported | 2026-05-06 |
| 23 | GPT-4.5 Preview (February 2025) | 3.18 | GPT-4.5 openai-gpt-4.5-preview | Imported | 2026-05-06 |
| 23 | Claude Sonnet 4 (Thinking) | 3.12 | Claude Sonnet 4 anthropic-claude-sonnet-4 | Imported | 2026-05-06 |
| 23 | gemini-3.1-flash-lite-preview | 3.04 | Gemini 3.1 Flash Lite Preview google-gemini-3.1-flash-lite-preview | Imported | 2026-05-06 |
| 23 | Gemini 2.5 Flash Preview (May 2025) | 2.70 | — | Imported | 2026-05-06 |
| 25 | Gemini 2.5 Pro Preview (May 06 2025) | 2.36 | Gemini 2.5 Pro Preview 06-05 google-gemini-2.5-pro-preview | Imported | 2026-05-06 |
| 25 | Claude 3.7 Sonnet (February 2025) | 2.26 | Claude 3.7 Sonnet anthropic-claude-3.7-sonnet | Imported | 2026-05-06 |
| 26 | Claude Sonnet 4 | 2.20 | Claude Sonnet 4 anthropic-claude-sonnet-4 | Imported | 2026-05-06 |
| 26 | GPT-4.1 | 2.17 | GPT-4.1 openai-gpt-4.1 | Imported | 2026-05-06 |
| 27 | gpt-5.1-instant | 1.94 | GPT-5.1 Chat openai-gpt-5.1-chat | Imported | 2026-05-06 |
| 34 | Gemini 2.0 Flash Thinking (January 2025) | 1.10 | — | Imported | 2026-05-06 |
| 35 | Claude 3.5 Sonnet (October 2024) | 0.91 | Claude 3.5 Sonnet anthropic-claude-3.5-sonnet | Imported | 2026-05-06 |
| 36 | Pixtral Large (November 2024) | 0.84 | — | Imported | 2026-05-06 |
| 38 | Claude 3 Opus | 0.82 | — | Imported | 2026-05-06 |
| 38 | GPT-4o (November 2024) | 0.80 | GPT-4o openai-gpt-4o | Imported | 2026-05-06 |
| 38 | Gemini 2.0 Pro Experimental (Feb 2025) | 0.69 | — | Imported | 2026-05-06 |
| 39 | Gemini 2.0 Flash (February 2025) | 0.63 | Gemini 2.0 Flash google-gemini-2.0-flash | Imported | 2026-05-06 |
| 39 | Llama 4 Maverick | 0.58 | Llama 4 Maverick meta-llama-4-maverick | Imported | 2026-05-06 |
| 39 | Llama 3.2 90B Vision Instruct | 0.38 | — | Imported | 2026-05-06 |
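
The table above can be loaded into records for sorting or filtering. The following is a minimal sketch, assuming the rows stay in the pipe-delimited layout shown here; `Row`, `parse_rows`, and `SAMPLE` are hypothetical names introduced for illustration and are not part of any EnigmaEval tooling. Only the first three columns (Rank, Subject, Score) are read.

```python
from typing import NamedTuple


class Row(NamedTuple):
    rank: int
    subject: str
    score: float


def parse_rows(table: str) -> list[Row]:
    """Parse pipe-delimited leaderboard lines into Row records."""
    rows = []
    for line in table.strip().splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        # Skip the header and the |---| separator: their first cell is not a number.
        if len(cells) < 3 or not cells[0].isdigit():
            continue
        rows.append(Row(int(cells[0]), cells[1], float(cells[2])))
    return rows


# Three rows copied from the table above, used as sample input.
SAMPLE = """
| Rank | Subject | Score | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | gpt-5.4-pro-2026-03-05 | 23.82 | GPT-5.4 Pro openai-gpt-5.4-pro | Imported | 2026-05-06 |
| 1 | gemini-3.1-pro-preview | 19.76 | Gemini 3.1 Pro Preview google-gemini-3.1-pro-preview | Imported | 2026-05-06 |
| 2 | gpt-5-pro-2025-10-06 | 18.75 | GPT-5 Pro openai-gpt-5-pro | Imported | 2026-05-06 |
"""

rows = parse_rows(SAMPLE)
best = max(rows, key=lambda r: r.score)
print(best.subject, best.score)
```

Keeping the rank as published, rather than recomputing it from the score, preserves the ties that appear in the table.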