EnigmaEval

EnigmaEval is a benchmark drawn from puzzle hunts. It tests AI models on complex reasoning, creative problem-solving, and cross-domain knowledge synthesis.

Rows: 47
Primary metric: score
Sampled: 2026-05-06

Metadata

Metrics

Score, Confidence Interval Upper, Max Score

Latest Results

| Rank | Subject | Score | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | gpt-5.4-pro-2026-03-05 | 23.82 | GPT-5.4 Pro / openai-gpt-5.4-pro | Imported | 2026-05-06 |
| 1 | gemini-3.1-pro-preview | 19.76 | Gemini 3.1 Pro Preview / google-gemini-3.1-pro-preview | Imported | 2026-05-06 |
| 2 | gpt-5-pro-2025-10-06 | 18.75 | GPT-5 Pro / openai-gpt-5-pro | Imported | 2026-05-06 |
| 2 | gemini-3-pro-preview | 18.24 | Gemini 3 / google-gemini-3 | Imported | 2026-05-06 |
| 2 | gpt-5.4-2026-03-05 (xhigh thinking) | 15.96 | GPT-5.4 / openai-gpt-5.4 | Imported | 2026-05-06 |
| 5 | o3 (medium) (April 2025) | 13.09 | o3 / openai-o3 | Imported | 2026-05-06 |
| 6 | o3 (high) (April 2025) | 11.91 | o3 / openai-o3 | Imported | 2026-05-06 |
| 6 | claude-opus-4-5-20251101-thinking | 11.91 | Claude Opus 4.5 / anthropic-claude-opus-4.5 | Imported | 2026-05-06 |
| 6 | gpt-5.1-thinking | 11.23 | GPT-5.1 / openai-gpt-5.1 | Imported | 2026-05-06 |
| 6 | gpt-5-2025-08-07 | 10.47 | GPT-5 / openai-gpt-5 | Imported | 2026-05-06 |
| 6 | gpt-5.2-2025-12-11 | 10.39 | GPT-5.2 / openai-gpt-5.2 | Imported | 2026-05-06 |
| 7 | o4-mini (high) (April 2025) | 9.21 | o4 Mini / openai-o4-mini | Imported | 2026-05-06 |
| 9 | gpt-5-mini-2025-08-07 | 8.19 | GPT-5 Mini / openai-gpt-5-mini | Imported | 2026-05-06 |
| 10 | claude-opus-4-6-thinking-max | 7.60 | Claude Opus 4.6 / anthropic-claude-opus-4.6 | Imported | 2026-05-06 |
| 12 | claude-opus-4-1-20250805-thinking | 7.18 | Claude Opus 4.1 / anthropic-claude-opus-4.1 | Imported | 2026-05-06 |
| 12 | claude-opus-4-6 (Non-Thinking) | 6.84 | Claude Opus 4.6 / anthropic-claude-opus-4.6 | Imported | 2026-05-06 |
| 12 | o4-mini (medium) (April 2025) | 6.81 | o4 Mini / openai-o4-mini | Imported | 2026-05-06 |
| 13 | o1 Pro (March 2025) | 6.14 | o1-pro / openai-o1-pro | Imported | 2026-05-06 |
| 13 | claude-sonnet-4-5-20250929-thinking | 6.00 | Claude Sonnet 4.5 / anthropic-claude-sonnet-4.5 | Imported | 2026-05-06 |
| 13 | o1 (December 2024) | 5.65 | o1 / openai-o1 | Imported | 2026-05-06 |
| 13 | Claude Opus 4 (Thinking) | 5.57 | Claude Opus 4 / anthropic-claude-opus-4 | Imported | 2026-05-06 |
| 13 | gemini-2.5-pro-preview-06-05 | 5.57 | Gemini 2.5 Pro Preview 06-05 / google-gemini-2.5-pro-preview | Imported | 2026-05-06 |
| 15 | claude-opus-4-1-20250805 | 4.81 | Claude Opus 4.1 / anthropic-claude-opus-4.1 | Imported | 2026-05-06 |
| 16 | claude-opus-4-5-20251101 | 4.65 | Claude Opus 4.5 / anthropic-claude-opus-4.5 | Imported | 2026-05-06 |
| 17 | Claude 3.7 Sonnet Thinking (Feb 2025) | 4.23 | Claude 3.7 Sonnet (thinking) / anthropic-claude-3.7-sonnet-thinking | Imported | 2026-05-06 |
| 18 | Gemini 2.5 Pro Experimental (March 2025) | 4.14 | | Imported | 2026-05-06 |
| 20 | claude-sonnet-4-5-20250929 | 3.38 | Claude Sonnet 4.5 / anthropic-claude-sonnet-4.5 | Imported | 2026-05-06 |
| 20 | kimi-k2.5 | 3.38 | KIMI MoonshotAI: Kimi K2.5 / moonshotai-kimi-k2.5 | Imported | 2026-05-06 |
| 23 | Claude Opus 4 | 3.21 | Claude Opus 4 / anthropic-claude-opus-4 | Imported | 2026-05-06 |
| 23 | GPT-4.5 Preview (February 2025) | 3.18 | GPT-4.5 / openai-gpt-4.5-preview | Imported | 2026-05-06 |
| 23 | Claude Sonnet 4 (Thinking) | 3.12 | Claude Sonnet 4 / anthropic-claude-sonnet-4 | Imported | 2026-05-06 |
| 23 | gemini-3.1-flash-lite-preview | 3.04 | Gemini 3.1 Flash Lite Preview / google-gemini-3.1-flash-lite-preview | Imported | 2026-05-06 |
| 23 | Gemini 2.5 Flash Preview (May 2025) | 2.70 | | Imported | 2026-05-06 |
| 25 | Gemini 2.5 Pro Preview (May 06 2025) | 2.36 | Gemini 2.5 Pro Preview 06-05 / google-gemini-2.5-pro-preview | Imported | 2026-05-06 |
| 25 | Claude 3.7 Sonnet (February 2025) | 2.26 | Claude 3.7 Sonnet / anthropic-claude-3.7-sonnet | Imported | 2026-05-06 |
| 26 | Claude Sonnet 4 | 2.20 | Claude Sonnet 4 / anthropic-claude-sonnet-4 | Imported | 2026-05-06 |
| 26 | GPT-4.1 | 2.17 | GPT-4.1 / openai-gpt-4.1 | Imported | 2026-05-06 |
| 27 | gpt-5.1-instant | 1.94 | GPT-5.1 Chat / openai-gpt-5.1-chat | Imported | 2026-05-06 |
| 34 | Gemini 2.0 Flash Thinking (January 2025) | 1.10 | | Imported | 2026-05-06 |
| 35 | Claude 3.5 Sonnet (October 2024) | 0.91 | Claude 3.5 Sonnet / anthropic-claude-3.5-sonnet | Imported | 2026-05-06 |
| 36 | Pixtral Large (November 2024) | 0.84 | | Imported | 2026-05-06 |
| 38 | Claude 3 Opus | 0.82 | | Imported | 2026-05-06 |
| 38 | GPT-4o (November 2024) | 0.80 | GPT-4o / openai-gpt-4o | Imported | 2026-05-06 |
| 38 | Gemini 2.0 Pro Experimental (Feb 2025) | 0.69 | | Imported | 2026-05-06 |
| 39 | Gemini 2.0 Flash (February 2025) | 0.63 | Gemini 2.0 Flash / google-gemini-2.0-flash | Imported | 2026-05-06 |
| 39 | Llama 4 Maverick | 0.58 | Llama 4 Maverick / meta-llama-4-maverick | Imported | 2026-05-06 |
| 39 | Llama 3.2 90B Vision Instruct | 0.38 | | Imported | 2026-05-06 |