Altered Riddles

A reasoning benchmark for conditioned override. It tests whether models fall back on memorized answers when familiar riddles are deliberately modified through added constraints, context swaps, meaning shifts, or bias probes.

Metadata

23 rows
Primary metric: conditioned_override_rate
Sampled: 2026-05-27

Metrics

Conditioned Override Rate (lower is better), Altered accuracy, Original accuracy, Pattern override rate (lower is better), Average output tokens per riddle (lower is better)
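The page names these metrics but does not define them. The minimal Python sketch below shows one plausible reading, offered only as an assumption. Under that reading, pattern override rate is the share of altered riddles where the model repeats the original riddle's memorized answer. Conditioned override rate restricts that share to riddles whose unmodified version the model solved, so it counts only cases where the model demonstrably knows the memorized answer. The `RiddleResult` record, its field names, and the `summarize` helper are hypothetical and are not the leaderboard's schema.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RiddleResult:
    """Hypothetical per-riddle record; field names are illustrative only."""
    original_correct: bool  # solved the unmodified riddle
    altered_correct: bool   # solved the altered riddle
    gave_memorized: bool    # on the altered riddle, repeated the original riddle's answer
    output_tokens: int      # tokens emitted for the altered riddle


def summarize(results: List[RiddleResult]) -> dict:
    n = len(results)
    if n == 0:
        raise ValueError("need at least one result")
    solved_original = [r for r in results if r.original_correct]
    # Assumed definition: overrides counted across all altered riddles.
    overrides = sum(r.gave_memorized for r in results)
    # Assumed definition: overrides counted only where the original was solved.
    conditioned = sum(r.gave_memorized for r in solved_original)
    return {
        "original_accuracy": len(solved_original) / n,
        "altered_accuracy": sum(r.altered_correct for r in results) / n,
        "pattern_override_rate": overrides / n,
        # Undefined (NaN) if the model solved no original riddles.
        "conditioned_override_rate": (
            conditioned / len(solved_original) if solved_original else float("nan")
        ),
        "avg_output_tokens": sum(r.output_tokens for r in results) / n,
    }


# Usage with made-up toy records (not benchmark data):
results = [
    RiddleResult(True, False, True, 120),
    RiddleResult(True, True, False, 80),
    RiddleResult(False, False, True, 200),
    RiddleResult(True, True, False, 100),
]
print(summarize(results))
```

In this toy run, the third riddle's override counts toward the pattern override rate (2 of 4) but not the conditioned rate (1 of 3). The model never solved that riddle's original, so repeating the original answer there is not clear evidence of a memorized-answer override.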

Latest Results

Rows are imported from the public Altered Riddles leaderboard JSON. Primary score is Conditioned Override Rate, where lower is better.
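Because lower is better, ranking reduces to an ascending sort on the primary score. The following is a minimal sketch. It assumes each leaderboard JSON entry exposes `subject` and `conditioned_override_rate` keys; those key names and the `rank_rows` helper are hypothetical, since the actual file layout is not described here. The three sample scores are copied from the results table.

```python
import json

# Hypothetical JSON shape; the real leaderboard file's schema is not documented here.
SAMPLE = """
[
  {"subject": "openai/gpt-oss-20b (MXFP4), high reasoning", "conditioned_override_rate": 0.2896},
  {"subject": "xiaomi/mimo-v2-pro, high reasoning", "conditioned_override_rate": 0.2873},
  {"subject": "openai/gpt-5.4-mini, high reasoning", "conditioned_override_rate": 0.3058}
]
"""


def rank_rows(raw: str) -> list[tuple[int, str, float]]:
    rows = json.loads(raw)
    # Lower Conditioned Override Rate is better, so sort ascending.
    ordered = sorted(rows, key=lambda r: r["conditioned_override_rate"])
    # Ranks start at 1; ties are not merged and keep input order (sorted is stable).
    return [
        (i, r["subject"], r["conditioned_override_rate"])
        for i, r in enumerate(ordered, start=1)
    ]


for rank, subject, score in rank_rows(SAMPLE):
    print(f"{rank:>2} {subject} {score:.4f}")
```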

| Rank | Subject | Conditioned Override Rate | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | xiaomi/mimo-v2-pro, high reasoning | 0.2873 | | Imported | 2026-05-27 |
| 2 | openai/gpt-oss-20b (MXFP4), high reasoning | 0.2896 | | Imported | 2026-05-27 |
| 3 | openai/gpt-5.4-mini, high reasoning | 0.3058 | GPT-5.4 Mini (openai-gpt-5.4-mini) | Imported | 2026-05-27 |
| 4 | minimaxai/minimax-m2.7 (FP4), high reasoning | 0.3231 | | Imported | 2026-05-27 |
| 5 | zai-org/glm-5.1 (FP4), high reasoning | 0.3239 | GLM 5.1 (z-ai-glm-5.1) | Imported | 2026-05-27 |
| 6 | openai/gpt-oss-120b (MXFP4), high reasoning | 0.3316 | | Imported | 2026-05-27 |
| 7 | xiaomi/mimo-v2-omni, high reasoning | 0.3366 | | Imported | 2026-05-27 |
| 8 | zai-org/glm-5 (FP4), high reasoning | 0.3460 | GLM 5 (z-ai-glm-5) | Imported | 2026-05-27 |
| 9 | mistralai/mistral-small-2603, high reasoning | 0.3487 | | Imported | 2026-05-27 |
| 10 | google/gemma-4-31b-it (FP8), high reasoning | 0.3619 | | Imported | 2026-05-27 |
| 11 | openai/gpt-5.4-mini | 0.4010 | GPT-5.4 Mini (openai-gpt-5.4-mini) | Imported | 2026-05-27 |
| 12 | qwen/qwen3.5-27b (UD-Q4_K_XL) | 0.4234 | Qwen3.5-27B (qwen-qwen3.5-27b) | Imported | 2026-05-27 |
| 13 | moonshotai/kimi-k2.5, high reasoning | 0.4319 | MoonshotAI: Kimi K2.5 (moonshotai-kimi-k2.5) | Imported | 2026-05-27 |
| 14 | moonshotai/kimi-k2.6, high reasoning | 0.4413 | MoonshotAI: Kimi K2.6 (moonshotai-kimi-k2.6) | Imported | 2026-05-27 |
| 15 | google/gemma-4-31b-it (UD-Q6_K_XL) | 0.4615 | Gemma 4 31B (google-gemma-4-31b-it) | Imported | 2026-05-27 |
| 16 | qwen/qwen3.6-27b (UD-Q6_K_XL) | 0.4751 | Qwen3.6 27B (qwen-qwen3.6-27b) | Imported | 2026-05-27 |
| 17 | anthropic/claude-sonnet-4.6, high reasoning | 0.4837 | | Imported | 2026-05-27 |
| 18 | google/gemma-4-26b-a4b-it (UD-Q6_K_XL) | 0.4885 | Gemma 4 26B A4B (google-gemma-4-26b-a4b-it) | Imported | 2026-05-27 |
| 19 | qwen/qwen3.6-35b-a3b (UD-Q6_K_XL) | 0.5238 | Qwen3.6 35B A3B (qwen-qwen3.6-35b-a3b) | Imported | 2026-05-27 |
| 20 | mistralai/mistral-small-2603 | 0.5258 | Mistral: Mistral Small 4 (mistralai-mistral-small-2603) | Imported | 2026-05-27 |
| 21 | anthropic/claude-opus-4.7, high reasoning | 0.5280 | | Imported | 2026-05-27 |
| 22 | liquidai/lfm2-24b-a2b (Q8_0) | 0.5309 | | Imported | 2026-05-27 |
| 23 | moonshotai/kimi-k2.5 | 0.5374 | MoonshotAI: Kimi K2.5 (moonshotai-kimi-k2.5) | Imported | 2026-05-27 |
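Three models appear in the table both with and without "high reasoning", and neither of their two rows lists a quantization. That permits a rough side-by-side comparison. The sketch below copies those scores from the table and computes the change. A negative value means high reasoning lowered (improved) the override rate. google/gemma-4-31b-it is left out because its two rows use different quantizations (FP8 vs UD-Q6_K_XL), which confounds the comparison. The `SCORES` dictionary and the `reasoning_deltas` helper are illustrative names.

```python
# Scores copied from the results table for models listed both with and
# without "high reasoning" (same listed precision for both rows).
SCORES = {
    "openai/gpt-5.4-mini": {"default": 0.4010, "high": 0.3058},
    "mistralai/mistral-small-2603": {"default": 0.5258, "high": 0.3487},
    "moonshotai/kimi-k2.5": {"default": 0.5374, "high": 0.4319},
}


def reasoning_deltas(scores: dict) -> dict:
    # Negative delta = high reasoning lowered the Conditioned Override Rate.
    return {m: round(s["high"] - s["default"], 4) for m, s in scores.items()}


for model, delta in reasoning_deltas(SCORES).items():
    print(f"{model}: {delta:+.4f}")
```

On these three pairs, high reasoning lowers the rate by roughly 0.10 to 0.18. Three pairs are too few to generalize from.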