Altered Riddles
A reasoning benchmark for conditioned override: it tests whether models fall back on memorized answers when familiar riddles are deliberately modified with constraints, context swaps, meaning shifts, or bias probes.
23 rows
Primary metric: conditioned_override_rate
Sampled: 2026-05-27
Metadata
Metrics
Conditioned Override Rate (lower is better), Altered accuracy, Original accuracy, Pattern override rate (lower is better), Average output tokens per riddle (lower is better)
| Rank | Subject | Conditioned Override Rate | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | xiaomi/mimo-v2-pro, high reasoning | 0.2873 | — | Imported | 2026-05-27 |
| 2 | openai/gpt-oss-20b (MXFP4), high reasoning | 0.2896 | — | Imported | 2026-05-27 |
| 3 | openai/gpt-5.4-mini, high reasoning | 0.3058 | GPT-5.4 Mini (openai-gpt-5.4-mini) | Imported | 2026-05-27 |
| 4 | minimaxai/minimax-m2.7 (FP4), high reasoning | 0.3231 | — | Imported | 2026-05-27 |
| 5 | zai-org/glm-5.1 (FP4), high reasoning | 0.3239 | GLM 5.1 (z-ai-glm-5.1) | Imported | 2026-05-27 |
| 6 | openai/gpt-oss-120b (MXFP4), high reasoning | 0.3316 | — | Imported | 2026-05-27 |
| 7 | xiaomi/mimo-v2-omni, high reasoning | 0.3366 | — | Imported | 2026-05-27 |
| 8 | zai-org/glm-5 (FP4), high reasoning | 0.3460 | GLM 5 (z-ai-glm-5) | Imported | 2026-05-27 |
| 9 | mistralai/mistral-small-2603, high reasoning | 0.3487 | — | Imported | 2026-05-27 |
| 10 | google/gemma-4-31b-it (FP8), high reasoning | 0.3619 | — | Imported | 2026-05-27 |
| 11 | openai/gpt-5.4-mini | 0.4010 | GPT-5.4 Mini (openai-gpt-5.4-mini) | Imported | 2026-05-27 |
| 12 | qwen/qwen3.5-27b (UD-Q4_K_XL) | 0.4234 | Qwen3.5-27B (qwen-qwen3.5-27b) | Imported | 2026-05-27 |
| 13 | moonshotai/kimi-k2.5, high reasoning | 0.4319 | MoonshotAI: Kimi K2.5 (moonshotai-kimi-k2.5) | Imported | 2026-05-27 |
| 14 | moonshotai/kimi-k2.6, high reasoning | 0.4413 | MoonshotAI: Kimi K2.6 (moonshotai-kimi-k2.6) | Imported | 2026-05-27 |
| 15 | google/gemma-4-31b-it (UD-Q6_K_XL) | 0.4615 | Gemma 4 31B (google-gemma-4-31b-it) | Imported | 2026-05-27 |
| 16 | qwen/qwen3.6-27b (UD-Q6_K_XL) | 0.4751 | Qwen3.6 27B (qwen-qwen3.6-27b) | Imported | 2026-05-27 |
| 17 | anthropic/claude-sonnet-4.6, high reasoning | 0.4837 | — | Imported | 2026-05-27 |
| 18 | google/gemma-4-26b-a4b-it (UD-Q6_K_XL) | 0.4885 | Gemma 4 26B A4B (google-gemma-4-26b-a4b-it) | Imported | 2026-05-27 |
| 19 | qwen/qwen3.6-35b-a3b (UD-Q6_K_XL) | 0.5238 | Qwen3.6 35B A3B (qwen-qwen3.6-35b-a3b) | Imported | 2026-05-27 |
| 20 | mistralai/mistral-small-2603 | 0.5258 | Mistral: Mistral Small 4 (mistralai-mistral-small-2603) | Imported | 2026-05-27 |
| 21 | anthropic/claude-opus-4.7, high reasoning | 0.5280 | — | Imported | 2026-05-27 |
| 22 | liquidai/lfm2-24b-a2b (Q8_0) | 0.5309 | — | Imported | 2026-05-27 |
| 23 | moonshotai/kimi-k2.5 | 0.5374 | MoonshotAI: Kimi K2.5 (moonshotai-kimi-k2.5) | Imported | 2026-05-27 |
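Because the primary metric is lower-is-better, rank 1 is the smallest conditioned override rate. The sketch below illustrates that ordering on six rows copied from the table. It also computes how far the rate drops when the same model is listed both with and without "high reasoning". The names `ROWS`, `rank`, and `reasoning_gaps` are hypothetical helpers for this illustration and are not part of the benchmark's tooling. Pairing rows by stripping the ", high reasoning" suffix from the Subject string is an assumption about how the labels are formed.

```python
# Rows copied from the leaderboard: (subject, conditioned override rate).
ROWS = [
    ("openai/gpt-5.4-mini, high reasoning", 0.3058),
    ("mistralai/mistral-small-2603, high reasoning", 0.3487),
    ("openai/gpt-5.4-mini", 0.4010),
    ("moonshotai/kimi-k2.5, high reasoning", 0.4319),
    ("mistralai/mistral-small-2603", 0.5258),
    ("moonshotai/kimi-k2.5", 0.5374),
]

# Assumption: a reasoning-enabled run is marked by this suffix on the subject.
SUFFIX = ", high reasoning"


def rank(rows):
    """Sort ascending, because a lower override rate is better."""
    return sorted(rows, key=lambda row: row[1])


def reasoning_gaps(rows):
    """Map model -> (default rate - high-reasoning rate) for models listed in both modes."""
    default = {s: v for s, v in rows if not s.endswith(SUFFIX)}
    high = {s[: -len(SUFFIX)]: v for s, v in rows if s.endswith(SUFFIX)}
    return {m: round(default[m] - high[m], 4) for m in default if m in high}


if __name__ == "__main__":
    for position, (subject, rate) in enumerate(rank(ROWS), start=1):
        print(position, subject, rate)
    print(reasoning_gaps(ROWS))
```

A positive gap means the high-reasoning run overrode the altered riddle with a memorized answer less often than the default run of the same model. Rows whose subjects carry different quantization tags, such as (FP8) versus (UD-Q6_K_XL), are not paired by this sketch.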