ConStory-Bench
Long-story generation benchmark that measures cross-scene consistency errors using Consistency Error Density (CED) over 2,000 generated stories.
Rows: 33
Primary metric: CED
Sampled: 2026-05-28
Metadata
Metrics
Consistency Error Density (lower is better), Average Words (lower is better), Total Stories
| Rank | Subject | Consistency Error Density | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | GPT-5-Reasoning | CED 0.113 | GPT-5 openai-gpt-5 | Imported | 2026-05-28 |
| 2 | Gemini-2.5-Pro | CED 0.302 | Gemini 2.5 Pro google-gemini-2.5-pro | Imported | 2026-05-28 |
| 3 | Gemini-2.5-Flash | CED 0.305 | Gemini 2.5 Flash google-gemini-2.5-flash | Imported | 2026-05-28 |
| 4 | Claude-Sonnet-4.5 | CED 0.52 | Claude Sonnet 4.5 anthropic-claude-sonnet-4.5 | Imported | 2026-05-28 |
| 5 | GLM-4.6 | CED 0.528 | GLM 4.6 z-ai-glm-4.6 | Imported | 2026-05-28 |
| 6 | Qwen3-32B | CED 0.537 | Qwen3 32B qwen-qwen3-32b | Imported | 2026-05-28 |
| 7 | Ring-1T | CED 0.539 | — | Imported | 2026-05-28 |
| 8 | DeepSeek-V3.2-Exp | CED 0.541 | DeepSeek V3.2 Exp deepseek-deepseek-v3.2-exp | Imported | 2026-05-28 |
| 9 | Qwen3-235B-A22B-Thinking | CED 0.559 | Qwen3 235B A22B Thinking 2507 qwen-qwen3-235b-a22b-thinking-2507 | Imported | 2026-05-28 |
| 10 | GLM-4.5 | CED 0.595 | GLM 4.5 z-ai-glm-4.5 | Imported | 2026-05-28 |
| 11 | LongWriter-Zero-32B | CED 0.669 | — | Imported | 2026-05-28 |
| 12 | Grok-4 | CED 0.67 | Grok 4 x-ai-grok-4 | Imported | 2026-05-28 |
| 13 | SuperWriter | CED 0.674 | — | Imported | 2026-05-28 |
| 14 | Ling-1T | CED 0.699 | — | Imported | 2026-05-28 |
| 15 | GPT-4o-1120 | CED 0.711 | GPT-4o openai-gpt-4o | Imported | 2026-05-28 |
| 16 | Step3 | CED 0.845 | — | Imported | 2026-05-28 |
| 17 | Qwen3-Next-80B-Thinking | CED 0.959 | — | Imported | 2026-05-28 |
| 18 | DOME | CED 1.033 | — | Imported | 2026-05-28 |
| 19 | Doubao-1.6-Thinking-2507 | CED 1.217 | — | Imported | 2026-05-28 |
| 20 | Kimi-K2-2509 | CED 1.3 | — | Imported | 2026-05-28 |
| 21 | Kimi-K2-2507 | CED 1.33 | — | Imported | 2026-05-28 |
| 22 | Mistral-Medium-3.1 | CED 1.355 | Mistral: Mistral Medium 3.1 mistralai-mistral-medium-3.1 | Imported | 2026-05-28 |
| 23 | Qwen3-235B-A22B | CED 1.447 | Qwen3 235B A22B qwen-qwen3-235b-a22b | Imported | 2026-05-28 |
| 24 | Qwen3-Next-80B | CED 1.603 | — | Imported | 2026-05-28 |
| 25 | Qwen3-4B-Instruct-2507 | CED 1.685 | — | Imported | 2026-05-28 |
| 26 | Nvidia-Llama-3.1-Ultra | CED 1.833 | — | Imported | 2026-05-28 |
| 27 | Qwen3-30B-A3B-Instruct-2507 | CED 2.13 | Qwen3 30B A3B Instruct 2507 qwen-qwen3-30b-a3b-instruct-2507 | Imported | 2026-05-28 |
| 28 | DeepSeek-V3 | CED 2.422 | DeepSeek V3 deepseek-deepseek-chat | Imported | 2026-05-28 |
| 29 | Suri-ORPO | CED 2.445 | — | Imported | 2026-05-28 |
| 30 | QwenLong-L1-32B | CED 3.413 | — | Imported | 2026-05-28 |
| 31 | DeepSeek-R1 | CED 3.419 | R1 deepseek-r1 | Imported | 2026-05-28 |
| 32 | MiniMax-M1-80k | CED 3.447 | — | Imported | 2026-05-28 |
| 33 | LongAlign-13B | CED 3.664 | — | Imported | 2026-05-28 |
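
For readers who want to work with the table programmatically, here is a minimal sketch of parsing its rows and checking that rank order agrees with ascending CED (lower is better). The helper names `parse_ced`, `parse_row`, and `is_rank_consistent` are hypothetical and not part of any ConStory-Bench tooling. The sketch assumes each row has the six pipe-separated cells shown above and that an unmatched model is marked with `—`.

```python
from typing import Optional

# Three rows copied verbatim from the table above (ranks 1, 7, and 33).
SAMPLE_ROWS = [
    "| 1 | GPT-5-Reasoning | CED 0.113 | GPT-5 openai-gpt-5 | Imported | 2026-05-28 |",
    "| 7 | Ring-1T | CED 0.539 | — | Imported | 2026-05-28 |",
    "| 33 | LongAlign-13B | CED 3.664 | — | Imported | 2026-05-28 |",
]


def parse_ced(cell: str) -> float:
    """Turn a cell such as 'CED 0.113' into the float 0.113."""
    label, value = cell.split()
    if label != "CED":
        raise ValueError(f"unexpected metric label: {label!r}")
    return float(value)


def parse_row(line: str) -> dict:
    """Split one markdown table row into a typed record."""
    cells = [c.strip() for c in line.strip().strip("|").split("|")]
    if len(cells) != 6:
        raise ValueError(f"expected 6 cells, got {len(cells)}")
    rank, subject, ced, match, provenance, sampled = cells
    # Assumption: a lone dash-like placeholder means no model match.
    model_match: Optional[str] = None if match == "—" else match
    return {
        "rank": int(rank),
        "subject": subject,
        "ced": parse_ced(ced),
        "model_match": model_match,
        "provenance": provenance,
        "sampled": sampled,
    }


def is_rank_consistent(rows: list) -> bool:
    """True if sorting by rank also sorts CED in non-decreasing order."""
    ordered = sorted(rows, key=lambda r: r["rank"])
    ceds = [r["ced"] for r in ordered]
    return all(a <= b for a, b in zip(ceds, ceds[1:]))


if __name__ == "__main__":
    rows = [parse_row(line) for line in SAMPLE_ROWS]
    for row in rows:
        print(row["rank"], row["subject"], row["ced"])
    print("rank order consistent:", is_rank_consistent(rows))
```

The same functions can be applied to all 33 rows by passing every data line of the table to `parse_row`; the header and separator lines must be skipped first, because they do not parse as ranks.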