Creative Writing v3
EQ-Bench Creative Writing v3 is an LLM-judged creative writing benchmark that evaluates models across 32 writing prompts, with 3 iterations per prompt. It uses a hybrid scoring system that combines rubric assessment with Elo ratings derived from pairwise comparisons. The prompts challenge models in areas such as humor, romance, spatial awareness, and unique perspectives in order to assess emotional intelligence and creative writing capability.
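To make the pairwise-comparison part of the scoring concrete, here is a minimal sketch of a textbook Elo update in Python. It is not the benchmark's actual implementation: the starting rating of 1200, the K-factor of 32, the model names, and the comparison outcomes are all illustrative assumptions, and the real pipeline may use a different rating procedure.

```python
# Illustrative sketch only: textbook Elo, not the benchmark's own code.
# The K-factor, starting ratings, and model names below are assumptions.

def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the standard Elo logistic model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(rating_a: float, rating_b: float, outcome_a: float,
               k: float = 32.0) -> tuple[float, float]:
    """Return updated (A, B) ratings after one pairwise comparison.

    outcome_a is 1.0 if the judge preferred A, 0.0 if it preferred B,
    and 0.5 for a tie.
    """
    exp_a = expected_score(rating_a, rating_b)
    new_a = rating_a + k * (outcome_a - exp_a)
    new_b = rating_b + k * ((1.0 - outcome_a) - (1.0 - exp_a))
    return new_a, new_b


# Hypothetical judged comparisons: (model A, model B, outcome for A).
ratings = {"model_x": 1200.0, "model_y": 1200.0}
comparisons = [
    ("model_x", "model_y", 1.0),
    ("model_x", "model_y", 1.0),
    ("model_x", "model_y", 0.0),
]
for a, b, outcome in comparisons:
    ratings[a], ratings[b] = update_elo(ratings[a], ratings[b], outcome)
```

Each update moves the two ratings by equal and opposite amounts, so the total rating in the pool stays constant; a win against a higher-rated opponent yields a larger gain than a win against a lower-rated one.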
Metadata
Rows: 13
Primary metric: Score
Sampled: 2026-05-06
Metrics: Score, Normalized Score
| Rank | Subject | Score | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | Grok-4.1 Thinking | 1721.90 | — | Self-reported | 2026-05-06 |
| 2 | Grok-4.1 | 1708.60 | — | Self-reported | 2026-05-06 |
| 3 | Qwen3-235B-A22B-Instruct-2507 | 0.88 | Qwen3 235B A22B Instruct 2507 (`qwen-qwen3-235b-a22b-2507`) | Self-reported | 2026-05-06 |
| 4 | Qwen3 VL 235B A22B Instruct | 0.86 | Qwen3 VL 235B A22B Instruct (`qwen-qwen3-vl-235b-a22b-instruct`) | Self-reported | 2026-05-06 |
| 5 | Qwen3-235B-A22B-Thinking-2507 | 0.86 | Qwen3 235B A22B Thinking 2507 (`qwen-qwen3-235b-a22b-thinking-2507`) | Self-reported | 2026-05-06 |
| 6 | Qwen3 VL 235B A22B Thinking | 0.86 | Qwen3 VL 235B A22B Thinking (`qwen-qwen3-vl-235b-a22b-thinking`) | Self-reported | 2026-05-06 |
| 7 | Qwen3 VL 32B Instruct | 0.86 | Qwen3 VL 32B Instruct (`qwen-qwen3-vl-32b-instruct`) | Self-reported | 2026-05-06 |
| 8 | Qwen3-Next-80B-A3B-Instruct | 0.85 | Qwen3 Next 80B A3B Instruct (`qwen-qwen3-next-80b-a3b-instruct`) | Self-reported | 2026-05-06 |
| 9 | Qwen3 VL 30B A3B Instruct | 0.85 | Qwen3 VL 30B A3B Instruct (`qwen-qwen3-vl-30b-a3b-instruct`) | Self-reported | 2026-05-06 |
| 10 | Qwen3 VL 32B Thinking | 0.83 | — | Self-reported | 2026-05-06 |
| 11 | Qwen3 VL 30B A3B Thinking | 0.82 | Qwen3 VL 30B A3B Thinking (`qwen-qwen3-vl-30b-a3b-thinking`) | Self-reported | 2026-05-06 |
| 12 | Qwen3 VL 8B Thinking | 0.82 | Qwen3 VL 8B Thinking (`qwen-qwen3-vl-8b-thinking`) | Self-reported | 2026-05-06 |
| 13 | Qwen3 VL 4B Thinking | 0.76 | — | Self-reported | 2026-05-06 |