RP-Bench
Roleplay quality benchmark evaluating LLMs on character consistency, user agency, lorebook integration, temporal reasoning, genre craft, and community preference.
Rows: 30
Primary metric: community_elo
Sampled: 2026-05-06
Metadata
Metrics
Community ELO, Community Overall Winrate, Community SFW Winrate, Community NSFW Winrate, Judge ELO, Judge ELO Winrate, Judge Overall Score, Tier 1 Fundamentals, Tier 2 Quality Control, Tier 3 Genre Craft
| Rank | Subject | Community ELO | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | claude_opus_4_6 | 1705.70 | Claude Opus 4.6 anthropic-claude-opus-4.6 | Imported | 2026-05-06 |
| 2 | deepseek_v3_2 | 1638.30 | DeepSeek V3.2 deepseek-deepseek-v3.2 | Imported | 2026-05-06 |
| 3 | gemma_4_26b | 1546.10 | — | Imported | 2026-05-06 |
| 4 | claude_sonnet_4_5 | 1541.10 | Claude Sonnet 4.5 anthropic-claude-sonnet-4.5 | Imported | 2026-05-06 |
| 5 | gemini_2_5_flash | 1539.00 | Gemini 2.5 Flash google-gemini-2.5-flash | Imported | 2026-05-06 |
| 6 | gpt_4_1 | 1522.70 | GPT-4.1 openai-gpt-4.1 | Imported | 2026-05-06 |
| 7 | mistral_small_creative | 1516.70 | — | Imported | 2026-05-06 |
| 8 | gpt_4_1 | 1509.40 | GPT-4.1 openai-gpt-4.1 | Imported | 2026-05-06 |
| 9 | grok_4_1 | 1507.10 | — | Imported | 2026-05-06 |
| 10 | claude_sonnet_4_5 | 1497.30 | Claude Sonnet 4.5 anthropic-claude-sonnet-4.5 | Imported | 2026-05-06 |
| 11 | qwen3_5_flash | 1496.20 | Qwen3.5-Flash qwen-qwen3.5-flash-02-23 | Imported | 2026-05-06 |
| 12 | glm_4_7 | 1491.90 | GLM 4.7 z-ai-glm-4.7 | Imported | 2026-05-06 |
| 13 | glm_4_7 | 1482.80 | GLM 4.7 z-ai-glm-4.7 | Imported | 2026-05-06 |
| 14 | deepseek_v3_2 | 1478.80 | DeepSeek V3.2 deepseek-deepseek-v3.2 | Imported | 2026-05-06 |
| 15 | minimax_m2_7 | 1473.20 | MiniMax M2.7 minimax-minimax-m2.7 | Imported | 2026-05-06 |
| 16 | llama_4_maverick | 1453.40 | Llama 4 Maverick meta-llama-4-maverick | Imported | 2026-05-06 |
| 17 | gemini_2_5_flash | 1407.80 | Gemini 2.5 Flash google-gemini-2.5-flash | Imported | 2026-05-06 |
| 18 | mistral_small_creative | 1360.00 | — | Imported | 2026-05-06 |
| 19 | qwen3_5_flash | 1332.50 | Qwen3.5-Flash qwen-qwen3.5-flash-02-23 | Imported | 2026-05-06 |
| 20 | claude_sonnet_4_5 | 4.37 | Claude Sonnet 4.5 anthropic-claude-sonnet-4.5 | Imported | 2026-05-06 |
| 21 | deepseek_v3_2 | 4.34 | DeepSeek V3.2 deepseek-deepseek-v3.2 | Imported | 2026-05-06 |
| 22 | glm_4_7 | 4.34 | GLM 4.7 z-ai-glm-4.7 | Imported | 2026-05-06 |
| 23 | grok_4_1 | 4.32 | — | Imported | 2026-05-06 |
| 24 | gpt_4_1 | 4.31 | GPT-4.1 openai-gpt-4.1 | Imported | 2026-05-06 |
| 25 | mistral_small_creative | 4.31 | — | Imported | 2026-05-06 |
| 26 | gemma_4_26b | 4.27 | — | Imported | 2026-05-06 |
| 27 | qwen3_5_flash | 4.27 | Qwen3.5-Flash qwen-qwen3.5-flash-02-23 | Imported | 2026-05-06 |
| 28 | gemini_2_5_flash | 4.21 | Gemini 2.5 Flash google-gemini-2.5-flash | Imported | 2026-05-06 |
| 29 | minimax_m2_7 | 4.20 | MiniMax M2.7 minimax-minimax-m2.7 | Imported | 2026-05-06 |
| 30 | llama_4_maverick | 3.92 | Llama 4 Maverick meta-llama-4-maverick | Imported | 2026-05-06 |
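
To help read the gaps between Elo-scale values in the table (rows 1 to 19), here is a minimal sketch of how a rating difference translates into an expected head-to-head win rate. It assumes the conventional logistic Elo model with a 400-point scale; the page does not state which Elo variant or scale RP-Bench uses, so the function name `elo_expected_score` and the `scale` default are illustrative assumptions, not the benchmark's own method.

```python
def elo_expected_score(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Expected score of A against B under the conventional logistic Elo model.

    Assumption: 400-point scale, base 10. RP-Bench may use a different variant.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


# Community ELO values for ranks 1 and 2, copied from the table.
claude_opus_4_6 = 1705.70
deepseek_v3_2 = 1638.30

# Expected share of pairwise wins for the rank 1 entry over the rank 2 entry.
p_top = elo_expected_score(claude_opus_4_6, deepseek_v3_2)
print(f"Expected win rate, rank 1 vs rank 2: {p_top:.3f}")
```

Under these assumptions the 67.4-point gap between the top two entries corresponds to an expected win rate of a little under 60% for the higher-rated one, so a lead of that size is modest rather than decisive.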