RP-Bench

Roleplay quality benchmark evaluating LLMs on character consistency, user agency, lorebook integration, temporal reasoning, genre craft, and community preference.

30 rows
community_elo primary metric
2026-05-06 sampled

Metadata

Metrics

Community ELO, Community Overall Winrate, Community SFW Winrate, Community NSFW Winrate, Judge ELO, Judge ELO Winrate, Judge Overall Score, Tier 1 Fundamentals, Tier 2 Quality Control, Tier 3 Genre Craft
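As a rough aid for reading the ELO metrics, the sketch below converts a rating gap into an expected head-to-head score. It assumes the standard logistic Elo model with a 400-point scale, which this page does not confirm for RP-Bench. The function name `expected_score` and the `scale` parameter are illustrative, not part of the benchmark.

```python
def expected_score(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Expected score of A against B under the standard Elo model.

    Assumption: logistic curve with a 400-point scale. RP-Bench may fit
    its ratings differently; this only illustrates how a gap is read.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


if __name__ == "__main__":
    # Illustrative inputs: two ELO-scale values from the results rows,
    # assuming both belong to the same leaderboard.
    p = expected_score(1705.70, 1638.30)
    print(f"expected score for the higher-rated model: {p:.3f}")
```

Under this model, equal ratings give an expected score of 0.5, and the two expected scores in a pairing always sum to 1.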

Latest Results

Rows are parsed from public Hugging Face dataset-server configs. The community arena, LLM-judge ELO, and judged score leaderboards are preserved as separate result rows using source_config metadata.
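The separation described above can be sketched as a group-and-rank step keyed on `source_config`. This is a minimal illustration, not the importer's actual code. The row fields (`subject`, `value`), the config labels, and the sample rows are all hypothetical placeholders.

```python
from collections import defaultdict


def split_by_source(rows):
    """Group result rows by source_config and rank each group by value.

    Assumption: every row is a dict with 'subject', 'value', and
    'source_config' keys. These field names are hypothetical.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row["source_config"]].append(row)

    ranked = {}
    for config, members in groups.items():
        # Higher value ranks first within its own leaderboard only.
        ordered = sorted(members, key=lambda r: r["value"], reverse=True)
        ranked[config] = [dict(r, rank=i) for i, r in enumerate(ordered, start=1)]
    return ranked


if __name__ == "__main__":
    # Placeholder rows, not real benchmark results.
    sample = [
        {"subject": "model_a", "value": 1600.0, "source_config": "community_arena"},
        {"subject": "model_b", "value": 1650.0, "source_config": "community_arena"},
        {"subject": "model_a", "value": 4.2, "source_config": "judge_scores"},
    ]
    for config, members in split_by_source(sample).items():
        for r in members:
            print(config, r["rank"], r["subject"], r["value"])
```

Ranking within each group avoids comparing values on different scales, such as four-digit ELO ratings and single-digit judged scores.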

Rank  Subject                 Community ELO  Model Match                                      Provenance  Sampled
1     claude_opus_4_6         1705.70        Claude Opus 4.6 (anthropic-claude-opus-4.6)      Imported    2026-05-06
2     deepseek_v3_2           1638.30        DeepSeek V3.2 (deepseek-deepseek-v3.2)           Imported    2026-05-06
3     gemma_4_26b             1546.10        -                                                Imported    2026-05-06
4     claude_sonnet_4_5       1541.10        Claude Sonnet 4.5 (anthropic-claude-sonnet-4.5)  Imported    2026-05-06
5     gemini_2_5_flash        1539           Gemini 2.5 Flash (google-gemini-2.5-flash)       Imported    2026-05-06
6     gpt_4_1                 1522.70        GPT-4.1 (openai-gpt-4.1)                         Imported    2026-05-06
7     mistral_small_creative  1516.70        -                                                Imported    2026-05-06
8     gpt_4_1                 1509.40        GPT-4.1 (openai-gpt-4.1)                         Imported    2026-05-06
9     grok_4_1                1507.10        -                                                Imported    2026-05-06
10    claude_sonnet_4_5       1497.30        Claude Sonnet 4.5 (anthropic-claude-sonnet-4.5)  Imported    2026-05-06
11    qwen3_5_flash           1496.20        Qwen3.5-Flash (qwen-qwen3.5-flash-02-23)         Imported    2026-05-06
12    glm_4_7                 1491.90        GLM GLM 4.7 (z-ai-glm-4.7)                       Imported    2026-05-06
13    glm_4_7                 1482.80        GLM GLM 4.7 (z-ai-glm-4.7)                       Imported    2026-05-06
14    deepseek_v3_2           1478.80        DeepSeek V3.2 (deepseek-deepseek-v3.2)           Imported    2026-05-06
15    minimax_m2_7            1473.20        MiniMax M2.7 (minimax-minimax-m2.7)              Imported    2026-05-06
16    llama_4_maverick        1453.40        Llama 4 Maverick (meta-llama-4-maverick)         Imported    2026-05-06
17    gemini_2_5_flash        1407.80        Gemini 2.5 Flash (google-gemini-2.5-flash)       Imported    2026-05-06
18    mistral_small_creative  1360           -                                                Imported    2026-05-06
19    qwen3_5_flash           1332.50        Qwen3.5-Flash (qwen-qwen3.5-flash-02-23)         Imported    2026-05-06
20    claude_sonnet_4_5       4.37           Claude Sonnet 4.5 (anthropic-claude-sonnet-4.5)  Imported    2026-05-06
21    deepseek_v3_2           4.34           DeepSeek V3.2 (deepseek-deepseek-v3.2)           Imported    2026-05-06
22    glm_4_7                 4.34           GLM GLM 4.7 (z-ai-glm-4.7)                       Imported    2026-05-06
23    grok_4_1                4.32           -                                                Imported    2026-05-06
24    gpt_4_1                 4.31           GPT-4.1 (openai-gpt-4.1)                         Imported    2026-05-06
25    mistral_small_creative  4.31           -                                                Imported    2026-05-06
26    gemma_4_26b             4.27           -                                                Imported    2026-05-06
27    qwen3_5_flash           4.27           Qwen3.5-Flash (qwen-qwen3.5-flash-02-23)         Imported    2026-05-06
28    gemini_2_5_flash        4.21           Gemini 2.5 Flash (google-gemini-2.5-flash)       Imported    2026-05-06
29    minimax_m2_7            4.20           MiniMax M2.7 (minimax-minimax-m2.7)              Imported    2026-05-06
30    llama_4_maverick        3.92           Llama 4 Maverick (meta-llama-4-maverick)         Imported    2026-05-06