Stick To Your Role!

Leaderboard benchmarking LLM stability in simulated populations and roleplay settings, with ordinal, cardinal, rank-order stability, and structural fit metrics.
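This page names rank-order stability but does not define it. One common way to measure it, assumed here and not taken from the leaderboard's own code, is the Spearman correlation between the scores that the same simulated personas receive in two contexts. A model that keeps its personas in the same relative order scores close to 1. The helper names (`average_ranks`, `rank_order_stability`, `mean_pairwise_stability`) are hypothetical. Averaging over every pair of contexts is one plausible way to summarize several contexts, not a confirmed part of the benchmark.

```python
from itertools import combinations
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Advance j to the last position of the run of values tied with order[i].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of 0-based positions i..j, shifted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined when a context has constant scores")
    return cov / (vx * vy) ** 0.5


def rank_order_stability(scores_a, scores_b):
    """Spearman correlation between persona scores in two contexts (assumed definition).

    scores_a[i] and scores_b[i] must belong to the same simulated persona.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("both contexts must score the same personas")
    return pearson(average_ranks(scores_a), average_ranks(scores_b))


def mean_pairwise_stability(contexts):
    """Hypothetical summary: average stability over every pair of contexts (needs >= 2)."""
    return mean(rank_order_stability(a, b) for a, b in combinations(contexts, 2))


if __name__ == "__main__":
    # Toy data: four personas scored on one value dimension in three contexts.
    contexts = [
        [1.0, 2.0, 3.0, 4.0],
        [1.5, 2.5, 3.5, 4.5],  # same ordering as context 0
        [4.0, 3.0, 2.0, 1.0],  # ordering fully reversed
    ]
    print(rank_order_stability(contexts[0], contexts[1]))
    print(mean_pairwise_stability(contexts))
```

In the toy data, the first two contexts keep the personas in the same order, so their stability is 1. The reversed third context pulls the pairwise average below zero.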

32 rows
Primary metric: cardinal_score
Sampled: 2026-05-06

Metadata

Metrics

Ordinal win rate, Cardinal score, Rank-order stability, Stress (lower is better), CFI, SRMR (lower is better), RMSEA (lower is better)
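CFI, SRMR and RMSEA are standard fit indices from structural equation modeling. Their usual textbook definitions are given below for reference. This page does not say which variants the leaderboard computes (for example, N versus N - 1 in RMSEA). The symbols are assumed notation:

- \chi^2_M and df_M: chi-square and degrees of freedom of the fitted model.
- \chi^2_B and df_B: the same for the baseline (independence) model.
- N: the sample size.
- p: the number of observed variables.
- r_{ij} and \hat{r}_{ij}: the observed and model-implied correlations.

```latex
\mathrm{RMSEA} = \sqrt{\frac{\max\left(\chi^2_M - df_M,\ 0\right)}{df_M\,(N - 1)}}

\mathrm{SRMR} = \sqrt{\frac{2}{p(p+1)} \sum_{i=1}^{p} \sum_{j=1}^{i} \left(r_{ij} - \hat{r}_{ij}\right)^2}

\mathrm{CFI} = 1 - \frac{\max\left(\chi^2_M - df_M,\ 0\right)}{\max\left(\chi^2_M - df_M,\ \chi^2_B - df_B,\ 0\right)}
```

Lower SRMR and RMSEA mean better fit, which matches the directions noted in the metric list. CFI runs the other way: values closer to 1 indicate better fit.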

Latest Results

Snapshot mirrors the public Stick To Your Role leaderboard CSV. Source display names and structural-fit stability metrics are preserved.
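The snapshot mirrors a CSV, and cardinal_score is the primary metric, so the ranking can be reproduced by sorting rows on that column. The sketch below assumes a CSV with a hypothetical `model` column next to `cardinal_score`. The real file's column names and layout are not shown on this page, and the four sample rows simply reuse scores from the table.

Several models tie at two decimals (ranks 5 to 7 all show 0.77). Python's stable sort keeps tied rows in file order. How the leaderboard itself breaks such ties, for instance with unrounded scores, is not stated here.

```python
import csv
import io

# Hypothetical excerpt: only the `cardinal_score` name is suggested by this page;
# the `model` column and the overall CSV layout are assumptions.
SAMPLE_CSV = """model,cardinal_score
Dracarys2-72B-Instruct,0.77
Qwen2.5-VL-72B-Instruct,0.84
Llama-3.1-70B-Instruct,0.77
Mistral-Large-Instruct-2407,0.79
"""


def rank_by_cardinal_score(csv_text):
    """Return (rank, model, score) tuples, highest cardinal score first."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    # sorted() stays stable with reverse=True, so equal scores keep their CSV order.
    rows = sorted(rows, key=lambda r: float(r["cardinal_score"]), reverse=True)
    return [(rank, r["model"], float(r["cardinal_score"]))
            for rank, r in enumerate(rows, start=1)]


if __name__ == "__main__":
    for rank, model, score in rank_by_cardinal_score(SAMPLE_CSV):
        print(f"{rank} {model} {score:.2f}")
```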

Rank  Subject                              Cardinal score  Provenance  Sampled     Model Match
1     Qwen2.5-VL-72B-Instruct              0.84            Imported    2026-05-06  Qwen2.5 VL 72B Instruct (qwen-qwen2.5-vl-72b-instruct)
2     Llama-3.1-Nemotron-70B-Instruct      0.81            Imported    2026-05-06  Llama 3.1 Nemotron 70B Instruct (nvidia-llama-3.1-nemotron-70b-instruct)
3     Mistral-Large-Instruct-2407          0.79            Imported    2026-05-06  -
4     Llama-3.3-70B-Instruct               0.78            Imported    2026-05-06  Llama 3.3 70B Instruct (meta-llama-llama-3.3-70b-instruct)
5     Dracarys2-72B-Instruct               0.77            Imported    2026-05-06  -
6     QwQ-32B                              0.77            Imported    2026-05-06  -
7     Llama-3.1-70B-Instruct               0.77            Imported    2026-05-06  Llama 3.1 70B Instruct (meta-llama-llama-3.1-70b-instruct)
8     Qwen3-32B-A3B                        0.75            Imported    2026-05-06  -
9     Qwen3-32B                            0.74            Imported    2026-05-06  Qwen3 32B (qwen-qwen3-32b)
10    Mistral-Large-Instruct-2411          0.73            Imported    2026-05-06  -
11    Nautilus-70B-v0.1                    0.72            Imported    2026-05-06  -
12    Qwen3-235B-A22B-FP8                  0.72            Imported    2026-05-06  Qwen3 235B A22B (qwen-qwen3-235b-a22b)
13    Qwen3-8B                             0.72            Imported    2026-05-06  Qwen3 8B (qwen-qwen3-8b)
14    Mistral-Small-3.1-24B-Instruct-2503  0.70            Imported    2026-05-06  Mistral: Mistral Small 3.1 24B (mistralai-mistral-small-3.1-24b-instruct)
15    Qwen2.5-14B-Instruct-1M              0.70            Imported    2026-05-06  -
16    Qwen3-4B                             0.70            Imported    2026-05-06  -
17    Llama-3.1-8B-Instruct                0.62            Imported    2026-05-06  Llama 3.1 8B Instruct (meta-llama-llama-3.1-8b-instruct)
18    Llama-4-Scout-17B-16E-Instruct       0.62            Imported    2026-05-06  Llama 4 Scout (meta-llama-llama-4-scout)
19    GLM-4-32B-0414                       0.59            Imported    2026-05-06  GLM GLM 4 32B (z-ai-glm-4-32b)
20    Cydonia-22B-v1.2                     0.57            Imported    2026-05-06  -
21    Mistral-Nemo-Instruct-2407           0.52            Imported    2026-05-06  Mistral: Mistral Nemo (mistralai-mistral-nemo)
22    Llama-3.2-3B-Instruct                0.49            Imported    2026-05-06  Llama 3.2 3B Instruct (meta-llama-llama-3.2-3b-instruct)
23    Ministrations-8B-v1                  0.49            Imported    2026-05-06  -
24    reka-flash-3                         0.49            Imported    2026-05-06  REKA Reka Flash 3 (rekaai-reka-flash-3)
25    Qwen2.5-VL-7B-Instruct               0.42            Imported    2026-05-06  -
26    Mixtral-8x7B-Instruct-v0.1           0.42            Imported    2026-05-06  Mistral: Mixtral 8x7B Instruct (mistralai-mixtral-8x7b-instruct)
27    phi-4                                0.32            Imported    2026-05-06  Phi 4 (microsoft-phi-4)
28    Llama-3.2-1B-Instruct                0.28            Imported    2026-05-06  Llama 3.2 1B Instruct (meta-llama-llama-3.2-1b-instruct)
29    Qwen2.5-VL-3B-Instruct               0.27            Imported    2026-05-06  -
30    Mistral-7B-Instruct-v0.2             0.23            Imported    2026-05-06  -
31    dummy                                0.23            Imported    2026-05-06  -
32    phi-3-medium-128k-instruct           0.22            Imported    2026-05-06  -

A dash in the Model Match column means no matched model is listed for that subject.