Stick To Your Role!
A leaderboard benchmarking the stability of LLMs in simulated populations and role-play settings, reporting ordinal win rate, cardinal score, rank-order stability, and structural-fit metrics.
32 rows
Primary metric: cardinal_score
Sampled: 2026-05-06
Metrics: Ordinal win rate, Cardinal score, Rank-order stability, Stress (lower is better), CFI, SRMR (lower is better), RMSEA (lower is better)
| Rank | Subject | Cardinal score | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | Qwen2.5-VL-72B-Instruct | 0.84 | Qwen2.5 VL 72B Instruct (qwen-qwen2.5-vl-72b-instruct) | Imported | 2026-05-06 |
| 2 | Llama-3.1-Nemotron-70B-Instruct | 0.81 | Llama 3.1 Nemotron 70B Instruct (nvidia-llama-3.1-nemotron-70b-instruct) | Imported | 2026-05-06 |
| 3 | Mistral-Large-Instruct-2407 | 0.79 | — | Imported | 2026-05-06 |
| 4 | Llama-3.3-70B-Instruct | 0.78 | Llama 3.3 70B Instruct (meta-llama-llama-3.3-70b-instruct) | Imported | 2026-05-06 |
| 5 | Dracarys2-72B-Instruct | 0.77 | — | Imported | 2026-05-06 |
| 6 | QwQ-32B | 0.77 | — | Imported | 2026-05-06 |
| 7 | Llama-3.1-70B-Instruct | 0.77 | Llama 3.1 70B Instruct (meta-llama-llama-3.1-70b-instruct) | Imported | 2026-05-06 |
| 8 | Qwen3-32B-A3B | 0.75 | — | Imported | 2026-05-06 |
| 9 | Qwen3-32B | 0.74 | Qwen3 32B (qwen-qwen3-32b) | Imported | 2026-05-06 |
| 10 | Mistral-Large-Instruct-2411 | 0.73 | — | Imported | 2026-05-06 |
| 11 | Nautilus-70B-v0.1 | 0.72 | — | Imported | 2026-05-06 |
| 12 | Qwen3-235B-A22B-FP8 | 0.72 | Qwen3 235B A22B (qwen-qwen3-235b-a22b) | Imported | 2026-05-06 |
| 13 | Qwen3-8B | 0.72 | Qwen3 8B (qwen-qwen3-8b) | Imported | 2026-05-06 |
| 14 | Mistral-Small-3.1-24B-Instruct-2503 | 0.70 | Mistral: Mistral Small 3.1 24B (mistralai-mistral-small-3.1-24b-instruct) | Imported | 2026-05-06 |
| 15 | Qwen2.5-14B-Instruct-1M | 0.70 | — | Imported | 2026-05-06 |
| 16 | Qwen3-4B | 0.70 | — | Imported | 2026-05-06 |
| 17 | Llama-3.1-8B-Instruct | 0.62 | Llama 3.1 8B Instruct (meta-llama-llama-3.1-8b-instruct) | Imported | 2026-05-06 |
| 18 | Llama-4-Scout-17B-16E-Instruct | 0.62 | Llama 4 Scout (meta-llama-llama-4-scout) | Imported | 2026-05-06 |
| 19 | GLM-4-32B-0414 | 0.59 | GLM 4 32B (z-ai-glm-4-32b) | Imported | 2026-05-06 |
| 20 | Cydonia-22B-v1.2 | 0.57 | — | Imported | 2026-05-06 |
| 21 | Mistral-Nemo-Instruct-2407 | 0.52 | Mistral: Mistral Nemo (mistralai-mistral-nemo) | Imported | 2026-05-06 |
| 22 | Llama-3.2-3B-Instruct | 0.49 | Llama 3.2 3B Instruct (meta-llama-llama-3.2-3b-instruct) | Imported | 2026-05-06 |
| 23 | Ministrations-8B-v1 | 0.49 | — | Imported | 2026-05-06 |
| 24 | reka-flash-3 | 0.49 | Reka Flash 3 (rekaai-reka-flash-3) | Imported | 2026-05-06 |
| 25 | Qwen2.5-VL-7B-Instruct | 0.42 | — | Imported | 2026-05-06 |
| 26 | Mixtral-8x7B-Instruct-v0.1 | 0.42 | Mistral: Mixtral 8x7B Instruct (mistralai-mixtral-8x7b-instruct) | Imported | 2026-05-06 |
| 27 | phi-4 | 0.32 | Phi 4 (microsoft-phi-4) | Imported | 2026-05-06 |
| 28 | Llama-3.2-1B-Instruct | 0.28 | Llama 3.2 1B Instruct (meta-llama-llama-3.2-1b-instruct) | Imported | 2026-05-06 |
| 29 | Qwen2.5-VL-3B-Instruct | 0.27 | — | Imported | 2026-05-06 |
| 30 | Mistral-7B-Instruct-v0.2 | 0.23 | — | Imported | 2026-05-06 |
| 31 | dummy | 0.23 | — | Imported | 2026-05-06 |
| 32 | phi-3-medium-128k-instruct | 0.22 | — | Imported | 2026-05-06 |
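
Several adjacent ranks in the table share the same displayed cardinal score (for example ranks 5 to 7 at 0.77), so the order within such a group may depend on digits that are not shown. The following is a minimal sketch, not part of the leaderboard's own tooling, of how the table rows could be parsed and the tied groups listed. The function names `parse_rows` and `tie_groups` are hypothetical, and the sample uses only the first three columns of five rows copied from the table above.

```python
from itertools import groupby

# Five rows copied from the table (rank, subject, cardinal score only).
ROWS = """\
| 5 | Dracarys2-72B-Instruct | 0.77 |
| 6 | QwQ-32B | 0.77 |
| 7 | Llama-3.1-70B-Instruct | 0.77 |
| 8 | Qwen3-32B-A3B | 0.75 |
| 9 | Qwen3-32B | 0.74 |
"""


def parse_rows(text):
    """Parse pipe-delimited table rows into dicts (hypothetical helper)."""
    records = []
    for line in text.splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        # Skip header, separator, and malformed lines.
        if len(cells) < 3 or not cells[0].isdigit():
            continue
        records.append(
            {
                "rank": int(cells[0]),
                "subject": cells[1],
                "cardinal_score": float(cells[2]),
            }
        )
    return records


def tie_groups(records):
    """Return (score, [subjects]) for every displayed score shared by 2+ rows."""
    # sorted() is stable, so rows with equal scores keep their table order.
    ordered = sorted(records, key=lambda r: -r["cardinal_score"])
    groups = []
    for score, grp in groupby(ordered, key=lambda r: r["cardinal_score"]):
        names = [r["subject"] for r in grp]
        if len(names) > 1:
            groups.append((score, names))
    return groups


if __name__ == "__main__":
    for score, names in tie_groups(parse_rows(ROWS)):
        print(score, names)
```

Because the scores are shown to two decimal places, a group reported here is a tie in the displayed value only; the underlying scores may differ.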