URIAL Bench
URIAL Bench evaluates base (untuned) language models prompted with URIAL (Untuned LLMs with Restyled In-context ALignment) on MT-Bench-style multi-turn tasks.
19 rows
Primary metric: Overall
Sampled: 2026-05-06
Metadata
Metrics: Overall, Turn 1, Turn 2, Coding, Extraction, Humanities, Math, Reasoning, Roleplay, STEM, Writing
| Rank | Subject | Overall | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | gpt-4 | 8.99 | GPT-4 openai-gpt-4 | Imported | 2026-05-06 |
| 2 | gpt-3.5-turbo | 7.94 | GPT-3.5 Turbo openai-gpt-3.5-turbo | Imported | 2026-05-06 |
| 3 | dbrx | 7.22 | — | Imported | 2026-05-06 |
| 4 | Llama-2-70b-hf | 7.11 | — | Imported | 2026-05-06 |
| 5 | Mixtral-8x7B-v0.1 | 6.94 | — | Imported | 2026-05-06 |
| 6 | Mistral-7b-v0.1 | 6.67 | — | Imported | 2026-05-06 |
| 7 | Yi-34B | 6.67 | — | Imported | 2026-05-06 |
| 8 | phi-2-vllm | 6.06 | — | Imported | 2026-05-06 |
| 9 | gemma-7b | 6.00 | — | Imported | 2026-05-06 |
| 10 | phi-2 | 5.85 | — | Imported | 2026-05-06 |
| 11 | Llama-2-13b-hf | 5.34 | — | Imported | 2026-05-06 |
| 12 | Yi-6B | 4.97 | — | Imported | 2026-05-06 |
| 13 | Llama-2-7b-hf | 4.83 | — | Imported | 2026-05-06 |
| 14 | gemma-2b | 3.97 | — | Imported | 2026-05-06 |
| 15 | olmo | 3.41 | — | Imported | 2026-05-06 |
| 16 | olmo-7b-vllm | 3.38 | — | Imported | 2026-05-06 |
| 17 | falcon-7b | 3.10 | — | Imported | 2026-05-06 |
| 18 | mpt-7b | 1.49 | — | Imported | 2026-05-06 |
| 19 | amber | 1.44 | — | Imported | 2026-05-06 |
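
The rank column appears to follow the Overall score in descending order. As a minimal sketch of that ordering, the snippet below re-ranks a handful of rows copied from the table. The function name `rank_by_overall` is hypothetical, and giving tied scores consecutive ranks in their listed order is an assumption based on how Mistral-7b-v0.1 and Yi-34B (both 6.67) are numbered above.

```python
# A few (subject, overall) pairs copied from the table above.
rows = [
    ("amber", 1.44),
    ("Mistral-7b-v0.1", 6.67),
    ("gpt-4", 8.99),
    ("Yi-34B", 6.67),
    ("Llama-2-7b-hf", 4.83),
]


def rank_by_overall(rows):
    """Return (rank, subject, overall) tuples sorted by descending score.

    Assumption: tied scores keep their input order and get consecutive
    ranks; the source page does not state its tie-breaking rule.
    """
    # sorted() is stable, so equal scores stay in input order even with reverse=True.
    ordered = sorted(rows, key=lambda row: row[1], reverse=True)
    return [(i + 1, name, score) for i, (name, score) in enumerate(ordered)]


ranked = rank_by_overall(rows)
```

Here `ranked` starts with gpt-4 at rank 1 and ends with amber at rank 5, with the two 6.67 entries in between in the order they were supplied.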