URIAL Bench | BenchmarkList

Metadata

Overall, Turn 1, Turn 2, Coding, Extraction, Humanities, Math, Reasoning, Roleplay, STEM, Writing

Rank	Subject	Overall	Model Match	Provenance	Sampled
1	gpt-4	8.99	GPT-4 (openai-gpt-4)	Imported	2026-05-06
2	gpt-3.5-turbo	7.94	GPT-3.5 Turbo (openai-gpt-3.5-turbo)	Imported	2026-05-06
3	dbrx	7.22	—	Imported	2026-05-06
4	Llama-2-70b-hf	7.11	—	Imported	2026-05-06
5	Mixtral-8x7B-v0.1	6.94	—	Imported	2026-05-06
6	Mistral-7b-v0.1	6.67	—	Imported	2026-05-06
7	Yi-34B	6.67	—	Imported	2026-05-06
8	phi-2-vllm	6.06	—	Imported	2026-05-06
9	gemma-7b	6.00	—	Imported	2026-05-06
10	phi-2	5.85	—	Imported	2026-05-06
11	Llama-2-13b-hf	5.34	—	Imported	2026-05-06
12	Yi-6B	4.97	—	Imported	2026-05-06
13	Llama-2-7b-hf	4.83	—	Imported	2026-05-06
14	gemma-2b	3.97	—	Imported	2026-05-06
15	olmo	3.41	—	Imported	2026-05-06
16	olmo-7b-vllm	3.38	—	Imported	2026-05-06
17	falcon-7b	3.10	—	Imported	2026-05-06
18	mpt-7b	1.49	—	Imported	2026-05-06
19	amber	1.44	—	Imported	2026-05-06