MixEval Chat

MixEval Chat reports chat-model results for MixEval and MixEval-Hard, dynamic benchmark mixtures designed to approximate real-world user-facing LLM capability with strong correlation to Chatbot Arena.
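
The correlation with Chatbot Arena mentioned above can be checked with a rank correlation between benchmark scores and Arena Elo for the same models. The source does not say which statistic is used, so the sketch below assumes Spearman's rho as one common choice. The function names and the five-model input are hypothetical toy values, not figures from this leaderboard.

```python
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """Return 1-based ranks (1 = smallest), averaging ranks across ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of tied values starting at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (var_x * var_y) ** 0.5


def spearman(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation of the rank-transformed inputs."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(xs), average_ranks(ys))


# Hypothetical toy values for five models (NOT taken from this leaderboard).
toy_benchmark_scores = [70.0, 65.0, 60.0, 50.0, 40.0]
toy_arena_elo = [1250.0, 1260.0, 1200.0, 1150.0, 1100.0]

rho = spearman(toy_benchmark_scores, toy_arena_elo)
print(f"Spearman rho on toy data: {rho:.3f}")
```

A value near 1 means the two rankings order the models almost identically. In the toy data only the top two models swap places.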

52 rows
Primary metric: leaderboard_score
Sampled: 2026-05-06

Metadata

Metrics

Leaderboard Score, MixEval-Hard, MixEval, Arena Elo (0527), TriviaQA (Mixed), MMLU (Mixed), DROP (Mixed), HellaSwag (Mixed), CommonsenseQA (Mixed), TriviaQA-Hard (Mixed), MMLU-Hard (Mixed), DROP-Hard (Mixed)

Latest Results

Rank | Subject | Leaderboard Score | Model Match | Provenance | Sampled
1 | OpenAI o1-preview | 72.00 | o1-preview (openai-o1-preview) | Imported | 2026-05-06
2 | Claude 3.5 Sonnet-0620 | 68.10 | - | Imported | 2026-05-06
3 | LLaMA-3.1-405B-Instruct | 66.20 | - | Imported | 2026-05-06
4 | GPT-4o-2024-05-13 | 64.70 | GPT-4o (2024-05-13) (openai-gpt-4o-2024-05-13) | Imported | 2026-05-06
5 | Claude 3 Opus | 63.50 | - | Imported | 2026-05-06
6 | GPT-4-Turbo-2024-04-09 | 62.60 | GPT-4 Turbo (openai-gpt-4-turbo) | Imported | 2026-05-06
7 | Gemini 1.5 Pro-API-0409 | 58.70 | - | Imported | 2026-05-06
8 | Gemini 1.5 Pro-API-0514 | 58.30 | - | Imported | 2026-05-06
9 | Mistral Large 2 | 57.40 | - | Imported | 2026-05-06
10 | Spark4.0 | 57.00 | - | Imported | 2026-05-06
11 | Yi-Large-preview | 56.80 | - | Imported | 2026-05-06
12 | LLaMA-3-70B-Instruct | 55.90 | Llama 3 70B Instruct (meta-llama-llama-3-70b-instruct) | Imported | 2026-05-06
13 | Qwen-Max-0428 | 55.80 | - | Imported | 2026-05-06
14 | Claude 3 Sonnet | 54.00 | - | Imported | 2026-05-06
15 | Reka Core-20240415 | 52.90 | - | Imported | 2026-05-06
16 | MAmmoTH2-8x7B-Plus | 51.80 | - | Imported | 2026-05-06
17 | DeepSeek-V2 | 51.70 | - | Imported | 2026-05-06
18 | GPT-4o mini | 51.60 | GPT-4o-mini (openai-gpt-4o-mini) | Imported | 2026-05-06
19 | Command R+ | 51.40 | Command R (08-2024) (cohere-command-r-08-2024) | Imported | 2026-05-06
20 | Yi-1.5-34B-Chat | 51.20 | - | Imported | 2026-05-06
21 | Mistral-Large | 50.30 | Mistral Large (mistralai-mistral-large) | Imported | 2026-05-06
22 | Qwen1.5-72B-Chat | 48.30 | - | Imported | 2026-05-06
23 | Mistral-Medium | 47.80 | - | Imported | 2026-05-06
24 | Gemini 1.0 Pro | 46.40 | - | Imported | 2026-05-06
25 | Mistral-Small | 46.20 | - | Imported | 2026-05-06
26 | Reka Flash-20240226 | 46.20 | - | Imported | 2026-05-06
27 | LLaMA-3-8B-Instruct | 45.60 | Llama 3 8B Instruct (meta-llama-llama-3-8b-instruct) | Imported | 2026-05-06
28 | Command R | 45.20 | Command R (08-2024) (cohere-command-r-08-2024) | Imported | 2026-05-06
29 | Qwen1.5-32B-Chat | 43.30 | - | Imported | 2026-05-06
30 | GPT-3.5-Turbo-0125 | 43.00 | GPT-3.5 Turbo (openai-gpt-3.5-turbo) | Imported | 2026-05-06
31 | Claude 3 Haiku | 42.80 | Claude 3 Haiku (anthropic-claude-3-haiku) | Imported | 2026-05-06
32 | Yi-34B-Chat | 42.60 | - | Imported | 2026-05-06
33 | Mixtral-8x7B-Instruct-v0.1 | 42.50 | Mistral: Mixtral 8x7B Instruct (mistralai-mixtral-8x7b-instruct) | Imported | 2026-05-06
34 | Starling-LM-7B-beta | 41.80 | - | Imported | 2026-05-06
35 | Yi-1.5-9B-Chat | 40.90 | - | Imported | 2026-05-06
36 | Gemma-1.1-7B-IT | 39.10 | - | Imported | 2026-05-06
37 | Vicuna-33B-v1.3 | 38.70 | - | Imported | 2026-05-06
38 | LLaMA-2-70B-Chat | 38.00 | - | Imported | 2026-05-06
39 | MAP-Neo-Instruct-v0.1 | 37.80 | - | Imported | 2026-05-06
40 | Mistral-7B-Instruct-v0.2 | 36.20 | - | Imported | 2026-05-06
41 | Qwen1.5-7B-Chat | 35.50 | - | Imported | 2026-05-06
42 | Reka Edge-20240208 | 32.20 | - | Imported | 2026-05-06
43 | Zephyr-7B-β | 31.60 | - | Imported | 2026-05-06
44 | LLaMA-2-7B-Chat | 30.80 | - | Imported | 2026-05-06
45 | Yi-6B-Chat | 30.10 | - | Imported | 2026-05-06
46 | Qwen1.5-MoE-A2.7B-Chat | 29.10 | - | Imported | 2026-05-06
47 | Gemma-1.1-2B-IT | 28.40 | - | Imported | 2026-05-06
48 | Vicuna-7B-v1.5 | 27.80 | - | Imported | 2026-05-06
49 | OLMo-7B-Instruct | 26.70 | - | Imported | 2026-05-06
50 | Qwen1.5-4B-Chat | 24.60 | - | Imported | 2026-05-06
51 | JetMoE-8B-Chat | 24.30 | - | Imported | 2026-05-06
52 | MPT-7B-Chat | 23.80 | - | Imported | 2026-05-06