MixEval Chat
MixEval Chat reports chat-model results on MixEval and MixEval-Hard, dynamic benchmark mixtures designed to approximate real-world, user-facing LLM capabilities and to correlate strongly with Chatbot Arena rankings.
Rows: 52
Primary metric: leaderboard_score (Leaderboard Score)
Sampled: 2026-05-06
Metadata
Metrics
Leaderboard Score, MixEval-Hard, MixEval, Arena Elo (0527), TriviaQA (Mixed), MMLU (Mixed), DROP (Mixed), HellaSwag (Mixed), CommonsenseQA (Mixed), TriviaQA-Hard (Mixed), MMLU-Hard (Mixed), DROP-Hard (Mixed)
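The benchmark is described as correlating strongly with Chatbot Arena, and Arena Elo (0527) is listed among the metrics. Agreement between two model orderings like these is usually expressed as a rank correlation. The sketch below computes Spearman's rho in plain Python, assuming that is the statistic of interest; the function names `average_ranks` and `spearman_rho` are hypothetical, and the inputs are any two equal-length score lists for the same models (for example, the Leaderboard Score and Arena Elo (0527) columns).

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Advance j to the end of a run of equal values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Spearman's rho, computed as the Pearson correlation of average ranks."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when all values are tied")
    return cov / (var_x * var_y) ** 0.5
```

A value of 1.0 means the two orderings agree exactly and -1.0 means they are reversed. Ranking in ascending rather than descending order does not affect the result, because reversing both orderings leaves their agreement unchanged.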
| Rank | Subject | Leaderboard Score | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | OpenAI o1-preview | 72.00 | o1-preview / `openai-o1-preview` | Imported | 2026-05-06 |
| 2 | Claude 3.5 Sonnet-0620 | 68.10 | — | Imported | 2026-05-06 |
| 3 | LLaMA-3.1-405B-Instruct | 66.20 | — | Imported | 2026-05-06 |
| 4 | GPT-4o-2024-05-13 | 64.70 | GPT-4o (2024-05-13) / `openai-gpt-4o-2024-05-13` | Imported | 2026-05-06 |
| 5 | Claude 3 Opus | 63.50 | — | Imported | 2026-05-06 |
| 6 | GPT-4-Turbo-2024-04-09 | 62.60 | GPT-4 Turbo / `openai-gpt-4-turbo` | Imported | 2026-05-06 |
| 7 | Gemini 1.5 Pro-API-0409 | 58.70 | — | Imported | 2026-05-06 |
| 8 | Gemini 1.5 Pro-API-0514 | 58.30 | — | Imported | 2026-05-06 |
| 9 | Mistral Large 2 | 57.40 | — | Imported | 2026-05-06 |
| 10 | Spark4.0 | 57.00 | — | Imported | 2026-05-06 |
| 11 | Yi-Large-preview | 56.80 | — | Imported | 2026-05-06 |
| 12 | LLaMA-3-70B-Instruct | 55.90 | Llama 3 70B Instruct / `meta-llama-llama-3-70b-instruct` | Imported | 2026-05-06 |
| 13 | Qwen-Max-0428 | 55.80 | — | Imported | 2026-05-06 |
| 14 | Claude 3 Sonnet | 54.00 | — | Imported | 2026-05-06 |
| 15 | Reka Core-20240415 | 52.90 | — | Imported | 2026-05-06 |
| 16 | MAmmoTH2-8x7B-Plus | 51.80 | — | Imported | 2026-05-06 |
| 17 | DeepSeek-V2 | 51.70 | — | Imported | 2026-05-06 |
| 18 | GPT-4o mini | 51.60 | GPT-4o-mini / `openai-gpt-4o-mini` | Imported | 2026-05-06 |
| 19 | Command R+ | 51.40 | Command R (08-2024) / `cohere-command-r-08-2024` | Imported | 2026-05-06 |
| 20 | Yi-1.5-34B-Chat | 51.20 | — | Imported | 2026-05-06 |
| 21 | Mistral-Large | 50.30 | Mistral Large / `mistralai-mistral-large` | Imported | 2026-05-06 |
| 22 | Qwen1.5-72B-Chat | 48.30 | — | Imported | 2026-05-06 |
| 23 | Mistral-Medium | 47.80 | — | Imported | 2026-05-06 |
| 24 | Gemini 1.0 Pro | 46.40 | — | Imported | 2026-05-06 |
| 25 | Mistral-Small | 46.20 | — | Imported | 2026-05-06 |
| 26 | Reka Flash-20240226 | 46.20 | — | Imported | 2026-05-06 |
| 27 | LLaMA-3-8B-Instruct | 45.60 | Llama 3 8B Instruct / `meta-llama-llama-3-8b-instruct` | Imported | 2026-05-06 |
| 28 | Command R | 45.20 | Command R (08-2024) / `cohere-command-r-08-2024` | Imported | 2026-05-06 |
| 29 | Qwen1.5-32B-Chat | 43.30 | — | Imported | 2026-05-06 |
| 30 | GPT-3.5-Turbo-0125 | 43.00 | GPT-3.5 Turbo / `openai-gpt-3.5-turbo` | Imported | 2026-05-06 |
| 31 | Claude 3 Haiku | 42.80 | Claude 3 Haiku / `anthropic-claude-3-haiku` | Imported | 2026-05-06 |
| 32 | Yi-34B-Chat | 42.60 | — | Imported | 2026-05-06 |
| 33 | Mixtral-8x7B-Instruct-v0.1 | 42.50 | Mistral: Mixtral 8x7B Instruct / `mistralai-mixtral-8x7b-instruct` | Imported | 2026-05-06 |
| 34 | Starling-LM-7B-beta | 41.80 | — | Imported | 2026-05-06 |
| 35 | Yi-1.5-9B-Chat | 40.90 | — | Imported | 2026-05-06 |
| 36 | Gemma-1.1-7B-IT | 39.10 | — | Imported | 2026-05-06 |
| 37 | Vicuna-33B-v1.3 | 38.70 | — | Imported | 2026-05-06 |
| 38 | LLaMA-2-70B-Chat | 38.00 | — | Imported | 2026-05-06 |
| 39 | MAP-Neo-Instruct-v0.1 | 37.80 | — | Imported | 2026-05-06 |
| 40 | Mistral-7B-Instruct-v0.2 | 36.20 | — | Imported | 2026-05-06 |
| 41 | Qwen1.5-7B-Chat | 35.50 | — | Imported | 2026-05-06 |
| 42 | Reka Edge-20240208 | 32.20 | — | Imported | 2026-05-06 |
| 43 | Zephyr-7B-β | 31.60 | — | Imported | 2026-05-06 |
| 44 | LLaMA-2-7B-Chat | 30.80 | — | Imported | 2026-05-06 |
| 45 | Yi-6B-Chat | 30.10 | — | Imported | 2026-05-06 |
| 46 | Qwen1.5-MoE-A2.7B-Chat | 29.10 | — | Imported | 2026-05-06 |
| 47 | Gemma-1.1-2B-IT | 28.40 | — | Imported | 2026-05-06 |
| 48 | Vicuna-7B-v1.5 | 27.80 | — | Imported | 2026-05-06 |
| 49 | OLMo-7B-Instruct | 26.70 | — | Imported | 2026-05-06 |
| 50 | Qwen1.5-4B-Chat | 24.60 | — | Imported | 2026-05-06 |
| 51 | JetMoE-8B-Chat | 24.30 | — | Imported | 2026-05-06 |
| 52 | MPT-7B-Chat | 23.80 | — | Imported | 2026-05-06 |
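Ranks in the table follow the Leaderboard Score in descending order, and tied scores (ranks 25 and 26, both 46.20) still receive distinct ranks. A minimal sketch for loading rows like these and checking that ordering is shown below; it assumes the table is available as Markdown text with no `|` characters inside cells, and the helper names `parse_rows` and `check_ordering` are hypothetical. The sample rows are copied verbatim from the table.

```python
TABLE = """
| Rank | Subject | Leaderboard Score | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 23 | Mistral-Medium | 47.80 | — | Imported | 2026-05-06 |
| 24 | Gemini 1.0 Pro | 46.40 | — | Imported | 2026-05-06 |
| 25 | Mistral-Small | 46.20 | — | Imported | 2026-05-06 |
| 26 | Reka Flash-20240226 | 46.20 | — | Imported | 2026-05-06 |
"""


def parse_rows(markdown):
    """Parse table lines into (rank, subject, score) tuples."""
    rows = []
    for line in markdown.strip().splitlines():
        # Drop the outer pipes, then split the remaining cells.
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        if not cells[0].isdigit():
            continue  # header row ("Rank") or separator row ("---")
        rows.append((int(cells[0]), cells[1], float(cells[2])))
    return rows


def check_ordering(rows):
    """Return (rank, next_rank) pairs where the better rank has a strictly lower score."""
    ordered = sorted(rows)  # tuples sort by rank first
    return [(a[0], b[0]) for a, b in zip(ordered, ordered[1:]) if a[2] < b[2]]


print(check_ordering(parse_rows(TABLE)))
```

An empty list means the ordering is consistent. Using a strict `<` comparison lets equal scores pass, which matches how ties are presented here.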