WildBench
Challenging instruction-following benchmark built from real user tasks. It reports WildBench scores, task-category scores, Elo, and scores from related benchmarks (Arena-Hard, AlpacaEval) for comparison.
61 rows
Primary metric: wildbench_score
Sampled: 2026-05-27
Metrics: WildBench Score, Adjusted Score, Task Macro Score, WildBench Elo, Arena-Hard v0.1, AlpacaEval 2.0 LC (the table below shows WildBench Score only)
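The metric list names both a WildBench Score and a Task Macro Score. The sketch below shows how micro-averaging differs from macro-averaging. It assumes per-example scores on a 1-10 scale grouped by task category, and it assumes the Task Macro Score is an unweighted mean of per-category means. The categories and numbers are hypothetical, not WildBench data. The `adjusted` helper assumes a (S - 5) * 2 rescaling, because this page does not state how the Adjusted Score is computed.

```python
from statistics import mean

# Hypothetical per-example scores (1-10 scale), keyed by task category.
# Illustrative values only; NOT taken from WildBench.
SCORES_BY_TASK: dict[str, list[float]] = {
    "coding": [8.0, 6.0],
    "math": [4.0],
    "creative": [9.0, 7.0, 8.0],
}


def micro_average(scores_by_task: dict[str, list[float]]) -> float:
    """Mean over all examples pooled together, so large categories dominate."""
    pooled = [s for scores in scores_by_task.values() for s in scores]
    return mean(pooled)


def macro_average(scores_by_task: dict[str, list[float]]) -> float:
    """Mean of per-category means, so every category counts equally."""
    return mean(mean(scores) for scores in scores_by_task.values())


def adjusted(score: float) -> float:
    """ASSUMED rescaling (S - 5) * 2: maps 5 -> 0 and 10 -> 10."""
    return (score - 5) * 2


if __name__ == "__main__":
    micro = micro_average(SCORES_BY_TASK)
    macro = macro_average(SCORES_BY_TASK)
    print(f"micro={micro:.2f} macro={macro:.2f}")
    print(f"adjusted micro={adjusted(micro):.2f} adjusted macro={adjusted(macro):.2f}")
```

With these inputs the pooled mean is 7.0, while the macro mean is about 6.33. The gap appears because the single low math score carries a third of the weight in the macro average.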
| Rank | Subject | WildBench Score | Matched Model (ID) | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | Athene-70B | 7.970645792563601 | — | Imported | 2026-05-27 |
| 2 | gpt-4o-2024-05-13 | 7.940371456500489 | GPT-4o (openai-gpt-4o) | Imported | 2026-05-27 |
| 3 | gpt-4o-mini-2024-07-18 | 7.86328125 | GPT-4o-mini (openai-gpt-4o-mini) | Imported | 2026-05-27 |
| 4 | gpt-4-turbo-2024-04-09 | 7.804496578690127 | GPT-4 Turbo (openai-gpt-4-turbo) | Imported | 2026-05-27 |
| 5 | Mistral-Large-2 | 7.7900390625 | — | Imported | 2026-05-27 |
| 6 | yi-large-preview | 7.741935483870968 | — | Imported | 2026-05-27 |
| 7 | claude-3-5-sonnet-20240620 | 7.7265625 | Claude 3.5 Sonnet (anthropic-claude-3.5-sonnet) | Imported | 2026-05-27 |
| 8 | gemma-2-9b-it-DPO | 7.712890625 | — | Imported | 2026-05-27 |
| 9 | gemma-2-9b-it-SimPO | 7.703812316715543 | — | Imported | 2026-05-27 |
| 10 | deepseek-v2-chat-0628 | 7.6904296875 | — | Imported | 2026-05-27 |
| 11 | gpt-4-0125-preview | 7.6640625 | GPT-4 (openai-gpt-4) | Imported | 2026-05-27 |
| 12 | claude-3-opus-20240229 | 7.60546875 | — | Imported | 2026-05-27 |
| 13 | deepseekv2-chat | 7.502443792766374 | — | Imported | 2026-05-27 |
| 14 | Meta-Llama-3-70B-Instruct | 7.478983382209188 | — | Imported | 2026-05-27 |
| 15 | gemma-2-27b-it@together | 7.4697265625 | — | Imported | 2026-05-27 |
| 16 | yi-large | 7.446725317693059 | — | Imported | 2026-05-27 |
| 17 | deepseek-coder-v2 | 7.4447702834799605 | — | Imported | 2026-05-27 |
| 18 | nemotron-4-340b-instruct | 7.4423828125 | — | Imported | 2026-05-27 |
| 19 | gemini-1.5-pro | 7.369140625 | — | Imported | 2026-05-27 |
| 20 | Yi-1.5-34B-Chat | 7.367546432062561 | — | Imported | 2026-05-27 |
| 21 | Mistral-Nemo-Instruct-2407 | 7.343108504398827 | Mistral: Mistral Nemo (mistralai-mistral-nemo) | Imported | 2026-05-27 |
| 22 | Qwen2-72B-Instruct | 7.3203125 | — | Imported | 2026-05-27 |
| 23 | gemma-2-9b-it | 7.268101761252447 | — | Imported | 2026-05-27 |
| 24 | claude-3-sonnet-20240229 | 7.262230919765166 | — | Imported | 2026-05-27 |
| 25 | gemini-1.5-flash | 7.2074363992172215 | — | Imported | 2026-05-27 |
| 26 | Qwen1.5-72B-Chat-greedy | 7.173359451518119 | — | Imported | 2026-05-27 |
| 27 | deepseek-v2-coder-0628 | 7.171875 | — | Imported | 2026-05-27 |
| 28 | Llama-3-8B-Magpie-Align-v0.1 | 7.1223091976516635 | — | Imported | 2026-05-27 |
| 29 | mistral-large-2402 | 7.114369501466276 | — | Imported | 2026-05-27 |
| 30 | command-r-plus | 7.078277886497065 | — | Imported | 2026-05-27 |
| 31 | Llama-3-Instruct-8B-SimPO-v0.2 | 7.075268817204301 | — | Imported | 2026-05-27 |
| 32 | Llama-3-Instruct-8B-SimPO | 7.058651026392962 | — | Imported | 2026-05-27 |
| 33 | glm-4-9b-chat | 7.058651026392962 | — | Imported | 2026-05-27 |
| 34 | reka-core-20240501 | 7.0517578125 | — | Imported | 2026-05-27 |
| 35 | claude-3-haiku-20240307 | 7.0126953125 | Claude 3 Haiku (anthropic-claude-3-haiku) | Imported | 2026-05-27 |
| 36 | SELM-Llama-3-8B-Instruct-iter-3 | 6.9980392156862745 | — | Imported | 2026-05-27 |
| 37 | Yi-1.5-9B-Chat | 6.992179863147605 | — | Imported | 2026-05-27 |
| 38 | Llama-3-Instruct-8B-SimPO-ExPO | 6.98435972629521 | — | Imported | 2026-05-27 |
| 39 | dbrx-instruct@together | 6.777126099706745 | — | Imported | 2026-05-27 |
| 40 | command-r | 6.7529296875 | Command R (08-2024) (cohere-command-r-08-2024) | Imported | 2026-05-27 |
| 41 | Mixtral-8x7B-Instruct-v0.1 | 6.75146771037182 | Mistral: Mixtral 8x7B Instruct (mistralai-mixtral-8x7b-instruct) | Imported | 2026-05-27 |
| 42 | Starling-LM-7B-beta-ExPO | 6.750733137829912 | — | Imported | 2026-05-27 |
| 43 | reka-flash-20240226 | 6.730205278592376 | — | Imported | 2026-05-27 |
| 44 | Starling-LM-7B-beta | 6.70869990224829 | — | Imported | 2026-05-27 |
| 45 | Nous-Hermes-2-Mixtral-8x7B-DPO | 6.6611165523996085 | — | Imported | 2026-05-27 |
| 46 | Meta-Llama-3-8B-Instruct | 6.658846529814272 | Llama 3 8B Instruct (meta-llama-llama-3-8b-instruct) | Imported | 2026-05-27 |
| 47 | Hermes-2-Theta-Llama-3-8B | 6.64711632453568 | — | Imported | 2026-05-27 |
| 48 | tulu-2-dpo-70b | 6.6412512218963835 | — | Imported | 2026-05-27 |
| 49 | gemma-2-2b-it | 6.636007827788649 | — | Imported | 2026-05-27 |
| 50 | gpt-3.5-turbo-0125 | 6.613880742913001 | GPT-3.5 Turbo (openai-gpt-3.5-turbo) | Imported | 2026-05-27 |
| 51 | SELM-Zephyr-7B-iter-3 | 6.576171875 | — | Imported | 2026-05-27 |
| 52 | Mistral-7B-Instruct-v0.2 | 6.534701857282503 | — | Imported | 2026-05-27 |
| 53 | neo_7b_instruct_v0.1 | 6.4599609375 | — | Imported | 2026-05-27 |
| 54 | neo_7b_instruct_v0.1-ExPO | 6.381231671554252 | — | Imported | 2026-05-27 |
| 55 | Qwen1.5-7B-Chat@together | 6.36852394916911 | — | Imported | 2026-05-27 |
| 56 | Llama-2-70b-chat-hf | 6.345703125 | — | Imported | 2026-05-27 |
| 57 | Yi-1.5-6B-Chat | 6.263929618768328 | — | Imported | 2026-05-27 |
| 58 | reka-edge | 6.159335288367546 | Reka Edge (rekaai-reka-edge) | Imported | 2026-05-27 |
| 59 | Llama-2-7b-chat-hf | 5.761252446183953 | — | Imported | 2026-05-27 |
| 60 | gemma-7b-it | 5.5087890625 | — | Imported | 2026-05-27 |
| 61 | gemma-2b-it | 4.737512242899118 | — | Imported | 2026-05-27 |
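Rows 32 and 33 (Llama-3-Instruct-8B-SimPO and glm-4-9b-chat) have the same WildBench Score, 7.058651026392962, but the table gives them different ranks. The sketch below uses standard competition ranking ("1224"), in which tied scores share a rank. This is one possible tie convention; the page does not say how ties should be ranked. The function name and the row slice are for illustration only.

```python
def competition_rank(rows: list[tuple[str, float]]) -> list[tuple[int, str, float]]:
    """Sort by score (descending); tied scores share the rank of the first tie.

    Scores are compared exactly, which suits the identical values in the
    table above. If you round scores for display, compare the rounded values.
    """
    ordered = sorted(rows, key=lambda r: r[1], reverse=True)  # stable sort keeps input order among ties
    ranked: list[tuple[int, str, float]] = []
    prev_score: float | None = None
    rank = 0
    for position, (name, score) in enumerate(ordered, start=1):
        if score != prev_score:
            rank = position  # a new score starts a new rank; ties keep the previous one
            prev_score = score
        ranked.append((rank, name, score))
    return ranked


# A slice of the table above (original ranks 31-34).
ROWS = [
    ("Llama-3-Instruct-8B-SimPO-v0.2", 7.075268817204301),
    ("Llama-3-Instruct-8B-SimPO", 7.058651026392962),
    ("glm-4-9b-chat", 7.058651026392962),
    ("reka-core-20240501", 7.0517578125),
]

if __name__ == "__main__":
    for rank, name, score in competition_rank(ROWS):
        print(f"{rank:>2}  {name:<32} {score:.3f}")
```

Within this slice the two tied models share rank 2, and the next model is ranked 4.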