WildBench

A benchmark of challenging instruction-following tasks drawn from real user queries. This page reports WildBench scores, task-category scores, Elo, and comparison metrics.

Rows: 61
Primary metric: wildbench_score
Sampled: 2026-05-27

Metadata

Metrics

WildBench Score, Adjusted Score, Task Macro Score, WildBench Elo, Arena-Hard v0.1, AlpacaEval 2.0 LC

Latest Results

Rows are parsed from WildBench public leaderboard JSON files and joined with public Elo/comparison-stat artifacts when available.
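The parse-and-join step described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the page's actual pipeline: the field names (`model`, `wildbench_score`, `wildbench_elo`), the JSON layout, the function name `build_rows`, and the placeholder models and values are all hypothetical. It shows a left join, so a row is kept even when no Elo artifact is available for it.

```python
import json

# Illustrative inputs only: field names, layout, and values are assumptions,
# not the real WildBench leaderboard schema or real results.
LEADERBOARD_JSON = """
[
  {"model": "model-a", "wildbench_score": 7.5},
  {"model": "model-b", "wildbench_score": 8.0},
  {"model": "model-c", "wildbench_score": 6.25}
]
"""
ELO_JSON = """
{"model-a": {"wildbench_elo": 1100.0}, "model-b": {"wildbench_elo": 1150.0}}
"""


def build_rows(leaderboard_text, elo_text):
    """Left-join leaderboard rows with optional Elo stats, then rank by score."""
    leaderboard = json.loads(leaderboard_text)
    elo_by_model = json.loads(elo_text)
    rows = []
    for entry in leaderboard:
        # A model with no Elo artifact falls back to an empty dict.
        stats = elo_by_model.get(entry["model"], {})
        rows.append({
            "subject": entry["model"],
            "wildbench_score": entry["wildbench_score"],
            "wildbench_elo": stats.get("wildbench_elo"),  # None when unavailable
        })
    # Highest score first; Python's sort is stable, so ties keep input order.
    rows.sort(key=lambda r: r["wildbench_score"], reverse=True)
    for rank, row in enumerate(rows, start=1):
        row["rank"] = rank
    return rows


if __name__ == "__main__":
    for row in build_rows(LEADERBOARD_JSON, ELO_JSON):
        print(row["rank"], row["subject"], row["wildbench_score"], row["wildbench_elo"])
```

Keeping unmatched rows with a `None` Elo value mirrors the "when available" wording: the score table stays complete even if the comparison artifacts cover only some models.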

Rank Subject WildBench Score Model Match Provenance Sampled
1 Athene-70B 7.970645792563601 Imported 2026-05-27
2 gpt-4o-2024-05-13 7.940371456500489 GPT-4o (openai-gpt-4o) Imported 2026-05-27
3 gpt-4o-mini-2024-07-18 7.86328125 GPT-4o-mini (openai-gpt-4o-mini) Imported 2026-05-27
4 gpt-4-turbo-2024-04-09 7.804496578690127 GPT-4 Turbo (openai-gpt-4-turbo) Imported 2026-05-27
5 Mistral-Large-2 7.7900390625 Imported 2026-05-27
6 yi-large-preview 7.741935483870968 Imported 2026-05-27
7 claude-3-5-sonnet-20240620 7.7265625 Claude 3.5 Sonnet (anthropic-claude-3.5-sonnet) Imported 2026-05-27
8 gemma-2-9b-it-DPO 7.712890625 Imported 2026-05-27
9 gemma-2-9b-it-SimPO 7.703812316715543 Imported 2026-05-27
10 deepseek-v2-chat-0628 7.6904296875 Imported 2026-05-27
11 gpt-4-0125-preview 7.6640625 GPT-4 (openai-gpt-4) Imported 2026-05-27
12 claude-3-opus-20240229 7.60546875 Imported 2026-05-27
13 deepseekv2-chat 7.502443792766374 Imported 2026-05-27
14 Meta-Llama-3-70B-Instruct 7.478983382209188 Imported 2026-05-27
15 gemma-2-27b-it@together 7.4697265625 Imported 2026-05-27
16 yi-large 7.446725317693059 Imported 2026-05-27
17 deepseek-coder-v2 7.4447702834799605 Imported 2026-05-27
18 nemotron-4-340b-instruct 7.4423828125 Imported 2026-05-27
19 gemini-1.5-pro 7.369140625 Imported 2026-05-27
20 Yi-1.5-34B-Chat 7.367546432062561 Imported 2026-05-27
21 Mistral-Nemo-Instruct-2407 7.343108504398827 Mistral: Mistral Nemo (mistralai-mistral-nemo) Imported 2026-05-27
22 Qwen2-72B-Instruct 7.3203125 Imported 2026-05-27
23 gemma-2-9b-it 7.268101761252447 Imported 2026-05-27
24 claude-3-sonnet-20240229 7.262230919765166 Imported 2026-05-27
25 gemini-1.5-flash 7.2074363992172215 Imported 2026-05-27
26 Qwen1.5-72B-Chat-greedy 7.173359451518119 Imported 2026-05-27
27 deepseek-v2-coder-0628 7.171875 Imported 2026-05-27
28 Llama-3-8B-Magpie-Align-v0.1 7.1223091976516635 Imported 2026-05-27
29 mistral-large-2402 7.114369501466276 Imported 2026-05-27
30 command-r-plus 7.078277886497065 Imported 2026-05-27
31 Llama-3-Instruct-8B-SimPO-v0.2 7.075268817204301 Imported 2026-05-27
32 Llama-3-Instruct-8B-SimPO 7.058651026392962 Imported 2026-05-27
33 glm-4-9b-chat 7.058651026392962 Imported 2026-05-27
34 reka-core-20240501 7.0517578125 Imported 2026-05-27
35 claude-3-haiku-20240307 7.0126953125 Claude 3 Haiku (anthropic-claude-3-haiku) Imported 2026-05-27
36 SELM-Llama-3-8B-Instruct-iter-3 6.9980392156862745 Imported 2026-05-27
37 Yi-1.5-9B-Chat 6.992179863147605 Imported 2026-05-27
38 Llama-3-Instruct-8B-SimPO-ExPO 6.98435972629521 Imported 2026-05-27
39 dbrx-instruct@together 6.777126099706745 Imported 2026-05-27
40 command-r 6.7529296875 Command R (08-2024) (cohere-command-r-08-2024) Imported 2026-05-27
41 Mixtral-8x7B-Instruct-v0.1 6.75146771037182 Mistral: Mixtral 8x7B Instruct (mistralai-mixtral-8x7b-instruct) Imported 2026-05-27
42 Starling-LM-7B-beta-ExPO 6.750733137829912 Imported 2026-05-27
43 reka-flash-20240226 6.730205278592376 Imported 2026-05-27
44 Starling-LM-7B-beta 6.70869990224829 Imported 2026-05-27
45 Nous-Hermes-2-Mixtral-8x7B-DPO 6.6611165523996085 Imported 2026-05-27
46 Meta-Llama-3-8B-Instruct 6.658846529814272 Llama 3 8B Instruct (meta-llama-llama-3-8b-instruct) Imported 2026-05-27
47 Hermes-2-Theta-Llama-3-8B 6.64711632453568 Imported 2026-05-27
48 tulu-2-dpo-70b 6.6412512218963835 Imported 2026-05-27
49 gemma-2-2b-it 6.636007827788649 Imported 2026-05-27
50 gpt-3.5-turbo-0125 6.613880742913001 GPT-3.5 Turbo (openai-gpt-3.5-turbo) Imported 2026-05-27
51 SELM-Zephyr-7B-iter-3 6.576171875 Imported 2026-05-27
52 Mistral-7B-Instruct-v0.2 6.534701857282503 Imported 2026-05-27
53 neo_7b_instruct_v0.1 6.4599609375 Imported 2026-05-27
54 neo_7b_instruct_v0.1-ExPO 6.381231671554252 Imported 2026-05-27
55 Qwen1.5-7B-Chat@together 6.36852394916911 Imported 2026-05-27
56 Llama-2-70b-chat-hf 6.345703125 Imported 2026-05-27
57 Yi-1.5-6B-Chat 6.263929618768328 Imported 2026-05-27
58 reka-edge 6.159335288367546 Reka Edge (rekaai-reka-edge) Imported 2026-05-27
59 Llama-2-7b-chat-hf 5.761252446183953 Imported 2026-05-27
60 gemma-7b-it 5.5087890625 Imported 2026-05-27
61 gemma-2b-it 4.737512242899118 Imported 2026-05-27