AlpacaEval

Automatic instruction-following evaluator comparing model responses against a reference using GPT-4 judgments and length-controlled win rates.

102 rows
length_controlled_winrate (primary metric)
2026-05-27 (sampled)

Metadata

Metrics

Length-Controlled Win Rate, Win Rate, Standard Error (lower is better), Discrete Win Rate, Average Length (lower is better)

Latest Results

Rows are parsed from the public AlpacaEval GPT-4 leaderboard CSV. Primary score is length-controlled win rate when present, otherwise win rate.
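The fallback rule above can be sketched in a few lines of Python. This is a minimal illustration, not the site's actual importer: the `name` and `win_rate` column names, the sample rows, and the helper names `primary_score` and `rank_rows` are all assumptions made for the example; only `length_controlled_winrate` is taken from this page.

```python
import csv
import io

# Hypothetical sample in an assumed CSV layout; the values are made up
# and do not come from the real leaderboard.
SAMPLE_CSV = """name,win_rate,length_controlled_winrate
model-a,80.0,75.5
model-b,90.0,
model-c,60.0,65.0
"""


def primary_score(row):
    """Return the length-controlled win rate if present, else the win rate."""
    lc = (row.get("length_controlled_winrate") or "").strip()
    if lc:
        return float(lc)
    return float(row["win_rate"])


def rank_rows(csv_text):
    """Parse the CSV text and rank rows by primary score, highest first."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    scored = [(row["name"], primary_score(row)) for row in rows]
    scored.sort(key=lambda item: item[1], reverse=True)
    return [(rank, name, score)
            for rank, (name, score) in enumerate(scored, start=1)]


ranking = rank_rows(SAMPLE_CSV)
for rank, name, score in ranking:
    print(rank, name, score)
```

Under this rule, a row with no length-controlled value is ranked by its plain win rate, so scores in the same column may not be strictly comparable across rows.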

Rank Subject Length-Controlled Win Rate Model Match Provenance Sampled
1 xwinlm-70b-v0.1 95.56803995 Imported 2026-05-27
2 Mistral-7B-ReMax-v0.1 94.39601494396015 Imported 2026-05-27
3 xwinlm-70b-v0.3 94.01522563893708 Imported 2026-05-27
4 xwinlm-13b-v0.1 91.76029963 Imported 2026-05-27
5 mistral-medium 91.54314285144824 Imported 2026-05-27
6 ultralm-13b-best-of-16 91.54228856 Imported 2026-05-27
7 gpt4_1106_preview 89.85849210429464 GPT-4 (openai-gpt-4) Imported 2026-05-27
8 openchat-v3.1-13b 89.49004975 Imported 2026-05-27
9 wizardlm-13b-v1.2 89.16562889 Imported 2026-05-27
10 vicuna-33b-v1.3 88.99253731 Imported 2026-05-27
11 humpback-llama2-70b 87.93532338 Imported 2026-05-27
12 xwinlm-7b-v0.1 87.82771536 Imported 2026-05-27
13 openbuddy-llama2-70b-v10.1 87.67123288 Imported 2026-05-27
14 openchat-v2-w-13b 87.12686567 Imported 2026-05-27
15 openbuddy-llama-65b-v8 86.53366584 Imported 2026-05-27
16 gpt4 86.51018625518144 GPT-4 (openai-gpt-4) Imported 2026-05-27
17 wizardlm-13b-v1.1 86.31840796 Imported 2026-05-27
18 pairrm-tulu-2-70b 85.58824844769076 Imported 2026-05-27
19 gpt4_0314 85.334647371383 GPT-4 (openai-gpt-4) Imported 2026-05-27
20 openchat-v2-13b 84.9689441 Imported 2026-05-27
21 LMCocktail-10.7B-v1 84.7840193355363 Imported 2026-05-27
22 pairrm-zephyr-7b-beta 84.7091351498575 Imported 2026-05-27
23 tulu-2-dpo-70b 84.25730016896037 Imported 2026-05-27
24 humpback-llama-65b 83.70646766 Imported 2026-05-27
25 Mistral-7B+RAHF-DUAL+LoRA 83.35673751418108 Imported 2026-05-27
26 Mistral-7B-Instruct-v0.2 82.98089782565651 Imported 2026-05-27
27 Mixtral-8x7B-Instruct-v0.1 82.59666180688257 Mistral: Mixtral 8x7B Instruct (mistralai-mixtral-8x7b-instruct) Imported 2026-05-27
28 vicuna-13b-v1.3 82.11180124 Imported 2026-05-27
29 gpt-3.5-turbo-16k-0613 81.73910844041163 GPT-3.5 Turbo (openai-gpt-3.5-turbo) Imported 2026-05-27
30 openbuddy-llama-30b-v7.1 81.54613466 Imported 2026-05-27
31 gpt4_0613 81.38159399734118 GPT-4 (openai-gpt-4) Imported 2026-05-27
32 tulu-2-dpo-13b 81.235850076993 Imported 2026-05-27
33 openchat-13b 80.86956522 Imported 2026-05-27
34 openbuddy-falcon-40b-v9 80.69738481 Imported 2026-05-27
35 ultralm-13b 80.63511831 Imported 2026-05-27
36 openchat8192-13b 79.539801 Imported 2026-05-27
37 gpt-3.5-turbo-0301 79.17893267677465 GPT-3.5 Turbo (openai-gpt-3.5-turbo) Imported 2026-05-27
38 opencoderplus-15b 78.69565217 Imported 2026-05-27
39 tulu-2-dpo-7b 77.85355333126851 Imported 2026-05-27
40 openbuddy-llama2-13b-v11.1 77.48756219 Imported 2026-05-27
41 vicuna-7b-v1.3 76.84144819 Imported 2026-05-27
42 claude 76.83227965166517 Imported 2026-05-27
43 Yi-34B-Chat 76.35646640775717 Imported 2026-05-27
44 ultralm-13b-v2.0-best-of-16 76.29672881234201 Imported 2026-05-27
45 zephyr-7b-beta 76.29202319983864 Imported 2026-05-27
46 gpt-3.5-turbo-1106 75.55853548412969 GPT-3.5 Turbo (openai-gpt-3.5-turbo) Imported 2026-05-27
47 claude-2 74.33550560445303 Imported 2026-05-27
48 jina-chat 74.12718204 Imported 2026-05-27
49 llama-2-70b-chat-hf 74.11120112901445 Imported 2026-05-27
50 airoboros-65b 73.91304348 Imported 2026-05-27
51 zephyr-7b-alpha 73.46973908236046 Imported 2026-05-27
52 airoboros-33b 73.29192547 Imported 2026-05-27
53 evo-v2-7b 72.09602817675409 Imported 2026-05-27
54 cut-13b 71.40952810665395 Imported 2026-05-27
55 deita-7b-v1.0 71.13305243806445 Imported 2026-05-27
56 ghost-7b-alpha 70.44025157232704 Imported 2026-05-27
57 openbuddy-falcon-7b-v6 70.3611457 Imported 2026-05-27
58 causallm-14b 69.99239868161098 Imported 2026-05-27
59 pairrm-tulu-2-13b 68.33213332478894 Imported 2026-05-27
60 baize-v2-13b 66.95652174 Imported 2026-05-27
61 gpt35_turbo_instruct 66.88517803643602 GPT-3.5 Turbo (openai-gpt-3.5-turbo) Imported 2026-05-27
62 minotaur-13b 66.02484472 Imported 2026-05-27
63 guanaco-33b 65.96273292 Imported 2026-05-27
64 claude-2.1 65.9557674840558 Imported 2026-05-27
65 nous-hermes-13b 65.46583851 Imported 2026-05-27
66 vicuna-7b 64.40993789 Imported 2026-05-27
67 baize-v2-7b 63.85093168 Imported 2026-05-27
68 ultralm-13b-v2.0 63.77774668548318 Imported 2026-05-27
69 wizardlm-13b 62.55024525088112 Imported 2026-05-27
70 cohere 61.87530037843918 Imported 2026-05-27
71 gemini-pro 57.96703555960053 Imported 2026-05-27
72 oasst-rlhf-llama-33b 55.80913636693129 Imported 2026-05-27
73 oasst-sft-llama-33b 54.9689441 Imported 2026-05-27
74 guanaco-65b 54.69096685665386 Imported 2026-05-27
75 phi-2-dpo 54.28867357876411 Imported 2026-05-27
76 platolm-7b 53.09897561500652 Imported 2026-05-27
77 guanaco-13b 52.60869565 Imported 2026-05-27
78 minichat-1.5-3b 51.47924234116803 Imported 2026-05-27
79 recycled-wizardlm-7b-v2.0 51.09808140925867 Imported 2026-05-27
80 vicuna-13b 50.00294675412896 Imported 2026-05-27
81 text_davinci_003 50 Imported 2026-05-27
82 evo-7b 49.96597750089794 Imported 2026-05-27
83 llama-2-13b-chat-hf 49.81099211276289 Imported 2026-05-27
84 claude2-alpaca-13b 49.72428405745508 Imported 2026-05-27
85 chatglm2-6b 47.12858926 Imported 2026-05-27
86 guanaco-7b 46.58385093 Imported 2026-05-27
87 recycled-wizardlm-7b-v1.0 46.27776656706335 Imported 2026-05-27
88 llama-2-chat-7b-evol70k-neft 45.84186320829894 Imported 2026-05-27
89 phi-2-sft 44.73886185749778 Imported 2026-05-27
90 alpaca-farm-ppo-sim-gpt4-20k 44.09937888 GPT-4 (openai-gpt-4) Imported 2026-05-27
91 pythia-12b-mix-sft 41.86335404 Imported 2026-05-27
92 falcon-40b-instruct 39.14246411706998 Imported 2026-05-27
93 falcon-7b-instruct 39.14246411706998 Imported 2026-05-27
94 minichat-3b 31.963518903280573 Imported 2026-05-27
95 alpaca-7b-neft 31.61170102536985 Imported 2026-05-27
96 phi-2 29.81920417817079 Imported 2026-05-27
97 alpaca-farm-ppo-human 29.78213586412439 Imported 2026-05-27
98 llama-2-7b-chat-hf 29.29429740470164 Imported 2026-05-27
99 alpaca-7b 26.29495433067113 Imported 2026-05-27
100 oasst-sft-pythia-12b 25.96273292 Imported 2026-05-27
101 baichuan-13b-chat 21.80124224 Imported 2026-05-27
102 text_davinci_001 20.57118821914347 Imported 2026-05-27