MT-Bench
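MT-Bench: Evaluates multi-turn conversational quality and helpfulness. Responses are graded by GPT-4 acting as a judge, either with single-answer scores or pairwise comparisons, as a proxy for human preference.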
Rows: 34
Primary metric: mt_bench_score
Sampled: 2026-05-27
Metrics: MT-Bench average score, Turn 1 average score, and Turn 2 average score, all from GPT-4 single-answer judgments.
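The relationship between the turn-level and overall scores can be illustrated with a minimal Python sketch. It assumes each judgment is a `(model, question_id, turn, score)` tuple holding one GPT-4 score per turn; the record format, the helper name `mt_bench_scores`, and the scores themselves are hypothetical. Because every MT-Bench question has two turns, the mean over all turn-level scores equals the mean of the Turn 1 and Turn 2 averages whenever both turns have the same number of scores.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical judgment records: (model, question_id, turn, score).
# The values are made up for illustration; they are not leaderboard data.
judgments = [
    ("model-a", 81, 1, 9.0),
    ("model-a", 81, 2, 8.0),
    ("model-a", 82, 1, 7.0),
    ("model-a", 82, 2, 10.0),
]


def mt_bench_scores(records):
    """Return {model: (turn1_avg, turn2_avg, overall_avg)} from per-turn scores."""
    # Group scores by model, then by turn number (1 or 2).
    per_turn = defaultdict(lambda: defaultdict(list))
    for model, _question_id, turn, score in records:
        per_turn[model][turn].append(score)

    result = {}
    for model, turns in per_turn.items():
        turn1_avg = mean(turns[1])
        turn2_avg = mean(turns[2])
        # Overall score: mean over every turn-level score for this model.
        overall_avg = mean(turns[1] + turns[2])
        result[model] = (turn1_avg, turn2_avg, overall_avg)
    return result


print(mt_bench_scores(judgments))
```

Here the two turn averages are 8.0 and 9.0, and the overall score of 8.5 is their mean.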
| Rank | Subject | MT-Bench average score | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | gpt-4 | 8.990625 | GPT-4 openai-gpt-4 | Imported | 2026-05-27 |
| 2 | gpt-3.5-turbo | 7.94375 | GPT-3.5 Turbo openai-gpt-3.5-turbo | Imported | 2026-05-27 |
| 3 | claude-v1 | 7.9 | — | Imported | 2026-05-27 |
| 4 | claude-instant-v1 | 7.85 | — | Imported | 2026-05-27 |
| 5 | vicuna-33b-v1.3 | 7.121875 | — | Imported | 2026-05-27 |
| 6 | wizardlm-30b | 7.009375 | — | Imported | 2026-05-27 |
| 7 | Llama-2-70b-chat | 6.85625 | — | Imported | 2026-05-27 |
| 8 | Llama-2-13b-chat | 6.65 | — | Imported | 2026-05-27 |
| 9 | guanaco-33b | 6.528125 | — | Imported | 2026-05-27 |
| 10 | tulu-30b | 6.434375 | — | Imported | 2026-05-27 |
| 11 | guanaco-65b | 6.409375 | — | Imported | 2026-05-27 |
| 12 | oasst-sft-7-llama-30b | 6.409375 | — | Imported | 2026-05-27 |
| 13 | palm-2-chat-bison-001 | 6.4 | — | Imported | 2026-05-27 |
| 14 | mpt-30b-chat | 6.39375 | — | Imported | 2026-05-27 |
| 15 | vicuna-13b-v1.3 | 6.3875 | — | Imported | 2026-05-27 |
| 16 | wizardlm-13b | 6.353125 | — | Imported | 2026-05-27 |
| 17 | Llama-2-7b-chat | 6.26875 | — | Imported | 2026-05-27 |
| 18 | vicuna-7b-v1.3 | 5.996875 | — | Imported | 2026-05-27 |
| 19 | baize-v2-13b | 5.75 | — | Imported | 2026-05-27 |
| 20 | nous-hermes-13b | 5.5125 | — | Imported | 2026-05-27 |
| 21 | mpt-7b-chat | 5.41875 | — | Imported | 2026-05-27 |
| 22 | gpt4all-13b-snoozy | 5.4125 | — | Imported | 2026-05-27 |
| 23 | koala-13b | 5.35 | — | Imported | 2026-05-27 |
| 24 | mpt-30b-instruct | 5.21875 | — | Imported | 2026-05-27 |
| 25 | falcon-40b-instruct | 5.16875 | — | Imported | 2026-05-27 |
| 26 | h2ogpt-oasst-open-llama-13b | 4.625 | — | Imported | 2026-05-27 |
| 27 | alpaca-13b | 4.53125 | — | Imported | 2026-05-27 |
| 28 | chatglm-6b | 4.5 | — | Imported | 2026-05-27 |
| 29 | oasst-sft-4-pythia-12b | 4.31875 | — | Imported | 2026-05-27 |
| 30 | rwkv-4-raven-14b | 3.984375 | — | Imported | 2026-05-27 |
| 31 | dolly-v2-12b | 3.275 | — | Imported | 2026-05-27 |
| 32 | fastchat-t5-3b | 3.040625 | — | Imported | 2026-05-27 |
| 33 | stablelm-tuned-alpha-7b | 2.753125 | — | Imported | 2026-05-27 |
| 34 | llama-13b | 2.60625 | — | Imported | 2026-05-27 |
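Two rows in the table share a score of 6.409375 (guanaco-65b and oasst-sft-7-llama-30b) yet are listed at ranks 11 and 12. The sketch below shows standard competition ranking, in which tied scores share a rank. `competition_rank` is a hypothetical helper, not the leaderboard's own code, and the four rows are copied from the table.

```python
# Four (subject, MT-Bench average score) pairs copied from the table above.
rows = [
    ("tulu-30b", 6.434375),
    ("guanaco-65b", 6.409375),
    ("oasst-sft-7-llama-30b", 6.409375),
    ("palm-2-chat-bison-001", 6.4),
]


def competition_rank(pairs, start=1):
    """Sort by score descending; tied scores share the same (lowest) rank number.

    `start` is the rank assigned to the first row, so an excerpt of a longer
    table can keep its original rank numbers.
    """
    # sorted() is stable, so tied rows keep their input order.
    ordered = sorted(pairs, key=lambda p: p[1], reverse=True)
    ranked = []
    for i, (name, score) in enumerate(ordered):
        if i > 0 and score == ordered[i - 1][1]:
            rank = ranked[-1][0]  # tie: reuse the previous row's rank
        else:
            rank = start + i      # otherwise: rank reflects position in the list
        ranked.append((rank, name, score))
    return ranked


for rank, name, score in competition_rank(rows, start=10):
    print(rank, name, score)
```

With `start=10` the excerpt yields ranks 10, 11, 11, and 13, so the tied pair shares rank 11 while palm-2-chat-bison-001 keeps the rank 13 shown in the table.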