AlignBench

AlignBench is a multi-dimensional benchmark for evaluating how well Large Language Models are aligned in Chinese. It covers 8 main categories: Fundamental Language Ability, Advanced Chinese Understanding, Open-ended Questions, Writing Ability, Logical Reasoning, Mathematics, Task-oriented Role Play, and Professional Knowledge. The benchmark includes 683 queries rooted in real-world scenarios, each with a human-verified reference, and is evaluated with a rule-calibrated, multi-dimensional LLM-as-Judge approach that uses Chain-of-Thought.

Rows: 4
Primary metric: score
Sampled: 2026-05-06

Metadata

ID: alignbench
Category: General Knowledge
Release: 2023-11-30

Metrics

Score, Normalized Score

Rank	Subject	Score	Model Match	Provenance	Sampled
1	Qwen2.5 72B Instruct	0.82	Qwen2.5 72B Instruct (qwen-qwen-2.5-72b-instruct)	Self-reported	2026-05-06
2	DeepSeek-V2.5	0.80	—	Self-reported	2026-05-06
3	Qwen2.5 7B Instruct	0.73	Qwen2.5 7B Instruct (qwen-qwen-2.5-7b-instruct)	Self-reported	2026-05-06
4	Qwen2 7B Instruct	0.72	—	Self-reported	2026-05-06
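
The page lists both Score and Normalized Score but does not say how one is derived from the other. The sketch below shows one plausible way to get from per-category judge ratings to a value on a 0-1 scale. It assumes the judge rates each answer on a 1-10 scale, that the overall score is the mean of the per-category means, and that normalization is a plain division by 10. The function names, the constant, and the sample ratings are hypothetical and are not taken from the AlignBench code or from this leaderboard.

```python
from statistics import mean

# Assumption: the LLM judge rates each answer on a 1-10 scale.
SCALE_MAX = 10.0


def overall_score(category_ratings: dict[str, list[float]]) -> float:
    """Average the per-category means into one raw score.

    Assumption: every category counts equally, regardless of how many
    queries it contains. Empty categories are skipped.
    """
    per_category = [mean(r) for r in category_ratings.values() if r]
    return mean(per_category)


def normalize(raw: float, scale_max: float = SCALE_MAX) -> float:
    """Map a raw judge score onto a 0-1 scale, rounded to two decimals."""
    return round(raw / scale_max, 2)


if __name__ == "__main__":
    # Made-up ratings for illustration only, not real benchmark results.
    ratings = {
        "Mathematics": [8.0, 6.0],
        "Writing Ability": [9.0],
    }
    raw = overall_score(ratings)
    print(raw, normalize(raw))
```

Averaging per category first keeps a category with many queries from dominating the total. A plain mean over all 683 queries would be an equally reasonable reading of the description above.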
