AlignBench

AlignBench is a comprehensive, multi-dimensional benchmark for evaluating how well Large Language Models are aligned in Chinese. It covers 8 main categories: Fundamental Language Ability, Advanced Chinese Understanding, Open-ended Questions, Writing Ability, Logical Reasoning, Mathematics, Task-oriented Role Play, and Professional Knowledge. The benchmark includes 683 queries rooted in real scenarios, each with a human-verified reference. Responses are evaluated with a rule-calibrated, multi-dimensional LLM-as-Judge approach that uses Chain-of-Thought.

Rows: 4
Primary metric: score
Sampled: 2026-05-06

Metadata

Metrics

Score, Normalized Score
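
The page lists both a Score and a Normalized Score but does not say how one maps to the other. As a minimal sketch only, assuming the raw judge rating lies on a 10-point scale (an assumption, not stated on this page), a normalized value in [0, 1] could be derived as below. The function name `normalize_score`, the `scale_max` parameter, and the input value are hypothetical and used for illustration.

```python
def normalize_score(raw_score: float, scale_max: float = 10.0) -> float:
    """Map a raw judge rating onto [0, 1].

    Assumption: ratings range from 0 to scale_max (10 by default).
    The leaderboard page does not confirm this scale.
    """
    if scale_max <= 0:
        raise ValueError("scale_max must be positive")
    if not 0.0 <= raw_score <= scale_max:
        raise ValueError(f"raw_score must be within [0, {scale_max}]")
    return raw_score / scale_max


if __name__ == "__main__":
    # Hypothetical raw rating, not taken from the leaderboard.
    print(round(normalize_score(8.2), 2))
```

Dividing by the scale maximum keeps the ordering of models unchanged, so ranks computed from either metric would agree under this assumption.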

Latest Results

Rank  Subject               Score  Model Match                                        Provenance     Sampled
1     Qwen2.5 72B Instruct  0.82   Qwen2.5 72B Instruct (qwen-qwen-2.5-72b-instruct)  Self-reported  2026-05-06
2     DeepSeek-V2.5         0.80   -                                                  Self-reported  2026-05-06
3     Qwen2.5 7B Instruct   0.73   Qwen2.5 7B Instruct (qwen-qwen-2.5-7b-instruct)    Self-reported  2026-05-06
4     Qwen2 7B Instruct     0.72   -                                                  Self-reported  2026-05-06