AlignBench
AlignBench is a comprehensive, multi-dimensional benchmark for evaluating the alignment of Large Language Models (LLMs) in Chinese. It covers 8 main categories: Fundamental Language Ability, Advanced Chinese Understanding, Open-ended Questions, Writing Ability, Logical Reasoning, Mathematics, Task-oriented Role Play, and Professional Knowledge. The benchmark includes 683 queries rooted in real scenarios, each with a human-verified reference. Evaluation uses a rule-calibrated, multi-dimensional LLM-as-Judge approach with Chain-of-Thought.
Rows: 4
Primary metric: score
Sampled: 2026-05-06
Metadata
Metrics: Score, Normalized Score
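The page does not say how Normalized Score is derived from Score. As a minimal sketch, assuming the judge assigns a raw score with an upper bound of 10 and that normalization simply divides by that bound, the mapping could look like the following. The function name `normalize_score`, the `scale_max` parameter, and the example value are hypothetical and are not taken from the benchmark or the table.

```python
def normalize_score(raw: float, scale_max: float = 10.0) -> float:
    """Map a raw judge score onto [0, 1] by dividing by the scale maximum.

    Assumption: raw scores lie in [0, scale_max]. The default of 10 is an
    illustrative guess, not a value confirmed by this page.
    """
    if scale_max <= 0:
        raise ValueError("scale_max must be positive")
    if not 0 <= raw <= scale_max:
        raise ValueError(f"raw score {raw} is outside [0, {scale_max}]")
    return raw / scale_max


# Hypothetical raw score, for illustration only.
example = normalize_score(7.5)
print(example)
```

If the normalization were min-max based instead (for example, on a scale whose lowest possible score is 1), the formula would subtract the minimum first, so the same raw score would map to a different value.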
| Rank | Subject | Score | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | Qwen2.5 72B Instruct | 0.82 | Qwen2.5 72B Instruct (`qwen-qwen-2.5-72b-instruct`) | Self-reported | 2026-05-06 |
| 2 | DeepSeek-V2.5 | 0.80 | — | Self-reported | 2026-05-06 |
| 3 | Qwen2.5 7B Instruct | 0.73 | Qwen2.5 7B Instruct (`qwen-qwen-2.5-7b-instruct`) | Self-reported | 2026-05-06 |
| 4 | Qwen2 7B Instruct | 0.72 | — | Self-reported | 2026-05-06 |