CSimpleQA
Chinese SimpleQA is the first comprehensive Chinese benchmark for evaluating how factually accurate language models are when answering short questions. It contains 3,000 high-quality questions spanning 6 major topics and 99 diverse subtopics, designed to assess Chinese factual knowledge across the humanities, science, engineering, culture, and society.
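This page does not define how the Score column is computed. The following is a hedged illustration only. It assumes that each answer is graded as correct, incorrect, or not attempted in the SimpleQA style, and that Score is the fraction of questions answered correctly. The record layout, grade labels, and function names are hypothetical. The `f_score` helper shows one common way to combine the correct rate with correct-given-attempted. It is not confirmed to be what this leaderboard reports.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple


class GradedAnswer(NamedTuple):
    """One model answer after grading (hypothetical record layout)."""
    topic: str  # major topic the question belongs to, e.g. "Science"
    grade: str  # assumed labels: "correct", "incorrect", or "not_attempted"


def correct_rate(answers: Iterable[GradedAnswer]) -> float:
    """Fraction of the given questions that were answered correctly."""
    answers = list(answers)
    if not answers:
        raise ValueError("no graded answers")
    return sum(a.grade == "correct" for a in answers) / len(answers)


def correct_given_attempted(answers: Iterable[GradedAnswer]) -> float:
    """Fraction correct among questions the model did not decline to answer."""
    attempted = [a for a in answers if a.grade != "not_attempted"]
    return correct_rate(attempted)


def f_score(answers: Iterable[GradedAnswer]) -> float:
    """Harmonic mean of correct rate and correct-given-attempted (assumed metric)."""
    answers = list(answers)
    co = correct_rate(answers)
    cga = correct_given_attempted(answers)
    return 0.0 if co + cga == 0 else 2 * co * cga / (co + cga)


def correct_rate_by_topic(answers: Iterable[GradedAnswer]) -> Dict[str, float]:
    """Correct rate computed separately for each topic label."""
    by_topic: Dict[str, List[GradedAnswer]] = defaultdict(list)
    for a in answers:
        by_topic[a.topic].append(a)
    return {topic: correct_rate(group) for topic, group in by_topic.items()}
```

The per-topic breakdown mirrors the benchmark's grouping into major topics. A model that declines more often lowers its correct rate but may raise its correct-given-attempted, which is why the two are sometimes reported together.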
7 rows. Primary metric: Score. Sampled: 2026-05-06.
Metrics: Score, Normalized Score.
| Rank | Subject | Score | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | DeepSeek-V4-Pro-Max | 0.84 | DeepSeek V4 Pro (`deepseek-deepseek-v4-pro`) | Self-reported | 2026-05-06 |
| 2 | Qwen3-235B-A22B-Instruct-2507 | 0.84 | Qwen3 235B A22B Instruct 2507 (`qwen-qwen3-235b-a22b-2507`) | Self-reported | 2026-05-06 |
| 3 | Qwen3 VL 235B A22B Instruct | 0.83 | Qwen3 VL 235B A22B Instruct (`qwen-qwen3-vl-235b-a22b-instruct`) | Self-reported | 2026-05-06 |
| 4 | DeepSeek-V4-Flash-Max | 0.79 | DeepSeek V4 Flash (`deepseek-deepseek-v4-flash`) | Self-reported | 2026-05-06 |
| 5 | Kimi K2 Instruct | 0.78 | MoonshotAI: Kimi K2 0711 (`moonshotai-kimi-k2`) | Self-reported | 2026-05-06 |
| 6 | Kimi K2 Base | 0.78 | — | Self-reported | 2026-05-06 |
| 7 | DeepSeek-V3 | 0.65 | DeepSeek V3 (`deepseek-deepseek-chat`) | Self-reported | 2026-05-06 |