Gaokao-Bench
Gaokao-Bench evaluates language models on questions from the Chinese national college entrance examination (Gaokao). It reports exam-style accuracy on objective and subjective questions, overall and per subject.
13 rows
Primary metric: objective_overall_accuracy
Sampled: 2026-05-27
Metadata
Metrics
Objective Overall, Objective Chinese, Objective English, Objective Science Math, Objective Humanities Math, Objective Physics, Objective Chemistry, Objective Biology, Objective Politics, Objective History, Objective Geography, Subjective Overall, Subjective Chinese, Subjective English, Subjective Science Math, Subjective Humanities Math, Subjective Physics, Subjective Chemistry, Subjective Biology, Subjective Politics, Subjective History, Subjective Geography
| Rank | Model | Objective Overall | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | GPT-4-0314 | 72.2% | GPT-4 (`openai-gpt-4`) | Imported | 2026-05-27 |
| 2 | GPT-4-0613 | 71.6% | GPT-4 (`openai-gpt-4`) | Imported | 2026-05-27 |
| 3 | Gemini-Pro | 57.9% | — | Imported | 2026-05-27 |
| 4 | ERNIE-Bot-0615 | 56.6% | — | Imported | 2026-05-27 |
| 5 | GPT-3.5-turbo-0301 | 53.2% | GPT-3.5 Turbo (`openai-gpt-3.5-turbo`) | Imported | 2026-05-27 |
| 6 | ERNIE-Bot-turbo-0725 | 45.6% | — | Imported | 2026-05-27 |
| 7 | Baichuan2-13b-Chat | 43.9% | — | Imported | 2026-05-27 |
| 8 | ChatGLM2-6b | 42.7% | — | Imported | 2026-05-27 |
| 9 | Baichuan2-7b-Chat | 40.5% | — | Imported | 2026-05-27 |
| 10 | ChatGLM-6b | 30.8% | — | Imported | 2026-05-27 |
| 11 | Baichuan2-7b-Base | 27.2% | — | Imported | 2026-05-27 |
| 12 | LLaMA-7b | 21.1% | — | Imported | 2026-05-27 |
| 13 | Vicuna-7b | 21% | — | Imported | 2026-05-27 |
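
The Rank column appears to follow from sorting rows by the primary metric in descending order. A minimal Python sketch of that ordering, using the Objective Overall values copied from the table above, might look like this. The `scores` dictionary and the `rank_models` helper are illustrative names assumed here, not part of any published Gaokao-Bench tooling.

```python
# Objective Overall accuracy (percent), copied from the table above.
scores = {
    "GPT-4-0314": 72.2,
    "GPT-4-0613": 71.6,
    "Gemini-Pro": 57.9,
    "ERNIE-Bot-0615": 56.6,
    "GPT-3.5-turbo-0301": 53.2,
    "ERNIE-Bot-turbo-0725": 45.6,
    "Baichuan2-13b-Chat": 43.9,
    "ChatGLM2-6b": 42.7,
    "Baichuan2-7b-Chat": 40.5,
    "ChatGLM-6b": 30.8,
    "Baichuan2-7b-Base": 27.2,
    "LLaMA-7b": 21.1,
    "Vicuna-7b": 21.0,
}


def rank_models(scores):
    """Hypothetical helper: return (rank, model, accuracy) tuples, best first."""
    ordered = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [(rank, name, acc) for rank, (name, acc) in enumerate(ordered, start=1)]


ranking = rank_models(scores)
for rank, name, acc in ranking:
    print(f"{rank:>2}  {name:<22} {acc:.1f}%")
```

Sorting on the single primary metric keeps the ordering easy to reproduce. How the original leaderboard breaks ties is not stated in the source, so this sketch simply relies on Python's stable sort.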