CommonsenseQA
CommonsenseQA: a five-way multiple-choice question-answering benchmark that evaluates the commonsense knowledge and reasoning of language models, scored by accuracy.
Rows: 13
Primary metric: random_split_accuracy
Sampled: 2026-05-27
Metrics
Random split accuracy, Random split sanity, Question concept split accuracy, Question concept split sanity
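Both accuracy metrics are the share of questions for which a model selects the gold answer; the random split and the question concept split differ only in which questions are evaluated. The sketch below is a minimal illustration, assuming predictions and gold answers are given as answer labels such as `A` to `E`. The function name `multiple_choice_accuracy` is hypothetical and not part of any leaderboard tooling, and the labels in the usage example are toy values, not CommonsenseQA data.

```python
from typing import Sequence


def multiple_choice_accuracy(predicted: Sequence[str], gold: Sequence[str]) -> float:
    """Return the fraction of questions whose predicted label equals the gold label."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy over zero questions")
    # Count exact label matches, one per question.
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)


# Toy five-way labels (A-E); illustrative only.
preds = ["A", "C", "E", "B"]
golds = ["A", "B", "E", "B"]
print(f"{multiple_choice_accuracy(preds, golds):.1%}")  # 75.0%
```

Percentages in the table below correspond to this fraction multiplied by 100.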
| Rank | Model | Random split accuracy | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | BERT-Large | 55.9% | — | Imported | 2026-05-27 |
| 2 | GPT | 45.5% | — | Imported | 2026-05-27 |
| 3 | ESIM+ELMo | 34.1% | — | Imported | 2026-05-27 |
| 4 | ESIM+GloVe | 32.8% | — | Imported | 2026-05-27 |
| 5 | QABilinear+GloVe | 31.5% | — | Imported | 2026-05-27 |
| 6 | ESIM+Numberbatch | 30.1% | — | Imported | 2026-05-27 |
| 7 | VecSim+Numberbatch | 29.1% | — | Imported | 2026-05-27 |
| 8 | QABilinear+Numberbatch | 28.8% | — | Imported | 2026-05-27 |
| 9 | LM1B-Rep | 26.1% | — | Imported | 2026-05-27 |
| 10 | QACompare+GloVe | 25.7% | — | Imported | 2026-05-27 |
| 11 | LM1B-Concat | 25.3% | — | Imported | 2026-05-27 |
| 12 | VecSim+GloVe | 22.3% | — | Imported | 2026-05-27 |
| 13 | QACompare+Numberbatch | 20.4% | — | Imported | 2026-05-27 |
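The Rank column orders entries by the primary metric, random split accuracy, from highest to lowest. A minimal sketch of that ordering follows, using four rows copied from the table above. The helper `rank_by_accuracy` is hypothetical, and its tie handling is an assumption (the table itself contains no ties): because Python's `sorted` is stable, tied entries keep their input order and receive consecutive ranks.

```python
# Four rows copied from the table above: (model, random split accuracy in %).
rows = [
    ("GPT", 45.5),
    ("ESIM+GloVe", 32.8),
    ("BERT-Large", 55.9),
    ("ESIM+ELMo", 34.1),
]


def rank_by_accuracy(rows):
    """Sort rows by accuracy (descending) and attach 1-based ranks.

    sorted() is stable, so tied rows keep their input order and get consecutive ranks.
    """
    ordered = sorted(rows, key=lambda r: r[1], reverse=True)
    return [(rank, name, acc) for rank, (name, acc) in enumerate(ordered, start=1)]


for rank, name, acc in rank_by_accuracy(rows):
    print(f"{rank} | {name} | {acc:.1f}%")
```

Running it prints BERT-Large first and ESIM+GloVe last, matching the relative order of those four entries in the table.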