QASC
QASC (Question Answering via Sentence Composition): an 8-way multiple-choice benchmark that evaluates language-model knowledge and commonsense reasoning, scored by exam-style accuracy.
Rows: 12
Primary metric: test_accuracy
Sampled: 2026-05-27
Metadata
Metrics: Test accuracy, Dev accuracy
| Rank | Subject | Test accuracy (%) | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | Human Score | 93.0 | — | Imported | 2026-05-27 |
| 2 | BERT-MCQ whole-word masking FSL two-step + RACE + SCI | 73.2 | — | Imported | 2026-05-27 |
| 3 | BERT-MCQ bert-large-cased FSL two-step + RACE + SCI | 68.5 | — | Imported | 2026-05-27 |
| 4 | BERT-MCQ bert-large-cased FSL two-step | 67.0 | — | Imported | 2026-05-27 |
| 5 | AristoBertV7 whole-word masking | 62.6 | — | Imported | 2026-05-27 |
| 6 | BERT-MCQ bert-large-cased FSL + ARC two-step | 58.3 | — | Imported | 2026-05-27 |
| 7 | BERT-MCQ bert-large-cased FSL + ARC one-step | 57.0 | — | Imported | 2026-05-27 |
| 8 | BERT-MCQ bert-large-cased FSL one-step | 53.2 | — | Imported | 2026-05-27 |
| 9 | Odd-one-out (GloVe) | 18.0 | — | Imported | 2026-05-27 |
| 10 | ESIM Q2Choice (GloVe) | 17.2 | — | Imported | 2026-05-27 |
| 11 | ESIM Q2Choice (GloVe + ELMo) | 15.2 | — | Imported | 2026-05-27 |
| 12 | Random | 12.5 | — | Imported | 2026-05-27 |