QASC

QASC (Question Answering via Sentence Composition): a multiple-choice question-answering benchmark that evaluates language-model knowledge, commonsense reasoning, and exam-style accuracy.

Rows: 12
Primary metric: test_accuracy
Sampled: 2026-05-27

Metadata

Metrics

Test accuracy, Dev accuracy
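
As a rough illustration of how an accuracy metric of this kind is typically computed, here is a minimal sketch. It is not the benchmark's official scorer. The function names, the toy labels, and the assumption of eight answer choices per question (consistent with the 12.5 reported for the Random row) are all illustrative.

```python
def accuracy(predictions, gold):
    """Percentage of questions whose predicted choice label matches the gold label."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not gold:
        raise ValueError("need at least one question")
    correct = sum(p == g for p, g in zip(predictions, gold))
    return 100.0 * correct / len(gold)


def random_baseline(num_choices):
    """Expected accuracy (in percent) of uniform random guessing."""
    return 100.0 / num_choices


# Hypothetical toy data: choice labels A-H for four questions.
gold = ["A", "C", "H", "B"]
predictions = ["A", "D", "H", "B"]

toy_accuracy = accuracy(predictions, gold)  # 3 of 4 answers match
chance = random_baseline(8)                 # assumes 8 choices per question
```

Under the eight-choice assumption, `chance` evaluates to 12.5, the same value as the Random row in the results table.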

Latest Results

Baseline test/dev accuracy rows from the experiments table (tab:experiments) of the QASC paper.

Rank  Subject                                                Test accuracy  Model Match  Provenance  Sampled
1     Human Score                                            93.0           -            Imported    2026-05-27
2     BERT-MCQ whole-word masking FSL two-step + RACE + SCI  73.2           -            Imported    2026-05-27
3     BERT-MCQ bert-large-cased FSL two-step + RACE + SCI    68.5           -            Imported    2026-05-27
4     BERT-MCQ bert-large-cased FSL two-step                 67.0           -            Imported    2026-05-27
5     AristoBertV7 whole-word masking                        62.6           -            Imported    2026-05-27
6     BERT-MCQ bert-large-cased FSL + ARC two-step           58.3           -            Imported    2026-05-27
7     BERT-MCQ bert-large-cased FSL + ARC one-step           57.0           -            Imported    2026-05-27
8     BERT-MCQ bert-large-cased FSL one-step                 53.2           -            Imported    2026-05-27
9     Odd-one-out (GloVe)                                    18.0           -            Imported    2026-05-27
10    ESIM Q2Choice (GloVe)                                  17.2           -            Imported    2026-05-27
11    ESIM Q2Choice (GloVe + ELMo)                           15.2           -            Imported    2026-05-27
12    Random                                                 12.5           -            Imported    2026-05-27