StrategyQA

StrategyQA: Evaluates commonsense, multi-step reasoning on yes/no questions whose reasoning steps are implicit and must be inferred by the model.

Rows: 9
Primary metric: accuracy
Sampled: 2026-05-27

Metadata

Metrics

Accuracy, Recall@10
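For reference, the two listed metrics can be sketched as follows. This is a minimal sketch, not the leaderboard's scoring code. `accuracy` and `recall_at_k` are hypothetical helper names, answers are assumed to be booleans (StrategyQA questions are yes/no), and Recall@10 is assumed to mean the fraction of a question's gold evidence items that appear among the top 10 retrieved items, since the page lists the metric without defining it.

```python
from typing import Hashable, Sequence, Set


def accuracy(predictions: Sequence[bool], labels: Sequence[bool]) -> float:
    """Fraction of yes/no predictions that match the gold answers."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def recall_at_k(ranked: Sequence[Hashable], gold: Set[Hashable], k: int = 10) -> float:
    """Fraction of gold items found among the top-k ranked items.

    Assumed definition: the page lists Recall@10 but does not define it.
    """
    if not gold:
        raise ValueError("gold set must be non-empty")
    hits = len(gold.intersection(ranked[:k]))  # duplicates in ranked count once
    return hits / len(gold)


if __name__ == "__main__":
    print(accuracy([True, False, True, True], [True, True, True, False]))  # 0.5
    print(recall_at_k(["p3", "p7", "p1"], {"p1", "p9"}))  # 0.5
```

Averaging `recall_at_k` over all questions would give a dataset-level score; whether this leaderboard aggregates per question in that way is likewise an assumption.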

Latest Results

Rows are parsed from the baseline-results table in the arXiv LaTeX source of the StrategyQA paper.

Rank  Subject                         Accuracy (%)  Model Match  Provenance  Sampled
1     RoBERTa*^last-step_ORA-P-D      72.0                       Imported    2026-05-27
2     RoBERTa*_ORA-P                  70.7                       Imported    2026-05-27
3     RoBERTa*^last-step-raw_ORA-P-D  65.2                       Imported    2026-05-27
4     RoBERTa*_                       63.6                       Imported    2026-05-27
5     RoBERTa*_IR-Q                   63.6                       Imported    2026-05-27
6     RoBERTa*_IR-ORA-D               62.0                       Imported    2026-05-27
7     RoBERTa*_IR-D                   61.7                       Imported    2026-05-27
8     Majority                        53.9                       Imported    2026-05-27
9     RoBERTa_IR-Q                    53.6                       Imported    2026-05-27
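The Majority row presumably corresponds to a trivial baseline that always predicts the more frequent answer, so one way to read the table is by each configuration's gain over it in accuracy points. The sketch below copies the accuracy values from the table verbatim; `RESULTS` and `gains_over_baseline` are illustrative names, not part of the source.

```python
# Accuracy values (%) copied from the results table; subject names kept verbatim.
RESULTS = [
    ("RoBERTa*^last-step_ORA-P-D", 72.0),
    ("RoBERTa*_ORA-P", 70.7),
    ("RoBERTa*^last-step-raw_ORA-P-D", 65.2),
    ("RoBERTa*_", 63.6),
    ("RoBERTa*_IR-Q", 63.6),
    ("RoBERTa*_IR-ORA-D", 62.0),
    ("RoBERTa*_IR-D", 61.7),
    ("Majority", 53.9),
    ("RoBERTa_IR-Q", 53.6),
]


def gains_over_baseline(rows, baseline="Majority"):
    """Return {subject: accuracy minus baseline accuracy}, in points, rounded to 0.1."""
    base = dict(rows)[baseline]
    return {name: round(acc - base, 1) for name, acc in rows if name != baseline}


if __name__ == "__main__":
    gains = gains_over_baseline(RESULTS)
    # Print subjects from largest to smallest gain over the baseline.
    for name, gain in sorted(gains.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{name:32s} {gain:+.1f}")
```

With these values, the top row sits 18.1 points above the majority baseline, while RoBERTa_IR-Q falls 0.3 points below it.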