StrategyQA
StrategyQA evaluates language-model knowledge, commonsense, and reasoning on yes/no questions whose required reasoning steps are implicit in the question.
Rows: 9
Primary metric: accuracy
Sampled: 2026-05-27
Metadata
Metrics: Accuracy, Recall@10
| Rank | Subject | Accuracy | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | RoBERTa*^last-step_ORA-P-D | 72.0 | — | Imported | 2026-05-27 |
| 2 | RoBERTa*_ORA-P | 70.7 | — | Imported | 2026-05-27 |
| 3 | RoBERTa*^last-step-raw_ORA-P-D | 65.2 | — | Imported | 2026-05-27 |
| 4 | RoBERTa*_ | 63.6 | — | Imported | 2026-05-27 |
| 5 | RoBERTa*_IR-Q | 63.6 | — | Imported | 2026-05-27 |
| 6 | RoBERTa*_IR-ORA-D | 62.0 | — | Imported | 2026-05-27 |
| 7 | RoBERTa*_IR-D | 61.7 | — | Imported | 2026-05-27 |
| 8 | Majority | 53.9 | — | Imported | 2026-05-27 |
| 9 | RoBERTa_IR-Q | 53.6 | — | Imported | 2026-05-27 |
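Because accuracy is the primary metric and the table includes a Majority baseline, one way to read the ranking is as a margin over that baseline. The sketch below is illustrative only: it copies the accuracy values from the table above, and the helper name `margin_over_baseline` is hypothetical rather than part of any benchmark tooling.

```python
# Accuracy values (percent) copied verbatim from the table above.
rows = [
    ("RoBERTa*^last-step_ORA-P-D", 72.0),
    ("RoBERTa*_ORA-P", 70.7),
    ("RoBERTa*^last-step-raw_ORA-P-D", 65.2),
    ("RoBERTa*_", 63.6),
    ("RoBERTa*_IR-Q", 63.6),
    ("RoBERTa*_IR-ORA-D", 62.0),
    ("RoBERTa*_IR-D", 61.7),
    ("Majority", 53.9),
    ("RoBERTa_IR-Q", 53.6),
]


def margin_over_baseline(rows, baseline="Majority"):
    """Return {subject: accuracy minus baseline accuracy} in percentage points.

    Hypothetical helper for illustration; the baseline row itself is omitted.
    """
    base = dict(rows)[baseline]
    return {
        name: round(acc - base, 1)
        for name, acc in rows
        if name != baseline
    }


margins = margin_over_baseline(rows)
for name, delta in margins.items():
    # Signed output makes entries below the baseline easy to spot.
    print(f"{name}: {delta:+.1f}")
```

A negative margin marks a row that scores below the majority-class baseline, which in this table applies only to the last-ranked entry.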