COPA
COPA (Choice of Plausible Alternatives): evaluates commonsense causal reasoning. Each question gives a premise and asks which of two alternatives is the more plausible cause or effect.
10 rows
Primary metric: test_accuracy
Sampled: 2026-05-27
Test accuracy
| Rank | Subject | Test accuracy | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | BERT tuned with Social IQa | 84.4% | — | Imported | 2026-05-27 |
| 2 | GPT | 78.6% | — | Imported | 2026-05-27 |
| 3 | Learning to Rank for Plausible Plausibility | 75.4% | — | Imported | 2026-05-27 |
| 4 | Multiword expressions causality estimation | 71.2% | — | Imported | 2026-05-27 |
| 5 | Commonsense causal reasoning between short texts | 70.2% | — | Imported | 2026-05-27 |
| 6 | Encoder-decoder causal relations in stories | 66.2% | — | Imported | 2026-05-27 |
| 7 | Personal stories commonsense causal reasoning system | 65.4% | — | Imported | 2026-05-27 |
| 8 | UTDHLT COPACETIC | 63.4% | — | Imported | 2026-05-27 |
| 9 | Asymmetric associations causality detection | 58.8% | — | Imported | 2026-05-27 |
| 10 | PMIgutenbergW5 | 58.8% | — | Imported | 2026-05-27 |
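Test accuracy in the table is the fraction of test questions for which a system picks the correct alternative. The sketch below shows that computation, assuming predictions and gold labels are encoded as 0/1 indices for the two alternatives. The function name `copa_accuracy` and the encoding are illustrative assumptions, not part of any official COPA scorer.

```python
from typing import Sequence


def copa_accuracy(predictions: Sequence[int], gold: Sequence[int]) -> float:
    """Return the fraction of items where the predicted alternative matches the gold one.

    Hypothetical helper: each entry is assumed to be 0 or 1, the index of
    the chosen alternative for one COPA question.
    """
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty set")
    # Count the items where the chosen alternative equals the gold alternative.
    correct = sum(p == g for p, g in zip(predictions, gold))
    return correct / len(gold)


# Toy example with 4 made-up items, 3 of which are answered correctly.
preds = [0, 1, 1, 0]
labels = [0, 1, 0, 0]
print(f"{copa_accuracy(preds, labels):.1%}")  # 75.0%
```

Because every question has exactly two alternatives, random guessing scores about 50% in expectation, which is the natural floor for reading the rows above. Assuming the standard 500-question COPA test split, each 0.2 percentage points corresponds to one question; for example, 84.4% would be 422 of 500 correct.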