COPA

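COPA (Choice of Plausible Alternatives): Evaluates commonsense causal reasoning. Each question gives a premise and two alternatives, and the system must pick the alternative that is the more plausible cause (or effect) of the premise.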
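The sketch below shows the shape of a COPA-style item (premise, cause/effect question, two alternatives, gold label) and the decision rule a system applies to it: score both alternatives and pick the higher. The `CopaItem` class, the `choose` function, the example sentences, and the toy `toy_plausibility` scorer are all hypothetical illustrations. They are not taken from the official dataset or from any system in the table; a real system would replace the scorer with a trained model.

```python
from dataclasses import dataclass
from typing import Callable, Literal


@dataclass
class CopaItem:
    """One COPA-style question (hypothetical structure for illustration)."""
    premise: str
    question: Literal["cause", "effect"]  # which relation is being judged
    choice1: str
    choice2: str
    label: int  # index of the correct alternative: 0 or 1


def choose(item: CopaItem, plausibility: Callable[[str, str, str], float]) -> int:
    """Return 0 or 1: the alternative the scorer rates as more plausible.

    `plausibility(premise, alternative, question)` is an assumed interface:
    any function that returns a higher number for a more plausible link.
    """
    s1 = plausibility(item.premise, item.choice1, item.question)
    s2 = plausibility(item.premise, item.choice2, item.question)
    return 0 if s1 >= s2 else 1  # ties go to the first alternative


def toy_plausibility(premise: str, alternative: str, question: str) -> float:
    """Hypothetical stand-in for a model: counts hand-picked cue words."""
    cues = {"rained", "overnight"} if "wet" in premise else set()
    words = set(alternative.lower().rstrip(".").split())
    return float(len(cues & words))


item = CopaItem(
    premise="The grass was wet.",
    question="cause",
    choice1="It rained overnight.",
    choice2="The sun was shining.",
    label=0,
)
print(choose(item, toy_plausibility))  # 0
```

Breaking ties in favour of the first alternative is an arbitrary choice made for this sketch; any fixed rule works as long as it is applied consistently.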

10 rows
Primary metric: test_accuracy
Sampled: 2026-05-27


Metrics

Test accuracy

Latest Results

Rows are imported from the Competitive results section of the official COPA page. The primary score is accuracy on the COPA test set.
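Test-set accuracy is the fraction of test questions for which the chosen alternative matches the gold label. A minimal sketch, assuming predictions and gold labels are given as parallel lists of 0/1 alternative indices (the `accuracy` function name is hypothetical):

```python
def accuracy(predictions: list[int], gold: list[int]) -> float:
    """Fraction of items where the predicted alternative equals the gold one."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold labels must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy over zero items")
    correct = sum(p == g for p, g in zip(predictions, gold))
    return correct / len(gold)


# Four hypothetical items, three answered correctly.
print(f"{accuracy([0, 1, 1, 0], [0, 1, 0, 0]):.1%}")  # 75.0%
```

Because every COPA question has exactly two alternatives, a system that guesses at random is expected to score about 50%, which is a useful reference point when reading the scores below.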

Rank | Subject | Test accuracy | Model Match | Provenance | Sampled
1 | BERT tuned with Social IQa | 84.4% |  | Imported | 2026-05-27
2 | GPT | 78.6% |  | Imported | 2026-05-27
3 | Learning to Rank for Plausible Plausibility | 75.4% |  | Imported | 2026-05-27
4 | Multiword expressions causality estimation | 71.2% |  | Imported | 2026-05-27
5 | Commonsense causal reasoning between short texts | 70.2% |  | Imported | 2026-05-27
6 | Encoder-decoder causal relations in stories | 66.2% |  | Imported | 2026-05-27
7 | Personal stories commonsense causal reasoning system | 65.4% |  | Imported | 2026-05-27
8 | UTDHLT COPACETIC | 63.4% |  | Imported | 2026-05-27
9 | Asymmetric associations causality detection | 58.8% |  | Imported | 2026-05-27
10 | PMIgutenbergW5 | 58.8% |  | Imported | 2026-05-27
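The reported percentages can be turned back into counts of correctly answered questions, which makes the gaps between rows easier to judge. The sketch below assumes the COPA test split has 500 questions; that size is an assumption stated here, not something this page reports. Scores are stored in tenths of a percent (84.4% becomes 844) so the arithmetic stays in exact integers, and the `correct_count` helper is hypothetical.

```python
# Assumption: the COPA test split has 500 questions.
TEST_SIZE = 500

# Leaderboard scores in tenths of a percent (84.4% -> 844).
SCORES_TENTHS = [
    ("BERT tuned with Social IQa", 844),
    ("GPT", 786),
    ("Learning to Rank for Plausible Plausibility", 754),
    ("Multiword expressions causality estimation", 712),
    ("Commonsense causal reasoning between short texts", 702),
    ("Encoder-decoder causal relations in stories", 662),
    ("Personal stories commonsense causal reasoning system", 654),
    ("UTDHLT COPACETIC", 634),
    ("Asymmetric associations causality detection", 588),
    ("PMIgutenbergW5", 588),
]


def correct_count(tenths: int, test_size: int = TEST_SIZE) -> int:
    """Correct answers implied by an accuracy given in tenths of a percent."""
    numerator = tenths * test_size
    if numerator % 1000 != 0:
        raise ValueError(f"{tenths / 10}% of {test_size} is not a whole number of items")
    return numerator // 1000


for subject, tenths in SCORES_TENTHS:
    print(f"{correct_count(tenths):>3}/{TEST_SIZE}  {tenths / 10:.1f}%  {subject}")
```

Under this assumption every listed score maps to a whole number of questions; for example, the 5.8-point gap between the first two rows corresponds to 29 questions, and the two systems tied at 58.8% each answer 294 correctly.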