COPA | BenchmarkList

Test accuracy

Rank	Subject	Test accuracy	Model Match	Provenance	Sampled
1	BERT tuned with Social IQa	84.4%	—	Imported	2026-05-27
2	GPT	78.6%	—	Imported	2026-05-27
3	Learning to Rank for Plausible Plausibility	75.4%	—	Imported	2026-05-27
4	Multiword expressions causality estimation	71.2%	—	Imported	2026-05-27
5	Commonsense causal reasoning between short texts	70.2%	—	Imported	2026-05-27
6	Encoder-decoder causal relations in stories	66.2%	—	Imported	2026-05-27
7	Personal stories commonsense causal reasoning system	65.4%	—	Imported	2026-05-27
8	UTDHLT COPACETIC	63.4%	—	Imported	2026-05-27
9	Asymmetric associations causality detection	58.8%	—	Imported	2026-05-27
10	PMIgutenbergW5	58.8%	—	Imported	2026-05-27