HANS | BenchmarkList

Metadata

Metrics: overall accuracy, lexical_overlap accuracy, subsequence accuracy, constituent accuracy

Rank	Subject	Overall accuracy	Model Match	Provenance	Sampled
1	esim	49.42%	—	Imported	2026-05-27
2	decomp attn	49.19%	—	Imported	2026-05-27
3	bert	48.73%	—	Imported	2026-05-27
4	spinn	47.42%	—	Imported	2026-05-27
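
The page lists an overall accuracy alongside three per-heuristic accuracies (lexical_overlap, subsequence, constituent) but does not say how the overall figure is derived. The sketch below shows one plausible relationship, assuming the overall number is a micro-average: total correct predictions divided by total examples across the three subsets. The function names and the counts are hypothetical and chosen only for illustration; they are not results for any model in the table.

```python
from typing import Dict, Mapping


def per_heuristic_accuracy(
    correct: Mapping[str, int], total: Mapping[str, int]
) -> Dict[str, float]:
    """Accuracy for each heuristic subset, as a fraction in [0, 1]."""
    return {name: correct[name] / total[name] for name in total}


def overall_accuracy(correct: Mapping[str, int], total: Mapping[str, int]) -> float:
    """Micro-averaged accuracy: all correct predictions over all examples.

    Assumption: the leaderboard's "Overall accuracy" is computed this way.
    """
    n_examples = sum(total.values())
    if n_examples == 0:
        raise ValueError("no examples to score")
    return sum(correct[name] for name in total) / n_examples


# Hypothetical counts, for illustration only (not taken from the table).
example_correct = {"lexical_overlap": 5200, "subsequence": 4800, "constituent": 5000}
example_total = {"lexical_overlap": 10000, "subsequence": 10000, "constituent": 10000}

if __name__ == "__main__":
    for name, acc in per_heuristic_accuracy(example_correct, example_total).items():
        print(f"{name} accuracy: {acc:.2%}")
    print(f"overall accuracy: {overall_accuracy(example_correct, example_total):.2%}")
```

When the three subsets are the same size, as in the hypothetical counts above, the micro-average equals the plain mean of the three per-heuristic accuracies. With unequal subset sizes the two would differ.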