HANS
HANS (Heuristic Analysis for NLI Systems): Evaluates whether natural language inference models rely on shallow syntactic heuristics (lexical overlap, subsequence, constituent) rather than genuine inference.
4 rows
Primary metric: overall_accuracy
Sampled: 2026-05-27
Metadata
Metrics
Overall accuracy, lexical_overlap accuracy, subsequence accuracy, constituent accuracy
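As a rough illustration of how the four metrics above relate, the sketch below computes one accuracy per heuristic plus an overall figure. The record layout, the function name `hans_accuracies`, and the sample data are all hypothetical. Pooling every example for the overall score is an assumption; this page does not state how its overall accuracy is aggregated.

```python
from collections import defaultdict


def hans_accuracies(records):
    """Return overall and per-heuristic accuracy as fractions in [0, 1].

    `records` is a hypothetical list of (heuristic, gold_label, predicted_label)
    tuples; this layout is an assumption made for illustration.
    """
    if not records:
        raise ValueError("records must not be empty")

    correct = defaultdict(int)
    total = defaultdict(int)
    for heuristic, gold, pred in records:
        total[heuristic] += 1
        if gold == pred:
            correct[heuristic] += 1

    # One accuracy per heuristic subset.
    scores = {h: correct[h] / total[h] for h in total}
    # Assumption: overall accuracy pools all examples (micro-average).
    scores["overall"] = sum(correct.values()) / sum(total.values())
    return scores


# Made-up predictions, not taken from any model on this leaderboard.
sample = [
    ("lexical_overlap", "entailment", "entailment"),
    ("lexical_overlap", "non-entailment", "entailment"),
    ("subsequence", "non-entailment", "non-entailment"),
    ("constituent", "entailment", "non-entailment"),
]
scores = hans_accuracies(sample)
print(scores)
```

If the leaderboard instead averages the three subset accuracies, the overall value can differ from the pooled one whenever the subsets have unequal sizes.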
| Rank | Subject | Overall accuracy | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | esim | 49.42% | — | Imported | 2026-05-27 |
| 2 | decomp attn | 49.19% | — | Imported | 2026-05-27 |
| 3 | bert | 48.73% | — | Imported | 2026-05-27 |
| 4 | spinn | 47.42% | — | Imported | 2026-05-27 |