FEVER
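FEVER (Fact Extraction and VERification): Evaluates whether a system can label a claim as supported, refuted, or lacking enough information against Wikipedia, and retrieve the evidence sentences that justify that label.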
Rows: 24
Primary metric: fever_score
Sampled: 2026-05-27
Metadata
Metrics: FEVER Score, Accuracy, Evidence F1
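For orientation, here is a minimal sketch of how the FEVER Score differs from plain label accuracy. It assumes the usual shared-task definition: a prediction counts only if the label is right and, for verifiable claims, the predicted evidence (capped at five sentences) fully covers at least one gold evidence set. The dictionary keys, function names, and sample data below are illustrative assumptions, not the official scorer's interface.

```python
from typing import Iterable, Mapping

NEI = "NOT ENOUGH INFO"  # label for claims that need no evidence


def is_strictly_correct(instance: Mapping, max_evidence: int = 5) -> bool:
    """True if the label matches and one gold evidence set is fully retrieved."""
    if instance["predicted_label"] != instance["label"]:
        return False
    if instance["label"] == NEI:
        return True  # no evidence requirement for unverifiable claims
    # Only the first `max_evidence` predicted sentences are considered.
    predicted = {tuple(s) for s in instance["predicted_evidence"][:max_evidence]}
    return any(
        all(tuple(sentence) in predicted for sentence in evidence_set)
        for evidence_set in instance["evidence"]
    )


def fever_score(instances: Iterable[Mapping], max_evidence: int = 5) -> float:
    """Fraction of claims with a correct label and sufficient evidence."""
    instances = list(instances)
    if not instances:
        return 0.0
    return sum(is_strictly_correct(i, max_evidence) for i in instances) / len(instances)


def label_accuracy(instances: Iterable[Mapping]) -> float:
    """Fraction of claims with a correct label, ignoring evidence."""
    instances = list(instances)
    if not instances:
        return 0.0
    return sum(i["predicted_label"] == i["label"] for i in instances) / len(instances)


# Hypothetical examples: evidence is a list of gold sets of (page, sentence_id).
examples = [
    {"label": "SUPPORTS", "predicted_label": "SUPPORTS",
     "evidence": [[("A", 0)]], "predicted_evidence": [("A", 0), ("B", 1)]},
    {"label": "REFUTES", "predicted_label": "REFUTES",
     "evidence": [[("C", 2), ("C", 3)]], "predicted_evidence": [("C", 2)]},
    {"label": NEI, "predicted_label": NEI,
     "evidence": [], "predicted_evidence": []},
    {"label": "SUPPORTS", "predicted_label": "REFUTES",
     "evidence": [[("D", 4)]], "predicted_evidence": [("D", 4)]},
]

print(fever_score(examples))     # 0.5
print(label_accuracy(examples))  # 0.75
```

The second example has the right label but only half of its gold evidence set, so it counts toward accuracy and not toward the FEVER Score. This is why the FEVER Score can never exceed accuracy for the same system.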
| Rank | Subject | FEVER Score | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | UNC-NLP | 0.6398 | — | Imported | 2026-05-27 |
| 2 | UCL Machine Reading Group | 0.6234 | — | Imported | 2026-05-27 |
| 3 | Athene UKP TU Darmstadt | 0.6132 | — | Imported | 2026-05-27 |
| 4 | Papelo | 0.5704 | — | Imported | 2026-05-27 |
| 5 | SWEEPer | 0.4986 | — | Imported | 2026-05-27 |
| 6 | ColumbiaNLP | 0.4888 | — | Imported | 2026-05-27 |
| 7 | The Ohio State University | 0.4322 | — | Imported | 2026-05-27 |
| 8 | GESIS Cologne | 0.4058 | — | Imported | 2026-05-27 |
| 9 | nayeon7lee | 0.3858 | — | Imported | 2026-05-27 |
| 10 | FujiXerox | 0.3850 | — | Imported | 2026-05-27 |
| 11 | JanK | 0.3831 | — | Imported | 2026-05-27 |
| 12 | Directed Acyclic Graph | 0.3824 | — | Imported | 2026-05-27 |
| 13 | jg | 0.3721 | — | Imported | 2026-05-27 |
| 14 | SIRIUS-LTG-UIO | 0.3664 | — | Imported | 2026-05-27 |
| 15 | Py.ro | 0.3630 | — | Imported | 2026-05-27 |
| 16 | hanshan | 0.2982 | — | Imported | 2026-05-27 |
| 17 | lisizhen | 0.2898 | — | Imported | 2026-05-27 |
| 18 | HZ | 0.2867 | — | Imported | 2026-05-27 |
| 19 | UCSB | 0.2835 | — | Imported | 2026-05-27 |
| 20 | FEVER Baseline | 0.2771 | — | Imported | 2026-05-27 |
| 21 | ankur-umbc | 0.2369 | — | Imported | 2026-05-27 |
| 22 | m6.ub.6m.bu | 0.2275 | — | Imported | 2026-05-27 |
| 23 | ubub.bubu.61 | 0.2154 | — | Imported | 2026-05-27 |
| 24 | mithunpaul08 | 0.1928 | — | Imported | 2026-05-27 |