FEVER

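FEVER (Fact Extraction and VERification): Evaluates whether a system can verify natural-language claims against Wikipedia, labeling each claim as supported, refuted, or lacking enough information, and retrieving the evidence sentences that justify the label.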

Rows: 24
Primary metric: fever_score
Sampled: 2026-05-27

Metadata

Metrics

FEVER Score, Accuracy, Evidence F1
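
How the first two metrics relate can be sketched as follows. This is a minimal illustration, not the official scorer. It assumes the usual FEVER shared-task definition: label accuracy counts a claim as correct when the predicted label matches the gold label, while the FEVER score additionally requires, for supported and refuted claims, that the predicted evidence contain at least one complete gold evidence set. The label strings, the `(page, sentence_id)` tuple representation, the five-sentence evidence cap, and all function names are assumptions made for this example.

```python
from typing import Dict, List, Sequence, Tuple

Sentence = Tuple[str, int]  # (page title, sentence index); assumed representation

NEI = "NOT ENOUGH INFO"  # assumed label string


def label_correct(gold_label: str, pred_label: str) -> bool:
    """Label accuracy criterion: only the predicted label is checked."""
    return gold_label == pred_label


def strictly_correct(
    gold_label: str,
    gold_evidence_sets: Sequence[Sequence[Sentence]],
    pred_label: str,
    pred_evidence: Sequence[Sentence],
    max_evidence: int = 5,  # assumed cap on predicted sentences
) -> bool:
    """FEVER score criterion: correct label plus one complete evidence set."""
    if not label_correct(gold_label, pred_label):
        return False
    if gold_label == NEI:
        # Claims without enough information carry no evidence requirement.
        return True
    predicted = set(pred_evidence[:max_evidence])
    return any(set(group) <= predicted for group in gold_evidence_sets)


def score(examples: List[Dict]) -> Tuple[float, float]:
    """Return (fever_score, label_accuracy) over a list of example dicts."""
    strict = sum(
        strictly_correct(
            ex["gold_label"], ex["gold_evidence"],
            ex["pred_label"], ex["pred_evidence"],
        )
        for ex in examples
    )
    labels = sum(label_correct(ex["gold_label"], ex["pred_label"]) for ex in examples)
    return strict / len(examples), labels / len(examples)


# Hypothetical toy data, not taken from the dataset.
examples = [
    {"gold_label": "SUPPORTS", "gold_evidence": [[("Page_A", 0)]],
     "pred_label": "SUPPORTS", "pred_evidence": [("Page_A", 0), ("Page_B", 3)]},
    {"gold_label": "REFUTES", "gold_evidence": [[("Page_C", 1), ("Page_C", 2)]],
     "pred_label": "REFUTES", "pred_evidence": [("Page_C", 1)]},
    {"gold_label": NEI, "gold_evidence": [],
     "pred_label": NEI, "pred_evidence": []},
    {"gold_label": "SUPPORTS", "gold_evidence": [[("Page_D", 4)]],
     "pred_label": "REFUTES", "pred_evidence": [("Page_D", 4)]},
]

fever, accuracy = score(examples)
print(fever, accuracy)  # 0.5 0.75
```

The second toy example shows why the FEVER score is never higher than label accuracy: the label is right, but only half of the required evidence set was retrieved, so the claim counts toward accuracy and not toward the FEVER score.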

Latest Results

Rows are parsed from the public FEVER 2018 final leaderboard table.

Rank  Subject                    FEVER Score  Model Match  Provenance  Sampled
1     UNC-NLP                    0.6398       -            Imported    2026-05-27
2     UCL Machine Reading Group  0.6234       -            Imported    2026-05-27
3     Athene UKP TU Darmstadt    0.6132       -            Imported    2026-05-27
4     Papelo                     0.5704       -            Imported    2026-05-27
5     SWEEPer                    0.4986       -            Imported    2026-05-27
6     ColumbiaNLP                0.4888       -            Imported    2026-05-27
7     The Ohio State University  0.4322       -            Imported    2026-05-27
8     GESIS Cologne              0.4058       -            Imported    2026-05-27
9     nayeon7lee                 0.3858       -            Imported    2026-05-27
10    FujiXerox                  0.385        -            Imported    2026-05-27
11    JanK                       0.3831       -            Imported    2026-05-27
12    Directed Acyclic Graph     0.3824       -            Imported    2026-05-27
13    jg                         0.3721       -            Imported    2026-05-27
14    SIRIUS-LTG-UIO             0.3664       -            Imported    2026-05-27
15    Py.ro                      0.363        -            Imported    2026-05-27
16    hanshan                    0.2982       -            Imported    2026-05-27
17    lisizhen                   0.2898       -            Imported    2026-05-27
18    HZ                         0.2867       -            Imported    2026-05-27
19    UCSB                       0.2835       -            Imported    2026-05-27
20    FEVER Baseline             0.2771       -            Imported    2026-05-27
21    ankur-umbc                 0.2369       -            Imported    2026-05-27
22    m6.ub.6m.bu                0.2275       -            Imported    2026-05-27
23    ubub.bubu.61               0.2154       -            Imported    2026-05-27
24    mithunpaul08               0.1928       -            Imported    2026-05-27
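
For readers who want the rows above as structured records, the following is a rough sketch of one way to parse them. It assumes each row is whitespace-separated in the order rank, subject, FEVER score, provenance, sampled date, that the Model Match cell is either empty or a `-` placeholder, and that the subject is the only field that may contain spaces. The `Row` type, `ROW_RE` pattern, and `parse_row` function are hypothetical names for this example and are not part of the leaderboard's own tooling.

```python
import re
from typing import NamedTuple, Optional


class Row(NamedTuple):
    rank: int
    subject: str
    fever_score: float
    provenance: str
    sampled: str  # ISO date string, kept as text


# rank, subject (lazy, may contain spaces), score, optional "-" placeholder
# for an empty Model Match cell, provenance, sampled date.
ROW_RE = re.compile(
    r"^(\d+)\s+(.+?)\s+(\d+\.\d+)\s+(?:-\s+)?(\w+)\s+(\d{4}-\d{2}-\d{2})$"
)


def parse_row(line: str) -> Optional[Row]:
    """Parse one leaderboard row; return None for headers or malformed lines."""
    match = ROW_RE.match(line.strip())
    if match is None:
        return None
    rank, subject, score, provenance, sampled = match.groups()
    return Row(int(rank), subject, float(score), provenance, sampled)


lines = [
    "Rank Subject FEVER Score Model Match Provenance Sampled",
    "3 Athene UKP TU Darmstadt 0.6132 Imported 2026-05-27",
    "22 m6.ub.6m.bu 0.2275 Imported 2026-05-27",
]
rows = [row for row in map(parse_row, lines) if row is not None]
for row in rows:
    print(row.rank, row.subject, row.fever_score)
```

The subject group is matched lazily so that multi-word team names and names containing dots or digits stay intact, while the score is anchored by the fields that follow it.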