HotpotQA

HotpotQA: A multi-hop question answering benchmark built on Wikipedia. Systems are scored on the answer they give, on the supporting facts they cite, and on both jointly, in a distractor setting and a fullwiki setting.

Rows: 151
Primary metric: distractor_joint_f1
Sampled: 2026-05-27

Metadata

Metrics

distractor Answer EM, distractor Answer F1, distractor Support EM, distractor Support F1, distractor Joint EM, distractor Joint F1, fullwiki Answer EM, fullwiki Answer F1, fullwiki Support EM, fullwiki Support F1, fullwiki Joint EM, fullwiki Joint F1
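
The metric names above combine three ideas: exact match (EM), token- or set-level F1, and a joint score that multiplies answer and supporting-fact precision and recall before taking F1. The sketch below illustrates that arithmetic for a single example. It follows the general shape of the public HotpotQA evaluation script but is simplified (for instance, it has no special handling of yes/no answers), and the function names are hypothetical, chosen here for illustration.

```python
import re
import string
from collections import Counter


def normalize_answer(text):
    """Lowercase, strip punctuation and articles, collapse whitespace."""
    text = text.lower()
    punctuation = set(string.punctuation)
    text = "".join(ch for ch in text if ch not in punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, gold):
    """Answer EM: 1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(gold))


def answer_prf(prediction, gold):
    """Token-level precision, recall, and F1 between normalized answers."""
    pred_tokens = normalize_answer(prediction).split()
    gold_tokens = normalize_answer(gold).split()
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0, 0.0, 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return precision, recall, 2 * precision * recall / (precision + recall)


def support_prf(pred_facts, gold_facts):
    """Set-level precision, recall, and F1 over (title, sentence_index) pairs."""
    pred, gold = set(pred_facts), set(gold_facts)
    true_positives = len(pred & gold)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total > 0 else 0.0
    return precision, recall, f1


def joint_f1(answer_scores, support_scores):
    """Joint F1: multiply precisions and recalls, then take the harmonic mean."""
    joint_precision = answer_scores[0] * support_scores[0]
    joint_recall = answer_scores[1] * support_scores[1]
    total = joint_precision + joint_recall
    return 2 * joint_precision * joint_recall / total if total > 0 else 0.0
```

Because precision and recall are multiplied, a system with a perfect answer but only half of the supporting facts right cannot exceed a joint F1 of 0.5 on that example. Leaderboard figures are averages of such per-example scores, reported as percentages.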

Latest Results

Rows are parsed from the public HotpotQA distractor and fullwiki leaderboard tables. Primary registry metric is distractor joint F1.

Columns: Rank, Subject, distractor Joint F1, Model Match (blank for every row), Provenance, Sampled
1 Beam Retrieval (single model) 77.54 Imported 2026-05-27
2 PipNet (single model) 76.95 Imported 2026-05-27
3 Smoothing R3 (single model) 76.69 Imported 2026-05-27
4 FE2H on ALBERT (single model) 76.54 Imported 2026-05-27
5 R3 (single model) 76.02 Imported 2026-05-27
6 SAE+ (single model) 75.72 Imported 2026-05-27
7 S2G+EGA (single model) 75.47 Imported 2026-05-27
8 S2G+ (single model) 75.45 Imported 2026-05-27
9 AMGN+ (single model) 75.24 Imported 2026-05-27
10 RD Model (single model) 75.17 Imported 2026-05-27
11 FE2H on ELECTRA (single model) 74.9 Imported 2026-05-27
12 SpiderNet-large (single model) 74.88 Imported 2026-05-27
13 GIT (single model) 74.84 Imported 2026-05-27
14 S2G+ (single model) 74.36 Imported 2026-05-27
15 Anonymous (single model) 74.27 Imported 2026-05-27
16 AnonymousS (single model) 74.27 Imported 2026-05-27
17 HGN-large (single model) 74.21 Imported 2026-05-27
18 AMGN (single model) 74.2 Imported 2026-05-27
19 BoSe (single model) 74.18 Imported 2026-05-27
20 BFR-Graph (single model) 74.13 Imported 2026-05-27
21 KIFGraph (single model) 74.12 Imported 2026-05-27
22 Anonymous (single model) 73.93 Imported 2026-05-27
23 GSAN-large (single model) 73.89 Imported 2026-05-27
24 GIT (single model) 73.87 Imported 2026-05-27
25 FFReader-large (single model) 73.78 Imported 2026-05-27
26 ETC-large (single model) 73.62 Imported 2026-05-27
27 Longformer (single model) 73.16 Imported 2026-05-27
28 RealFormer (single model) 73.13 Imported 2026-05-27
29 EGF Reader-large (single model) 72.96 Imported 2026-05-27
30 C2F Reader (single model) 72.73 Imported 2026-05-27
31 Text-CAN large (single model) 72.52 Imported 2026-05-27
32 SEGraph (single model) 72.4 Imported 2026-05-27
33 S2G-large (single model) 72.26 Imported 2026-05-27
34 AISO (single model) 72 Imported 2026-05-27
35 Chain-of-Skills (single model) 71.65 Imported 2026-05-27
36 () 71.46 Imported 2026-05-27
37 () (single model) 71.46 Imported 2026-05-27
38 SAE-large (single model) 71.45 Imported 2026-05-27
39 HGN (single model) 71.03 Imported 2026-05-27
40 SpiderNet-Base (single model) 70.9 Imported 2026-05-27
41 TPRR (single model) 70.83 Imported 2026-05-27
42 TAP 2 (ensemble) 70.65 Imported 2026-05-27
43 HopRetriever + Sp-search (single model) 70.61 Imported 2026-05-27
44 EPS + BERT(wwm) (single model) 70.48 Imported 2026-05-27
45 EBS-Large (single model) 70.04 Imported 2026-05-27
46 HopRetriever (single model) 69.84 Imported 2026-05-27
47 IRRR+ (single model) 69.6 Imported 2026-05-27
48 Anonymous (single model) 69.54 Imported 2026-05-27
49 S2G-base (single model) 69.51 Imported 2026-05-27
50 BDR+JNM (single model) 69.12 Imported 2026-05-27
51 TAP 2 (single model) 69.12 Imported 2026-05-27
52 EBS-SH (single model) 68.94 Imported 2026-05-27
53 AnonymousK (single model) 68.75 Imported 2026-05-27
54 GAR-BERT (single model) 68.74 Imported 2026-05-27
55 IRRR (single model) 68.59 Imported 2026-05-27
56 Anonymous (single model) 68.54 Imported 2026-05-27
57 Anonymous (single model) 68.37 Imported 2026-05-27
58 Anonymous (single model) 68.1 Imported 2026-05-27
59 Anonymous (ensemble) 68.08 Imported 2026-05-27
60 EPS + BERT(large) (single model) 67.92 Imported 2026-05-27
61 HopRetriever-V2 (single model) 67.75 Imported 2026-05-27
62 Anonymous (single model) 67.08 Imported 2026-05-27
63 AFSGraph-retriever (single model) 66.98 Imported 2026-05-27
64 Anonymous (single model) 66.87 Imported 2026-05-27
65 () 66.65 Imported 2026-05-27
66 GSAN-base (single model) 66.62 Imported 2026-05-27
67 Recursive Dense Retriever (single model) 66.55 Imported 2026-05-27
68 Step-by-Step Retriever (single model) 66.22 Imported 2026-05-27
69 Text-CAN (single model) 65.95 Imported 2026-05-27
70 SAE (single model) 64.96 Imported 2026-05-27
71 Anonymous (single model) 64.45 Imported 2026-05-27
72 () 64.01 Imported 2026-05-27
73 GAR (single model) 64.01 Imported 2026-05-27
74 HopRetriever-V1 (single model) 63.91 Imported 2026-05-27
75 DDRQA (single model) 63.88 Imported 2026-05-27
76 P-BERT (single model) 63.79 Imported 2026-05-27
77 Anonymous (single model) 63.75 Imported 2026-05-27
78 LQR-net 2 + BERT-Base (single model) 63.68 Imported 2026-05-27
79 EPS + BERT (single model) 63.41 Imported 2026-05-27
80 DR model large (single model) 62.95 Imported 2026-05-27
81 () 62.92 Imported 2026-05-27
82 HopAns (single model) 62.92 Imported 2026-05-27
83 PIPE (single model) 62.92 Imported 2026-05-27
84 Anonymous (single model) 62.86 Imported 2026-05-27
85 SEval (single model) 62.73 Imported 2026-05-27
86 Multi-dimensional-AFSGraph (single model) 62.44 Imported 2026-05-27
87 HGN-albert + SemanticRetrievalMRS IR (single model) 62.26 Imported 2026-05-27
88 TAP (single model) 61.9 Imported 2026-05-27
89 Tree-shaped-cluster (single model) 61.73 Imported 2026-05-27
90 SAQA (single model) 61.72 Imported 2026-05-27
91 MKGN (single model) 61.69 Imported 2026-05-27
92 AFSgraph (single model) 61.66 Imported 2026-05-27
93 Robustly Fine-tuned Graph-based Recurrent Retriever (single model) 61.18 Imported 2026-05-27
94 AFSgraph model (single model) 60.9 Imported 2026-05-27
95 HGN-large + SemanticRetrievalMRS IR (single model) 60.74 Imported 2026-05-27
96 GRN + BERT (single model) 60.31 Imported 2026-05-27
97 DPR-recurrent (single model) 60.23 Imported 2026-05-27
98 RoBERTa-DenseRetriever (single model) 60.05 Imported 2026-05-27
99 LQR-net + BERT-Base (single model) 59.99 Imported 2026-05-27
100 HGN + SemanticRetrievalMRS IR (single model) 59.86 Imported 2026-05-27
101 () 59.84 Imported 2026-05-27
102 DFGN (single model) 59.82 Imported 2026-05-27
103 QFE (single model) 59.61 Imported 2026-05-27
104 IRC (single model) 59.43 Imported 2026-05-27
105 LQR-net (ensemble) 58.86 Imported 2026-05-27
106 GRN (single model) 58.47 Imported 2026-05-27
107 BERT Plus (single model) 58.23 Imported 2026-05-27
108 DFGN + BERT (single model) 58.23 Imported 2026-05-27
109 GraphRR-Fast (single model) 56.85 Imported 2026-05-27
110 DR model (single model) 56.82 Imported 2026-05-27
111 Quark + SemanticRetrievalMRS IR (single model) 56.23 Imported 2026-05-27
112 GAR-BERT (single model) 56.1 Imported 2026-05-27
113 Graph-based Recurrent Retriever (single model) 55.31 Imported 2026-05-27
114 MIR+EPS+BERT (single model) 54.75 Imported 2026-05-27
115 GAR (single model) 52.95 Imported 2026-05-27
116 KGNN (single model) 52.82 Imported 2026-05-27
117 RoBERTa-L Two-step Model (single model) 52.5 Imported 2026-05-27
118 Transformer-XH-final(BERT-base) (single model) 51.29 Imported 2026-05-27
119 Transformer-XH (single model) 49.57 Imported 2026-05-27
120 SemanticRetrievalMRS (single model) 47.6 Imported 2026-05-27
121 () 44.88 Imported 2026-05-27
122 DrKIT (single model) 42.88 Imported 2026-05-27
123 () 41.77 Imported 2026-05-27
124 () 41.42 Imported 2026-05-27
125 GAR-NOSF (single model) 41.42 Imported 2026-05-27
126 () 40.89 Imported 2026-05-27
127 Baseline Model (single model) 40.16 Imported 2026-05-27
128 () 39.25 Imported 2026-05-27
129 Entity-centric BERT Pipeline (single model) 39.18 Imported 2026-05-27
130 GoldEn Retriever (single model) 39.13 Imported 2026-05-27
131 PR-Bert (single model) 39.11 Imported 2026-05-27
132 SAFSr-Bert (single model) 37 Imported 2026-05-27
133 Cognitive Graph QA (single model) 34.92 Imported 2026-05-27
134 GAR-NOSF (single model) 33.36 Imported 2026-05-27
135 IKFGraph (single model) 30.38 Imported 2026-05-27
136 () 29.07 Imported 2026-05-27
137 AnonymousQ (single model) 29.07 Imported 2026-05-27
138 HGN Model-reproduce (single model) 28.4 Imported 2026-05-27
139 MUPPET (single model) 27.01 Imported 2026-05-27
140 GRN + BERT (single model) 25.84 Imported 2026-05-27
141 Entity-centric IR (single model) 25.47 Imported 2026-05-27
142 KGNN (single model) 24.66 Imported 2026-05-27
143 SAQA (single model) 24.49 Imported 2026-05-27
144 GRN (single model) 23.55 Imported 2026-05-27
145 QFE (single model) 23.1 Imported 2026-05-27
146 SAFSr_model (single model) 20.9 Imported 2026-05-27
147 Baseline Model (single model) 16.15 Imported 2026-05-27
148 () 1.11 Imported 2026-05-27
149 () 0 Imported 2026-05-27
150 graph-recurrent-retriever+roberta-base w. S/R-pretraining (single model) 0 Imported 2026-05-27
151 Mistral multi hop with very large sources (single model) 0 Imported 2026-05-27
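
The rows above are space-separated, and subject names can themselves contain spaces and digits (for example "TAP 2" or "LQR-net 2 + BERT-Base"), so splitting on whitespace is not enough. The sketch below shows one way to parse a row by anchoring on the fixed trailing fields. The `Row` type, the regular expressions, and the `parse_row` helper are hypothetical names introduced here, and the pattern assumes the row layout shown in this listing rather than any official export format.

```python
import re
from typing import NamedTuple, Optional


class Row(NamedTuple):
    rank: int
    subject: str                 # name with the trailing system type removed
    system_type: Optional[str]   # "single model", "ensemble", or None
    score: float                 # value of the distractor Joint F1 column
    provenance: str
    sampled: str                 # ISO date string, e.g. "2026-05-27"


# Anchor on the fixed tail: <score> <provenance> <YYYY-MM-DD>.
ROW_RE = re.compile(
    r"^(?P<rank>\d+)\s+(?P<subject>.+?)\s+(?P<score>\d+(?:\.\d+)?)\s+"
    r"(?P<provenance>\S+)\s+(?P<sampled>\d{4}-\d{2}-\d{2})$"
)
TYPE_RE = re.compile(r"^(?P<name>.*?)\s*\((?P<kind>single model|ensemble)\)$")


def parse_row(line: str) -> Optional[Row]:
    """Parse one leaderboard row; return None if the line is not a row."""
    match = ROW_RE.match(line.strip())
    if match is None:
        return None
    subject, system_type = match["subject"], None
    type_match = TYPE_RE.match(subject)
    if type_match:
        subject, system_type = type_match["name"], type_match["kind"]
    return Row(
        rank=int(match["rank"]),
        subject=subject,
        system_type=system_type,
        score=float(match["score"]),
        provenance=match["provenance"],
        sampled=match["sampled"],
    )
```

Rows whose subject is just "()" parse with that placeholder kept as the subject, so a consumer can filter them out or flag them as entries with a missing name.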