HotpotQA
HotpotQA: Evaluates multi-hop question answering over Wikipedia, scoring both the predicted answer and the supporting facts in two settings (distractor and fullwiki).
Rows: 151
Primary metric: distractor_joint_f1
Sampled: 2026-05-27
Metadata
Metrics
distractor Answer EM, distractor Answer F1, distractor Support EM, distractor Support F1, distractor Joint EM, distractor Joint F1, fullwiki Answer EM, fullwiki Answer F1, fullwiki Support EM, fullwiki Support F1, fullwiki Joint EM, fullwiki Joint F1
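The Joint metrics combine the Answer and Support scores for each example. The sketch below shows one common reading of that combination, following my understanding of the official HotpotQA evaluation script: joint precision and recall are the products of the answer and supporting-fact values, Joint F1 is their harmonic mean, and Joint EM is the product of the two EM flags. The function and parameter names here are illustrative assumptions, not part of this leaderboard.

```python
def harmonic_f1(precision: float, recall: float) -> float:
    """F1 as the harmonic mean of precision and recall (0.0 if both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def joint_scores(ans_prec: float, ans_recall: float, ans_em: float,
                 sp_prec: float, sp_recall: float, sp_em: float) -> tuple[float, float]:
    """Return (joint_em, joint_f1) for a single example.

    Assumption: inputs are per-example scores in [0, 1]; a leaderboard value
    would be the mean over all examples, reported as a percentage.
    """
    joint_prec = ans_prec * sp_prec        # both answer and support must be precise
    joint_recall = ans_recall * sp_recall  # both answer and support must be complete
    joint_em = ans_em * sp_em              # 1.0 only if both match exactly
    return joint_em, harmonic_f1(joint_prec, joint_recall)


# Example: a perfect answer, but one of two predicted supporting facts is wrong.
em, f1 = joint_scores(ans_prec=1.0, ans_recall=1.0, ans_em=1.0,
                      sp_prec=0.5, sp_recall=1.0, sp_em=0.0)
print(em, round(f1, 4))  # prints: 0.0 0.6667
```

Under this reading, a system with no correct supporting-fact predictions has a Joint F1 of 0 regardless of answer quality.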
| Rank | Subject | distractor Joint F1 | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | Beam Retrieval (single model) | 77.54 | — | Imported | 2026-05-27 |
| 2 | PipNet (single model) | 76.95 | — | Imported | 2026-05-27 |
| 3 | Smoothing R3 (single model) | 76.69 | — | Imported | 2026-05-27 |
| 4 | FE2H on ALBERT (single model) | 76.54 | — | Imported | 2026-05-27 |
| 5 | R3 (single model) | 76.02 | — | Imported | 2026-05-27 |
| 6 | SAE+ (single model) | 75.72 | — | Imported | 2026-05-27 |
| 7 | S2G+EGA (single model) | 75.47 | — | Imported | 2026-05-27 |
| 8 | S2G+ (single model) | 75.45 | — | Imported | 2026-05-27 |
| 9 | AMGN+ (single model) | 75.24 | — | Imported | 2026-05-27 |
| 10 | RD Model (single model) | 75.17 | — | Imported | 2026-05-27 |
| 11 | FE2H on ELECTRA (single model) | 74.9 | — | Imported | 2026-05-27 |
| 12 | SpiderNet-large (single model) | 74.88 | — | Imported | 2026-05-27 |
| 13 | GIT (single model) | 74.84 | — | Imported | 2026-05-27 |
| 14 | S2G+ (single model) | 74.36 | — | Imported | 2026-05-27 |
| 15 | Anonymous (single model) | 74.27 | — | Imported | 2026-05-27 |
| 16 | AnonymousS (single model) | 74.27 | — | Imported | 2026-05-27 |
| 17 | HGN-large (single model) | 74.21 | — | Imported | 2026-05-27 |
| 18 | AMGN (single model) | 74.2 | — | Imported | 2026-05-27 |
| 19 | BoSe (single model) | 74.18 | — | Imported | 2026-05-27 |
| 20 | BFR-Graph (single model) | 74.13 | — | Imported | 2026-05-27 |
| 21 | KIFGraph (single model) | 74.12 | — | Imported | 2026-05-27 |
| 22 | Anonymous (single model) | 73.93 | — | Imported | 2026-05-27 |
| 23 | GSAN-large (single model) | 73.89 | — | Imported | 2026-05-27 |
| 24 | GIT (single model) | 73.87 | — | Imported | 2026-05-27 |
| 25 | FFReader-large (single model) | 73.78 | — | Imported | 2026-05-27 |
| 26 | ETC-large (single model) | 73.62 | — | Imported | 2026-05-27 |
| 27 | Longformer (single model) | 73.16 | — | Imported | 2026-05-27 |
| 28 | RealFormer (single model) | 73.13 | — | Imported | 2026-05-27 |
| 29 | EGF Reader-large (single model) | 72.96 | — | Imported | 2026-05-27 |
| 30 | C2F Reader (single model) | 72.73 | — | Imported | 2026-05-27 |
| 31 | Text-CAN large (single model) | 72.52 | — | Imported | 2026-05-27 |
| 32 | SEGraph (single model) | 72.4 | — | Imported | 2026-05-27 |
| 33 | S2G-large (single model) | 72.26 | — | Imported | 2026-05-27 |
| 34 | AISO (single model) | 72 | — | Imported | 2026-05-27 |
| 35 | Chain-of-Skills (single model) | 71.65 | — | Imported | 2026-05-27 |
| 36 | () | 71.46 | — | Imported | 2026-05-27 |
| 37 | () (single model) | 71.46 | — | Imported | 2026-05-27 |
| 38 | SAE-large (single model) | 71.45 | — | Imported | 2026-05-27 |
| 39 | HGN (single model) | 71.03 | — | Imported | 2026-05-27 |
| 40 | SpiderNet-Base (single model) | 70.9 | — | Imported | 2026-05-27 |
| 41 | TPRR (single model) | 70.83 | — | Imported | 2026-05-27 |
| 42 | TAP 2 (ensemble) | 70.65 | — | Imported | 2026-05-27 |
| 43 | HopRetriever + Sp-search (single model) | 70.61 | — | Imported | 2026-05-27 |
| 44 | EPS + BERT(wwm) (single model) | 70.48 | — | Imported | 2026-05-27 |
| 45 | EBS-Large (single model) | 70.04 | — | Imported | 2026-05-27 |
| 46 | HopRetriever (single model) | 69.84 | — | Imported | 2026-05-27 |
| 47 | IRRR+ (single model) | 69.6 | — | Imported | 2026-05-27 |
| 48 | Anonymous (single model) | 69.54 | — | Imported | 2026-05-27 |
| 49 | S2G-base (single model) | 69.51 | — | Imported | 2026-05-27 |
| 50 | BDR+JNM (single model) | 69.12 | — | Imported | 2026-05-27 |
| 51 | TAP 2 (single model) | 69.12 | — | Imported | 2026-05-27 |
| 52 | EBS-SH (single model) | 68.94 | — | Imported | 2026-05-27 |
| 53 | AnonymousK (single model) | 68.75 | — | Imported | 2026-05-27 |
| 54 | GAR-BERT (single model) | 68.74 | — | Imported | 2026-05-27 |
| 55 | IRRR (single model) | 68.59 | — | Imported | 2026-05-27 |
| 56 | Anonymous (single model) | 68.54 | — | Imported | 2026-05-27 |
| 57 | Anonymous (single model) | 68.37 | — | Imported | 2026-05-27 |
| 58 | Anonymous (single model) | 68.1 | — | Imported | 2026-05-27 |
| 59 | Anonymous (ensemble) | 68.08 | — | Imported | 2026-05-27 |
| 60 | EPS + BERT(large) (single model) | 67.92 | — | Imported | 2026-05-27 |
| 61 | HopRetriever-V2 (single model) | 67.75 | — | Imported | 2026-05-27 |
| 62 | Anonymous (single model) | 67.08 | — | Imported | 2026-05-27 |
| 63 | AFSGraph-retriever (single model) | 66.98 | — | Imported | 2026-05-27 |
| 64 | Anonymous (single model) | 66.87 | — | Imported | 2026-05-27 |
| 65 | () | 66.65 | — | Imported | 2026-05-27 |
| 66 | GSAN-base (single model) | 66.62 | — | Imported | 2026-05-27 |
| 67 | Recursive Dense Retriever (single model) | 66.55 | — | Imported | 2026-05-27 |
| 68 | Step-by-Step Retriever (single model) | 66.22 | — | Imported | 2026-05-27 |
| 69 | Text-CAN (single model) | 65.95 | — | Imported | 2026-05-27 |
| 70 | SAE (single model) | 64.96 | — | Imported | 2026-05-27 |
| 71 | Anonymous (single model) | 64.45 | — | Imported | 2026-05-27 |
| 72 | () | 64.01 | — | Imported | 2026-05-27 |
| 73 | GAR (single model) | 64.01 | — | Imported | 2026-05-27 |
| 74 | HopRetriever-V1 (single model) | 63.91 | — | Imported | 2026-05-27 |
| 75 | DDRQA (single model) | 63.88 | — | Imported | 2026-05-27 |
| 76 | P-BERT (single model) | 63.79 | — | Imported | 2026-05-27 |
| 77 | Anonymous (single model) | 63.75 | — | Imported | 2026-05-27 |
| 78 | LQR-net 2 + BERT-Base (single model) | 63.68 | — | Imported | 2026-05-27 |
| 79 | EPS + BERT (single model) | 63.41 | — | Imported | 2026-05-27 |
| 80 | DR model large (single model) | 62.95 | — | Imported | 2026-05-27 |
| 81 | () | 62.92 | — | Imported | 2026-05-27 |
| 82 | HopAns (single model) | 62.92 | — | Imported | 2026-05-27 |
| 83 | PIPE (single model) | 62.92 | — | Imported | 2026-05-27 |
| 84 | Anonymous (single model) | 62.86 | — | Imported | 2026-05-27 |
| 85 | SEval (single model) | 62.73 | — | Imported | 2026-05-27 |
| 86 | Multi-dimensional-AFSGraph (single model) | 62.44 | — | Imported | 2026-05-27 |
| 87 | HGN-albert + SemanticRetrievalMRS IR (single model) | 62.26 | — | Imported | 2026-05-27 |
| 88 | TAP (single model) | 61.9 | — | Imported | 2026-05-27 |
| 89 | Tree-shaped-cluster (single model) | 61.73 | — | Imported | 2026-05-27 |
| 90 | SAQA (single model) | 61.72 | — | Imported | 2026-05-27 |
| 91 | MKGN (single model) | 61.69 | — | Imported | 2026-05-27 |
| 92 | AFSgraph (single model) | 61.66 | — | Imported | 2026-05-27 |
| 93 | Robustly Fine-tuned Graph-based Recurrent Retriever (single model) | 61.18 | — | Imported | 2026-05-27 |
| 94 | AFSgraph model (single model) | 60.9 | — | Imported | 2026-05-27 |
| 95 | HGN-large + SemanticRetrievalMRS IR (single model) | 60.74 | — | Imported | 2026-05-27 |
| 96 | GRN + BERT (single model) | 60.31 | — | Imported | 2026-05-27 |
| 97 | DPR-recurrent (single model) | 60.23 | — | Imported | 2026-05-27 |
| 98 | RoBERTa-DenseRetriever (single model) | 60.05 | — | Imported | 2026-05-27 |
| 99 | LQR-net + BERT-Base (single model) | 59.99 | — | Imported | 2026-05-27 |
| 100 | HGN + SemanticRetrievalMRS IR (single model) | 59.86 | — | Imported | 2026-05-27 |
| 101 | () | 59.84 | — | Imported | 2026-05-27 |
| 102 | DFGN (single model) | 59.82 | — | Imported | 2026-05-27 |
| 103 | QFE (single model) | 59.61 | — | Imported | 2026-05-27 |
| 104 | IRC (single model) | 59.43 | — | Imported | 2026-05-27 |
| 105 | LQR-net (ensemble) | 58.86 | — | Imported | 2026-05-27 |
| 106 | GRN (single model) | 58.47 | — | Imported | 2026-05-27 |
| 107 | BERT Plus (single model) | 58.23 | — | Imported | 2026-05-27 |
| 108 | DFGN + BERT (single model) | 58.23 | — | Imported | 2026-05-27 |
| 109 | GraphRR-Fast (single model) | 56.85 | — | Imported | 2026-05-27 |
| 110 | DR model (single model) | 56.82 | — | Imported | 2026-05-27 |
| 111 | Quark + SemanticRetrievalMRS IR (single model) | 56.23 | — | Imported | 2026-05-27 |
| 112 | GAR-BERT (single model) | 56.1 | — | Imported | 2026-05-27 |
| 113 | Graph-based Recurrent Retriever (single model) | 55.31 | — | Imported | 2026-05-27 |
| 114 | MIR+EPS+BERT (single model) | 54.75 | — | Imported | 2026-05-27 |
| 115 | GAR (single model) | 52.95 | — | Imported | 2026-05-27 |
| 116 | KGNN (single model) | 52.82 | — | Imported | 2026-05-27 |
| 117 | RoBERTa-L Two-step Model (single model) | 52.5 | — | Imported | 2026-05-27 |
| 118 | Transformer-XH-final(BERT-base) (single model) | 51.29 | — | Imported | 2026-05-27 |
| 119 | Transformer-XH (single model) | 49.57 | — | Imported | 2026-05-27 |
| 120 | SemanticRetrievalMRS (single model) | 47.6 | — | Imported | 2026-05-27 |
| 121 | () | 44.88 | — | Imported | 2026-05-27 |
| 122 | DrKIT (single model) | 42.88 | — | Imported | 2026-05-27 |
| 123 | () | 41.77 | — | Imported | 2026-05-27 |
| 124 | () | 41.42 | — | Imported | 2026-05-27 |
| 125 | GAR-NOSF (single model) | 41.42 | — | Imported | 2026-05-27 |
| 126 | () | 40.89 | — | Imported | 2026-05-27 |
| 127 | Baseline Model (single model) | 40.16 | — | Imported | 2026-05-27 |
| 128 | () | 39.25 | — | Imported | 2026-05-27 |
| 129 | Entity-centric BERT Pipeline (single model) | 39.18 | — | Imported | 2026-05-27 |
| 130 | GoldEn Retriever (single model) | 39.13 | — | Imported | 2026-05-27 |
| 131 | PR-Bert (single model) | 39.11 | — | Imported | 2026-05-27 |
| 132 | SAFSr-Bert (single model) | 37 | — | Imported | 2026-05-27 |
| 133 | Cognitive Graph QA (single model) | 34.92 | — | Imported | 2026-05-27 |
| 134 | GAR-NOSF (single model) | 33.36 | — | Imported | 2026-05-27 |
| 135 | IKFGraph (single model) | 30.38 | — | Imported | 2026-05-27 |
| 136 | () | 29.07 | — | Imported | 2026-05-27 |
| 137 | AnonymousQ (single model) | 29.07 | — | Imported | 2026-05-27 |
| 138 | HGN Model-reproduce (single model) | 28.4 | — | Imported | 2026-05-27 |
| 139 | MUPPET (single model) | 27.01 | — | Imported | 2026-05-27 |
| 140 | GRN + BERT (single model) | 25.84 | — | Imported | 2026-05-27 |
| 141 | Entity-centric IR (single model) | 25.47 | — | Imported | 2026-05-27 |
| 142 | KGNN (single model) | 24.66 | — | Imported | 2026-05-27 |
| 143 | SAQA (single model) | 24.49 | — | Imported | 2026-05-27 |
| 144 | GRN (single model) | 23.55 | — | Imported | 2026-05-27 |
| 145 | QFE (single model) | 23.1 | — | Imported | 2026-05-27 |
| 146 | SAFSr_model (single model) | 20.9 | — | Imported | 2026-05-27 |
| 147 | Baseline Model (single model) | 16.15 | — | Imported | 2026-05-27 |
| 148 | () | 1.11 | — | Imported | 2026-05-27 |
| 149 | () | 0 | — | Imported | 2026-05-27 |
| 150 | graph-recurrent-retriever+roberta-base w. S/R-pretraining (single model) | 0 | — | Imported | 2026-05-27 |
| 151 | Mistral multi hop with very large sources (single model) | 0 | — | Imported | 2026-05-27 |