SQuAD 2.0
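SQuAD 2.0: Evaluates extractive reading comprehension over Wikipedia passages. Some questions are deliberately unanswerable from the passage, so a system must return the answer span when one exists and abstain when none does.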
205 rows
Primary metric: F1
Sampled: 2026-05-27
Metadata
Metrics
F1, Exact Match
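As a rough illustration of the two metrics, the sketch below computes Exact Match and token-level F1 for a single prediction. It follows the normalization conventions commonly used for SQuAD-style scoring (lowercasing, stripping punctuation and English articles) and treats an empty string as "no answer". The function names `normalize`, `exact_match`, `token_f1`, and `best_over_golds` are illustrative assumptions, not part of this leaderboard or of any particular library.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(gold))


def token_f1(prediction: str, gold: str) -> float:
    """Harmonic mean of token precision and recall after normalization."""
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(gold).split()
    # An empty side means "no answer": score 1.0 only if both sides are empty.
    if not pred_tokens or not gold_tokens:
        return float(pred_tokens == gold_tokens)
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def best_over_golds(metric, prediction: str, golds: list) -> float:
    """Score against every gold answer and keep the best.

    Assumption: an unanswerable question is represented by an empty gold list,
    which is scored as a single empty-string gold answer.
    """
    return max(metric(prediction, gold) for gold in (golds or [""]))
```

Under these assumptions, a dataset-level score would be the mean of the per-question values, expressed as a percentage. Abstaining on an unanswerable question scores 1.0 on both metrics, while any non-empty prediction for that question scores 0.0.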
| Rank | Subject | F1 | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | IE-Net (ensemble) | 93.214% | — | Imported | 2026-05-27 |
| 2 | FPNet (ensemble) | 93.183% | — | Imported | 2026-05-27 |
| 3 | IE-NetV2 (ensemble) | 93.1% | — | Imported | 2026-05-27 |
| 4 | SA-Net on Albert (ensemble) | 93.011% | — | Imported | 2026-05-27 |
| 5 | Retro-Reader (ensemble) | 92.978% | — | Imported | 2026-05-27 |
| 6 | SA-Net-V2 (ensemble) | 92.948% | — | Imported | 2026-05-27 |
| 7 | FPNet (ensemble) | 92.899% | — | Imported | 2026-05-27 |
| 8 | TransNets + SFVerifier + SFEnsembler (ensemble) | 92.894% | — | Imported | 2026-05-27 |
| 9 | ATRLP+PV (ensemble) | 92.877% | — | Imported | 2026-05-27 |
| 10 | EntitySpanFocusV2 (ensemble) | 92.824% | — | Imported | 2026-05-27 |
| 11 | LANetV2 (ensemble) | 92.807% | — | Imported | 2026-05-27 |
| 12 | ALBERT + DAAF + Verifier (ensemble) | 92.777% | — | Imported | 2026-05-27 |
| 13 | MixEnsemble (ensemble) | 92.594% | — | Imported | 2026-05-27 |
| 14 | Retro-Reader on ALBERT (ensemble) | 92.58% | — | Imported | 2026-05-27 |
| 15 | Answer Dependent Classify (single model) | 92.517% | — | Imported | 2026-05-27 |
| 16 | ANet | 92.457% | — | Imported | 2026-05-27 |
| 17 | ALBERT + DAAF + Verifier (ensemble) | 92.425% | — | Imported | 2026-05-27 |
| 18 | LANet (ensemble) | 92.425% | — | Imported | 2026-05-27 |
| 19 | ELECTRA+ATRLP+PV (single model) | 92.366% | — | Imported | 2026-05-27 |
| 20 | Span Extract + Classify (single model) | 92.226% | — | Imported | 2026-05-27 |
| 21 | ALBERT (ensemble model) | 92.215% | — | Imported | 2026-05-27 |
| 22 | Albert_Verifier_AA_Net (ensemble) | 92.18% | — | Imported | 2026-05-27 |
| 23 | albert+KD+transfer (ensemble) | 92.134% | — | Imported | 2026-05-27 |
| 24 | ROaD-Electra | 92.118% | — | Imported | 2026-05-27 |
| 25 | Retro-Reader on ELECTRA (single model) | 92.052% | — | Imported | 2026-05-27 |
| 26 | ELECTRA + ROBERTA + ALBERT (ensemble) | 91.994% | — | Imported | 2026-05-27 |
| 27 | ELECTRA + E-Verifier (ensemble) | 91.985% | — | Imported | 2026-05-27 |
| 28 | 2task (single model) | 91.939% | — | Imported | 2026-05-27 |
| 29 | Deberta | 91.9% | — | Imported | 2026-05-27 |
| 30 | albert+KD+transfer+twopass (single) | 91.877% | — | Imported | 2026-05-27 |
| 31 | ELECTRA+RL+EV (single model) | 91.765% | — | Imported | 2026-05-27 |
| 32 | ALBERT+Entailment DA (ensemble) | 91.745% | — | Imported | 2026-05-27 |
| 33 | ALBERT + MTDA + SFVerifier (ensemble model) | 91.739% | — | Imported | 2026-05-27 |
| 34 | ALBERT + SFVerifier (ensemble model) | 91.666% | — | Imported | 2026-05-27 |
| 35 | AE-TEST | 91.635% | — | Imported | 2026-05-27 |
| 36 | ELECTRA+EntitySpanFocus (Single model) | 91.546% | — | Imported | 2026-05-27 |
| 37 | SA-Net on Electra (single model) | 91.486% | — | Imported | 2026-05-27 |
| 38 | Retro-Reader on ALBERT (single model) | 91.419% | — | Imported | 2026-05-27 |
| 39 | ELECTRA (single model) | 91.365% | — | Imported | 2026-05-27 |
| 40 | ELECTRA_ATT (single model) | 91.303% | — | Imported | 2026-05-27 |
| 41 | Deberta+prefix | 91.299% | — | Imported | 2026-05-27 |
| 42 | ALBERT + IG + NE (single model) | 91.287% | — | Imported | 2026-05-27 |
| 43 | ALBERT (Single model) | 91.286% | — | Imported | 2026-05-27 |
| 44 | ALBERT+Entailment DA Verifier (single model) | 91.265% | — | Imported | 2026-05-27 |
| 45 | ALBERT + IG (single model) | 91.256% | — | Imported | 2026-05-27 |
| 46 | Tuned ALBERT (ensemble model) | 91.23% | — | Imported | 2026-05-27 |
| 47 | SkERT-Large (single model) | 90.944% | — | Imported | 2026-05-27 |
| 48 | aanet_v2.0 (single model) | 90.918% | — | Imported | 2026-05-27 |
| 49 | ALBERT (single model) | 90.902% | — | Imported | 2026-05-27 |
| 50 | MTL (single model) | 90.902% | — | Imported | 2026-05-27 |
| 51 | albert_xxlarge (single model) | 90.872% | — | Imported | 2026-05-27 |
| 52 | XLNet + DAAF + Verifier (ensemble) | 90.859% | — | Imported | 2026-05-27 |
| 53 | ALBERT + SFVerifier (single model) | 90.83% | — | Imported | 2026-05-27 |
| 54 | ALBERT+RL (single model) | 90.823% | — | Imported | 2026-05-27 |
| 55 | albert+KD+transfer+twopass (single) | 90.818% | — | Imported | 2026-05-27 |
| 56 | UPM (ensemble) | 90.713% | — | Imported | 2026-05-27 |
| 57 | XLNet + SG-Net Verifier (ensemble) | 90.702% | — | Imported | 2026-05-27 |
| 58 | XLNet (single model) | 90.689% | — | Imported | 2026-05-27 |
| 59 | ALBERT 1.1 (single model) | 90.588% | — | Imported | 2026-05-27 |
| 60 | Tuned ALBERT (single model) | 90.532% | — | Imported | 2026-05-27 |
| 61 | LUKE (single model) | 90.163% | — | Imported | 2026-05-27 |
| 62 | XLNet + SG-Net Verifier++ (single model) | 90.071% | — | Imported | 2026-05-27 |
| 63 | RoBERTa+Verify (ensemble) | 90.037% | — | Imported | 2026-05-27 |
| 64 | UPM (single model) | 89.934% | — | Imported | 2026-05-27 |
| 65 | RoBERTa (single model) | 89.795% | — | Imported | 2026-05-27 |
| 66 | Enhanced Albert+Verifier3 (ensemble) | 89.778% | — | Imported | 2026-05-27 |
| 67 | Enhanced Albert+Verifier (ensemble) | 89.634% | — | Imported | 2026-05-27 |
| 68 | RoBERTa+Verify (single model) | 89.586% | — | Imported | 2026-05-27 |
| 69 | BERT + DAE + AoA (ensemble) | 89.474% | — | Imported | 2026-05-27 |
| 70 | Human Performance (Rajpurkar & Jia et al. '18) | 89.452% | — | Imported | 2026-05-27 |
| 71 | BERT + ConvLSTM + MTL + Verifier (ensemble) | 89.286% | — | Imported | 2026-05-27 |
| 72 | XLNET-V2-123+ (single model) | 89.148% | — | Imported | 2026-05-27 |
| 73 | BERT + N-Gram Masking + Synthetic Self-Training (ensemble) | 89.147% | — | Imported | 2026-05-27 |
| 74 | XLNet (single model) | 89.133% | — | Imported | 2026-05-27 |
| 75 | Xlnet+Verifier | 89.082% | — | Imported | 2026-05-27 |
| 76 | Xlnet+Verifier (single model) | 89.063% | — | Imported | 2026-05-27 |
| 77 | BERTSP (single model) | 88.921% | — | Imported | 2026-05-27 |
| 78 | SemBERT (ensemble) | 88.886% | — | Imported | 2026-05-27 |
| 79 | SG-Net (ensemble) | 88.848% | — | Imported | 2026-05-27 |
| 80 | RoBERTa-Large (ensemble model) | 88.793% | — | Imported | 2026-05-27 |
| 81 | SpanBERT (single model) | 88.709% | — | Imported | 2026-05-27 |
| 82 | BERT + DAE + AoA (single model) | 88.621% | — | Imported | 2026-05-27 |
| 83 | RoBERTa-Large (single model) | 88.425% | — | Imported | 2026-05-27 |
| 84 | BERT + ConvLSTM + MTL + Verifier (single model) | 88.204% | — | Imported | 2026-05-27 |
| 85 | xlnet (single model) | 88% | — | Imported | 2026-05-27 |
| 86 | SG-Net (single model) | 87.926% | — | Imported | 2026-05-27 |
| 87 | SemBERT (single model) | 87.864% | — | Imported | 2026-05-27 |
| 88 | BNDVnet (single model) | 87.833% | — | Imported | 2026-05-27 |
| 89 | BERT + N-Gram Masking + Synthetic Self-Training (single model) | 87.715% | — | Imported | 2026-05-27 |
| 90 | Insight-baseline-BERT (single model) | 87.644% | — | Imported | 2026-05-27 |
| 91 | BERT + MMFT + ADA (ensemble) | 87.615% | — | Imported | 2026-05-27 |
| 92 | Hanvon_model (single model) | 87.117% | — | Imported | 2026-05-27 |
| 93 | RoberTa+Parallel+Adapters (single model) | 87.013% | — | Imported | 2026-05-27 |
| 94 | BERT + Synthetic Self-Training (ensemble) | 86.967% | — | Imported | 2026-05-27 |
| 95 | BERT + Multiple-CNN (ensemble) | 86.767% | — | Imported | 2026-05-27 |
| 96 | SemNet (single model) | 86.669% | — | Imported | 2026-05-27 |
| 97 | Tuned BERT-1seq Large Cased (single model) | 86.594% | — | Imported | 2026-05-27 |
| 98 | SynNet (single model) | 86.222% | — | Imported | 2026-05-27 |
| 99 | PAML+BERT (ensemble model) | 86.122% | — | Imported | 2026-05-27 |
| 100 | BERT finetune baseline (ensemble) | 86.096% | — | Imported | 2026-05-27 |
| 101 | Lunet + Verifier + BERT (ensemble) | 86.043% | — | Imported | 2026-05-27 |
| 102 | Bert-raw (ensemble) | 86.036% | — | Imported | 2026-05-27 |
| 103 | Lunet + Verifier + BERT (single model) | 86.035% | — | Imported | 2026-05-27 |
| 104 | ATB (single model) | 86.002% | — | Imported | 2026-05-27 |
| 105 | BERT + MMFT + ADA (single model) | 85.892% | — | Imported | 2026-05-27 |
| 106 | SENSEFORTH + BERT | 85.873% | — | Imported | 2026-05-27 |
| 107 | Tuned BERT Large Cased (single model) | 85.863% | — | Imported | 2026-05-27 |
| 108 | BERT + Synthetic Self-Training (single model) | 85.81% | — | Imported | 2026-05-27 |
| 109 | BERT with Something (ensemble) | 85.737% | — | Imported | 2026-05-27 |
| 110 | BERT + NeurQuRI (ensemble) | 85.703% | — | Imported | 2026-05-27 |
| 111 | BART + Adapters + Lohfink-Rossi-Leaveout (single-model) | 85.67% | — | Imported | 2026-05-27 |
| 112 | Bert-raw (ensemble) | 85.635% | — | Imported | 2026-05-27 |
| 113 | PAML+BERT (single model) | 85.603% | — | Imported | 2026-05-27 |
| 114 | BERT + NeurQuRI (ensemble) | 85.584% | — | Imported | 2026-05-27 |
| 115 | Bert-raw (ensemble) | 85.51% | — | Imported | 2026-05-27 |
| 116 | BERT-Base + QA Pre-training (single model) | 85.491% | — | Imported | 2026-05-27 |
| 117 | AoA + DA + BERT (ensemble) | 85.31% | — | Imported | 2026-05-27 |
| 118 | BERT-Base PMI-Masking Additional Data (single model) | 84.854% | — | Imported | 2026-05-27 |
| 119 | BERT_s (single model) | 84.846% | — | Imported | 2026-05-27 |
| 120 | BERT finetune baseline (single model) | 84.82% | — | Imported | 2026-05-27 |
| 121 | BERT-large+UBFT (single model) | 84.535% | — | Imported | 2026-05-27 |
| 122 | BERT with Something (single model) | 84.386% | — | Imported | 2026-05-27 |
| 123 | BERT + NeurQuRI (single model) | 84.342% | — | Imported | 2026-05-27 |
| 124 | AoA + DA + BERT (single model) | 84.251% | — | Imported | 2026-05-27 |
| 125 | Bert-raw (single) | 83.922% | — | Imported | 2026-05-27 |
| 126 | BERT + UnAnsQ (single model) | 83.851% | — | Imported | 2026-05-27 |
| 127 | BERT-Base PMI-Masking (single model) | 83.604% | — | Imported | 2026-05-27 |
| 128 | Bert-raw (single) | 83.457% | — | Imported | 2026-05-27 |
| 129 | BERT + NeurQuRI (single model) | 83.391% | — | Imported | 2026-05-27 |
| 130 | Original BERT Large Cased (single model) | 83.266% | — | Imported | 2026-05-27 |
| 131 | PMI-Masking Additional Data Random Baseline (single model) | 83.262% | — | Imported | 2026-05-27 |
| 132 | Bert-raw (single model) | 83.243% | — | Imported | 2026-05-27 |
| 133 | BERT + UDA (single model) | 83.208% | — | Imported | 2026-05-27 |
| 134 | PwP+BERT (single model) | 83.189% | — | Imported | 2026-05-27 |
| 135 | bert (single model) | 83.184% | — | Imported | 2026-05-27 |
| 136 | PMI-Masking Pure-PMI (single model) | 83.175% | — | Imported | 2026-05-27 |
| 137 | BISAN-CC (single model) | 83.149% | — | Imported | 2026-05-27 |
| 138 | Bert | 83.118% | — | Imported | 2026-05-27 |
| 139 | BERT (single model) | 83.061% | — | Imported | 2026-05-27 |
| 140 | PMI-Masking Additional Data Pure-PMI (single model) | 83.039% | — | Imported | 2026-05-27 |
| 141 | BERT + Sparse-Transformer | 83.023% | — | Imported | 2026-05-27 |
| 142 | BERT uncased (single model) | 83.02% | — | Imported | 2026-05-27 |
| 143 | ST_bl | 82.962% | — | Imported | 2026-05-27 |
| 144 | NEXYS_BASE (single model) | 82.912% | — | Imported | 2026-05-27 |
| 145 | {bert-finetuning} (single model) | 82.852% | — | Imported | 2026-05-27 |
| 146 | PMI-Masking Random Baseline (single model) | 82.796% | — | Imported | 2026-05-27 |
| 147 | BERT-Large-Cased | 82.692% | — | Imported | 2026-05-27 |
| 148 | {Anonymous} (single model) | 82.524% | — | Imported | 2026-05-27 |
| 149 | L6Net + BERT (single model) | 82.259% | — | Imported | 2026-05-27 |
| 150 | RoberTa+Fusion+Adapters (single model) | 81.863% | — | Imported | 2026-05-27 |
| 151 | BISAN (single model) | 81.531% | — | Imported | 2026-05-27 |
| 152 | BERT-Large-Cased | 81.5% | — | Imported | 2026-05-27 |
| 153 | BERT + WIAN (ensemble) | 81.497% | — | Imported | 2026-05-27 |
| 154 | AMBERT (single model) | 81.445% | — | Imported | 2026-05-27 |
| 155 | BERT+AC (single model) | 81.174% | — | Imported | 2026-05-27 |
| 156 | BERT (single model) | 80.31% | — | Imported | 2026-05-27 |
| 157 | RoberTa+Adapter (single model) | 80.258% | — | Imported | 2026-05-27 |
| 158 | SLQA+BERT (single model) | 80.209% | — | Imported | 2026-05-27 |
| 159 | AMBERT-S (single model) | 79.776% | — | Imported | 2026-05-27 |
| 160 | AMBERT-H (single model) | 79.659% | — | Imported | 2026-05-27 |
| 161 | synss (single model) | 79.329% | — | Imported | 2026-05-27 |
| 162 | mgrc | 78.381% | — | Imported | 2026-05-27 |
| 163 | BERT-Base-L (single model) | 78.232% | — | Imported | 2026-05-27 |
| 164 | ARSG-BERT (single model) | 78.227% | — | Imported | 2026-05-27 |
| 165 | MIR-MRC(F-Net) (single model) | 77.988% | — | Imported | 2026-05-27 |
| 166 | BERT-Base-V (single model) | 77.805% | — | Imported | 2026-05-27 |
| 167 | BERT-Base-DT (single model) | 77.706% | — | Imported | 2026-05-27 |
| 168 | BERT-Base-DP (single model) | 77.464% | — | Imported | 2026-05-27 |
| 169 | BERT-Base-V2 | 77.404% | — | Imported | 2026-05-27 |
| 170 | BERT-Base-Add (single model) | 77.396% | — | Imported | 2026-05-27 |
| 171 | {BERTcw} (single model) | 77.308% | — | Imported | 2026-05-27 |
| 172 | nlnet (single model) | 77.052% | — | Imported | 2026-05-27 |
| 173 | batch2 (single model) | 76.858% | — | Imported | 2026-05-27 |
| 174 | MMIPN | 76.424% | — | Imported | 2026-05-27 |
| 175 | BERT-Base-Baseline (single model) | 76.284% | — | Imported | 2026-05-27 |
| 176 | BERT-Base (single model) | 76.236% | — | Imported | 2026-05-27 |
| 177 | BERT-base | 75.513% | — | Imported | 2026-05-27 |
| 178 | BERTBase (single model) | 75.513% | — | Imported | 2026-05-27 |
| 179 | YARCS (ensemble) | 75.507% | — | Imported | 2026-05-27 |
| 180 | BERT+Answer Verifier (single model) | 75.457% | — | Imported | 2026-05-27 |
| 181 | Unet (ensemble) | 74.869% | — | Imported | 2026-05-27 |
| 182 | HYDRA_BERT (single model) | 74.578% | — | Imported | 2026-05-27 |
| 183 | {BERT-base} (single-model) | 74.449% | — | Imported | 2026-05-27 |
| 184 | SLQA+ (single model) | 74.434% | — | Imported | 2026-05-27 |
| 185 | BERT-Base (single) | 74.43% | — | Imported | 2026-05-27 |
| 186 | Reinforced Mnemonic Reader + Answer Verifier (single model) | 74.295% | — | Imported | 2026-05-27 |
| 187 | SAN (ensemble model) | 73.704% | — | Imported | 2026-05-27 |
| 188 | Multi-Level Attention Fusion(MLAF) (single model) | 72.857% | — | Imported | 2026-05-27 |
| 189 | Unet (single model) | 72.642% | — | Imported | 2026-05-27 |
| 190 | FusionNet++ (ensemble) | 72.484% | — | Imported | 2026-05-27 |
| 191 | DocQA + NeurQuRI (single model) | 71.662% | — | Imported | 2026-05-27 |
| 192 | BiDAF++ with pair2vec (single model) | 71.583% | — | Imported | 2026-05-27 |
| 193 | SAN (single model) | 71.439% | — | Imported | 2026-05-27 |
| 194 | VS^3-NET (single model) | 70.884% | — | Imported | 2026-05-27 |
| 195 | KACTEIL-MRC(GFN-Net) (single model) | 70.878% | — | Imported | 2026-05-27 |
| 196 | EBB-Net (single model) | 70.303% | — | Imported | 2026-05-27 |
| 197 | KakaoNet2 (single model) | 69.381% | — | Imported | 2026-05-27 |
| 198 | abcNet (single model) | 69.206% | — | Imported | 2026-05-27 |
| 199 | BiDAF++ (single model) | 68.866% | — | Imported | 2026-05-27 |
| 200 | BSAE AddText (single model) | 67.422% | — | Imported | 2026-05-27 |
| 201 | eeAttNet (single model) | 66.633% | — | Imported | 2026-05-27 |
| 202 | BiDAF + Self Attention + ELMo (single model) | 66.251% | — | Imported | 2026-05-27 |
| 203 | Tree-LSTM + BiDAF + ELMo (single model) | 62.341% | — | Imported | 2026-05-27 |
| 204 | BiDAF + Self Attention (single model) | 62.305% | — | Imported | 2026-05-27 |
| 205 | BiDAF-No-Answer (single model) | 62.093% | — | Imported | 2026-05-27 |