SearchAgent Leaderboard
Standardized leaderboard for search-augmented question-answering agents across general QA, multi-hop QA, and the closed-world FictionalHot benchmark.
Rows: 12
Primary metric: average_exact_match
Sampled: 2026-05-06
Metadata
Metrics
Average EM, NQ EM, TriviaQA EM, PopQA EM, HotpotQA EM, 2Wiki EM, Musique EM, Bamboogle EM, FictionalHot EM
| Rank | Subject | Average EM | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | ReSeek-Qwen2.5-7b-Instruct | 37.74 | — | Imported | 2026-05-06 |
| 2 | ZeroSearch-Qwen2.5-7b-Instruct | 34.59 | — | Imported | 2026-05-06 |
| 3 | Search-R1-Qwen2.5-7b-Instruct | 34.15 | — | Imported | 2026-05-06 |
| 4 | ReSeek-Qwen2.5-3b-Instruct | 31.18 | — | Imported | 2026-05-06 |
| 5 | Search-R1-Qwen2.5-3b-Instruct | 28.89 | — | Imported | 2026-05-06 |
| 6 | ZeroSearch-Qwen2.5-3b-Instruct | 28.11 | — | Imported | 2026-05-06 |
| 7 | RAG-Qwen2.5-7b-Instruct | 26.73 | — | Imported | 2026-05-06 |
| 8 | R1-Qwen2.5-7b-Instruct | 23.79 | — | Imported | 2026-05-06 |
| 9 | Search-o1-Qwen2.5-7b-Instruct | 18.27 | — | Imported | 2026-05-06 |
| 10 | SFT-Qwen2.5-7b-Instruct | 18.13 | — | Imported | 2026-05-06 |
| 11 | Direct-Inference-Qwen2.5-7b-Instruct | 15.84 | — | Imported | 2026-05-06 |
| 12 | CoT-Qwen2.5-7b-Instruct | 9.31 | — | Imported | 2026-05-06 |
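The page does not say how the EM columns are computed, so the following is only a minimal sketch under stated assumptions. It assumes SQuAD-style answer normalization (lowercasing, stripping punctuation and English articles) and assumes that Average EM is the unweighted mean of the per-benchmark EM scores. The function names (`normalize_answer`, `exact_match`, `em_score`, `average_em`) are hypothetical and are not taken from the leaderboard's own tooling.

```python
import re
import string


def normalize_answer(text: str) -> str:
    """Lowercase, strip punctuation, drop English articles, collapse whitespace.

    Assumption: SQuAD-style normalization; the leaderboard may normalize differently.
    """
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold_answers: list[str]) -> bool:
    """True if the normalized prediction equals any normalized gold answer."""
    pred = normalize_answer(prediction)
    return any(pred == normalize_answer(gold) for gold in gold_answers)


def em_score(predictions: list[str], references: list[list[str]]) -> float:
    """Percentage of predictions that exactly match one of their gold answers."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    hits = sum(exact_match(p, golds) for p, golds in zip(predictions, references))
    return 100.0 * hits / len(predictions)


def average_em(per_benchmark: dict[str, float]) -> float:
    """Unweighted mean over benchmarks (assumption: no weighting by dataset size)."""
    return sum(per_benchmark.values()) / len(per_benchmark)
```

With made-up placeholder inputs, `em_score(["Paris", "london"], [["Paris"], ["Berlin"]])` scores one hit out of two, and `average_em` simply averages whatever per-benchmark scores it is given; a size-weighted mean would rank systems differently if the benchmarks differ a lot in size.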