SearchAgent Leaderboard

Standardized leaderboard for search-augmented question-answering agents across general QA, multi-hop QA, and the closed-world FictionalHot benchmark.

12 rows
Primary metric: average_exact_match
Sampled: 2026-05-06

Metadata

Metrics

Average EM, NQ EM, TriviaQA EM, PopQA EM, HotpotQA EM, 2Wiki EM, Musique EM, Bamboogle EM, FictionalHot EM

Latest Results

Rows are parsed from public SearchAgent eval-results JSON files. Source method/model display names are preserved, and decimal exact-match scores are converted to percentages.

Rank  Subject                               Average EM (%)  Model Match  Provenance  Sampled
1     ReSeek-Qwen2.5-7b-Instruct            37.74                        Imported    2026-05-06
2     ZeroSearch-Qwen2.5-7b-Instruct        34.59                        Imported    2026-05-06
3     Search-R1-Qwen2.5-7b-Instruct         34.15                        Imported    2026-05-06
4     ReSeek-Qwen2.5-3b-Instruct            31.18                        Imported    2026-05-06
5     Search-R1-Qwen2.5-3b-Instruct         28.89                        Imported    2026-05-06
6     ZeroSearch-Qwen2.5-3b-Instruct        28.11                        Imported    2026-05-06
7     RAG-Qwen2.5-7b-Instruct               26.73                        Imported    2026-05-06
8     R1-Qwen2.5-7b-Instruct                23.79                        Imported    2026-05-06
9     Search-o1-Qwen2.5-7b-Instruct         18.27                        Imported    2026-05-06
10    SFT-Qwen2.5-7b-Instruct               18.13                        Imported    2026-05-06
11    Direct-Inference-Qwen2.5-7b-Instruct  15.84                        Imported    2026-05-06
12    CoT-Qwen2.5-7b-Instruct               9.31                         Imported    2026-05-06
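
The note under Latest Results says rows come from eval-results JSON files and that decimal exact-match scores are converted to percentages. A minimal Python sketch of that step is shown here. The JSON layout, the "method" field, the helper names, and the sample values are all hypothetical placeholders for illustration. Only the metric key average_exact_match is taken from this page, and this is not the leaderboard's actual import code.

```python
import json

# Hypothetical eval-results payload; the field names and values are
# illustrative assumptions, not the real SearchAgent file schema or data.
RAW_JSON = """
[
  {"method": "example-method-a", "average_exact_match": 0.25},
  {"method": "example-method-b", "average_exact_match": 0.5},
  {"method": "example-method-c", "average_exact_match": 0.125}
]
"""


def to_percent(score: float) -> float:
    """Convert a decimal exact-match score (0..1) to a percentage with 2 decimals."""
    return round(score * 100, 2)


def build_rows(records: list[dict]) -> list[dict]:
    """Keep source display names, convert scores, and rank by the primary metric."""
    rows = [
        {"subject": rec["method"], "average_em": to_percent(rec["average_exact_match"])}
        for rec in records
    ]
    # Highest Average EM first, matching the ordering used in the table.
    rows.sort(key=lambda row: row["average_em"], reverse=True)
    for rank, row in enumerate(rows, start=1):
        row["rank"] = rank
    return rows


rows = build_rows(json.loads(RAW_JSON))
for row in rows:
    print(f'{row["rank"]:<5} {row["subject"]:<20} {row["average_em"]:.2f}')
```

Rounding to two decimals matches the precision shown in the table. How the real importer handles ties or missing metrics is not stated on this page, so the sketch leaves both out.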