DeepResearch Bench

Benchmark for deep research agents that evaluates generated research reports across comprehensiveness, insight, instruction following, readability, and citation dimensions.

Rows: 41
Primary metric: overall_score
Sampled: 2026-05-06

Metadata

Metrics

Overall Score, Comprehensiveness, Insight, Instruction Following, Readability, Citation Accuracy, Effective Citations

Latest Results

Rows ranked by overall_score. Citation metrics are included when present in the source CSV.

| Rank | Subject | Overall Score | Model Match | Provenance | Sampled |
|------|---------|---------------|-------------|------------|---------|
| 1 | xiaoyi | 57 | | Imported | 2026-05-06 |
| 2 | cellcog-max | 56.67 | | Imported | 2026-05-06 |
| 3 | grep-v5 | 56.23 | | Imported | 2026-05-06 |
| 4 | Link | 56.04 | | Imported | 2026-05-06 |
| 5 | nvidia-aiq-nemotron-gpt52-updated | 55.95 | | Imported | 2026-05-06 |
| 6 | 1688AILab-DeepResearch-0325 | 55.39 | | Imported | 2026-05-06 |
| 7 | ms_deepresearch_gpt52mixqwen35_09_edit_restart09_think_medium | 55.31 | | Imported | 2026-05-06 |
| 8 | drb_cellcog | 55.31 | | Imported | 2026-05-06 |
| 9 | deepinsight | 55.24 | | Imported | 2026-05-06 |
| 10 | ms_deepresearch | 54.97 | | Imported | 2026-05-06 |
| 11 | TrajectoryKit | 54.92 | | Imported | 2026-05-06 |
| 12 | onyx | 54.54 | | Imported | 2026-05-06 |
| 13 | baidu-qianfan-drs-pro | 54.22 | | Imported | 2026-05-06 |
| 14 | deepsynth | 54.22 | | Imported | 2026-05-06 |
| 15 | deepdog | 53.52 | | Imported | 2026-05-06 |
| 16 | RecallRadar | 53.19 | | Imported | 2026-05-06 |
| 17 | baidu-qianfan-drs | 53.02 | | Imported | 2026-05-06 |
| 18 | MindDR-V1.5 | 52.54 | | Imported | 2026-05-06 |
| 19 | tavily-research | 52.44 | | Imported | 2026-05-06 |
| 20 | thinkdepthai-deepresearch | 52.43 | | Imported | 2026-05-06 |
| 21 | salesforce-air-deep-research | 50.65 | | Imported | 2026-05-06 |
| 22 | gensee-search-gpt-5 | 50.60 | | Imported | 2026-05-06 |
| 23 | gemini-2.5-pro-deepresearch | 49.71 | | Imported | 2026-05-06 |
| 24 | langchain-open-deep-research-gpt-5 | 49.33 | | Imported | 2026-05-06 |
| 25 | openai-deepresearch | 46.45 | | Imported | 2026-05-06 |
| 26 | raaa-deep-research | 46.13 | | Imported | 2026-05-06 |
| 27 | dr-tulu | 45.49 | | Imported | 2026-05-06 |
| 28 | claude-research | 45 | | Imported | 2026-05-06 |
| 29 | kimi-researcher | 44.64 | | Imported | 2026-05-06 |
| 30 | doubao-deepresearch | 44.34 | | Imported | 2026-05-06 |
| 31 | langchain-open-deep-research | 43.44 | | Imported | 2026-05-06 |
| 32 | nvidia-aiq-research-assistant | 40.52 | | Imported | 2026-05-06 |
| 33 | tongyi-deepresearch-30B-A3B | 40.46 | | Imported | 2026-05-06 |
| 34 | perplexity-Research | 40.46 | | Imported | 2026-05-06 |
| 35 | grok-deeper-search | 38.22 | | Imported | 2026-05-06 |
| 36 | sonar-reasoning-pro | 37.76 | | Imported | 2026-05-06 |
| 37 | sonar-reasoning | 37.75 | | Imported | 2026-05-06 |
| 38 | claude-3-7-sonnet-with-search | 36.63 | | Imported | 2026-05-06 |
| 39 | sonar-pro | 36.19 | | Imported | 2026-05-06 |
| 40 | gemini-2.5-pro-preview-05-06 | 31.90 | | Imported | 2026-05-06 |
| 41 | gpt-4o-search-preview | 30.74 | | Imported | 2026-05-06 |
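
The ranking rule stated above the table (rows ordered by overall_score, citation metrics kept only when the source CSV has them) can be sketched in Python as follows. This is a minimal illustration, not the page's actual import code. The column names `subject`, `citation_accuracy`, and `effective_citations` are assumptions, since the source CSV's header is not shown here. Tied scores are given sequential ranks in their original CSV order, which appears to match how the table treats ties.

```python
import csv
import io

# Assumed column names; the real header of the source CSV is not shown.
SCORE_COLUMN = "overall_score"
SUBJECT_COLUMN = "subject"
OPTIONAL_COLUMNS = ("citation_accuracy", "effective_citations")


def rank_rows(csv_text):
    """Return rows sorted by overall_score (descending) with a 1-based rank."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    # Keep citation columns only if the CSV actually contains them.
    present = [c for c in OPTIONAL_COLUMNS if c in (reader.fieldnames or [])]
    # sorted() is stable, so tied scores keep their original CSV order.
    ordered = sorted(rows, key=lambda r: float(r[SCORE_COLUMN]), reverse=True)
    ranked = []
    for rank, row in enumerate(ordered, start=1):
        entry = {
            "rank": rank,
            SUBJECT_COLUMN: row[SUBJECT_COLUMN],
            SCORE_COLUMN: float(row[SCORE_COLUMN]),
        }
        for col in present:
            entry[col] = row[col]
        ranked.append(entry)
    return ranked


# Three rows taken from the table above, deliberately out of order.
SAMPLE = "subject,overall_score\ngrep-v5,56.23\nxiaoyi,57\ncellcog-max,56.67\n"

for entry in rank_rows(SAMPLE):
    print(entry["rank"], entry[SUBJECT_COLUMN], entry[SCORE_COLUMN])
```

Because the optional columns are detected from the CSV header, the same function handles exports with or without citation metrics.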