DeepResearch Bench
Benchmark for deep research agents that evaluates generated research reports across comprehensiveness, insight, instruction following, readability, and citation dimensions.
Rows: 41
Primary metric: overall_score
Sampled: 2026-05-06
Metadata
Metrics: Overall Score, Comprehensiveness, Insight, Instruction Following, Readability, Citation Accuracy, Effective Citations
| Rank | Subject | Overall Score | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | xiaoyi | 57 | — | Imported | 2026-05-06 |
| 2 | cellcog-max | 56.67 | — | Imported | 2026-05-06 |
| 3 | grep-v5 | 56.23 | — | Imported | 2026-05-06 |
| 4 | Link | 56.04 | — | Imported | 2026-05-06 |
| 5 | nvidia-aiq-nemotron-gpt52-updated | 55.95 | — | Imported | 2026-05-06 |
| 6 | 1688AILab-DeepResearch-0325 | 55.39 | — | Imported | 2026-05-06 |
| 7 | ms_deepresearch_gpt52mixqwen35_09_edit_restart09_think_medium | 55.31 | — | Imported | 2026-05-06 |
| 8 | drb_cellcog | 55.31 | — | Imported | 2026-05-06 |
| 9 | deepinsight | 55.24 | — | Imported | 2026-05-06 |
| 10 | ms_deepresearch | 54.97 | — | Imported | 2026-05-06 |
| 11 | TrajectoryKit | 54.92 | — | Imported | 2026-05-06 |
| 12 | onyx | 54.54 | — | Imported | 2026-05-06 |
| 13 | baidu-qianfan-drs-pro | 54.22 | — | Imported | 2026-05-06 |
| 14 | deepsynth | 54.22 | — | Imported | 2026-05-06 |
| 15 | deepdog | 53.52 | — | Imported | 2026-05-06 |
| 16 | RecallRadar | 53.19 | — | Imported | 2026-05-06 |
| 17 | baidu-qianfan-drs | 53.02 | — | Imported | 2026-05-06 |
| 18 | MindDR-V1.5 | 52.54 | — | Imported | 2026-05-06 |
| 19 | tavily-research | 52.44 | — | Imported | 2026-05-06 |
| 20 | thinkdepthai-deepresearch | 52.43 | — | Imported | 2026-05-06 |
| 21 | salesforce-air-deep-research | 50.65 | — | Imported | 2026-05-06 |
| 22 | gensee-search-gpt-5 | 50.60 | — | Imported | 2026-05-06 |
| 23 | gemini-2.5-pro-deepresearch | 49.71 | — | Imported | 2026-05-06 |
| 24 | langchain-open-deep-research-gpt-5 | 49.33 | — | Imported | 2026-05-06 |
| 25 | openai-deepresearch | 46.45 | — | Imported | 2026-05-06 |
| 26 | raaa-deep-research | 46.13 | — | Imported | 2026-05-06 |
| 27 | dr-tulu | 45.49 | — | Imported | 2026-05-06 |
| 28 | claude-research | 45 | — | Imported | 2026-05-06 |
| 29 | kimi-researcher | 44.64 | — | Imported | 2026-05-06 |
| 30 | doubao-deepresearch | 44.34 | — | Imported | 2026-05-06 |
| 31 | langchain-open-deep-research | 43.44 | — | Imported | 2026-05-06 |
| 32 | nvidia-aiq-research-assistant | 40.52 | — | Imported | 2026-05-06 |
| 33 | tongyi-deepresearch-30B-A3B | 40.46 | — | Imported | 2026-05-06 |
| 34 | perplexity-Research | 40.46 | — | Imported | 2026-05-06 |
| 35 | grok-deeper-search | 38.22 | — | Imported | 2026-05-06 |
| 36 | sonar-reasoning-pro | 37.76 | — | Imported | 2026-05-06 |
| 37 | sonar-reasoning | 37.75 | — | Imported | 2026-05-06 |
| 38 | claude-3-7-sonnet-with-search | 36.63 | — | Imported | 2026-05-06 |
| 39 | sonar-pro | 36.19 | — | Imported | 2026-05-06 |
| 40 | gemini-2.5-pro-preview-05-06 | 31.90 | — | Imported | 2026-05-06 |
| 41 | gpt-4o-search-preview | 30.74 | — | Imported | 2026-05-06 |
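
The Rank column numbers rows sequentially even where two subjects share the same Overall Score (for example ranks 7 and 8, both at 55.31). The following is a minimal sketch, assuming the table is available as Markdown text, of how one might parse the rows and list such ties. The helper names `parse_rows` and `tied_groups` are hypothetical and not part of any benchmark tooling, and the excerpt keeps only the first three columns of four rows from the table.

```python
from typing import Dict, List, Tuple

# Excerpt of the leaderboard (ranks 6 to 9), first three columns only.
TABLE_EXCERPT = """\
| Rank | Subject | Overall Score |
|---|---|---|
| 6 | 1688AILab-DeepResearch-0325 | 55.39 |
| 7 | ms_deepresearch_gpt52mixqwen35_09_edit_restart09_think_medium | 55.31 |
| 8 | drb_cellcog | 55.31 |
| 9 | deepinsight | 55.24 |
"""


def parse_rows(markdown: str) -> List[Tuple[int, str, float]]:
    """Return (rank, subject, overall_score) for each data row."""
    rows = []
    for line in markdown.splitlines():
        # Drop the outer pipes, then split the remaining text into cells.
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        # Skip the header and separator rows: their first cell is not a number.
        if len(cells) < 3 or not cells[0].isdigit():
            continue
        rows.append((int(cells[0]), cells[1], float(cells[2])))
    return rows


def tied_groups(rows: List[Tuple[int, str, float]]) -> Dict[float, List[str]]:
    """Map each score shared by more than one subject to those subjects."""
    by_score: Dict[float, List[str]] = {}
    for _, subject, score in rows:
        by_score.setdefault(score, []).append(subject)
    return {score: names for score, names in by_score.items() if len(names) > 1}


if __name__ == "__main__":
    for score, names in tied_groups(parse_rows(TABLE_EXCERPT)).items():
        print(score, names)
```

Grouping on the parsed float works here because tied rows carry identical score strings; scores computed elsewhere would need rounding to a fixed precision before comparison.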