BrowseComp+
Deep-research benchmark variant in Exgentic's Open Agent Leaderboard, evaluating general-purpose agents on BrowseComp+ web research tasks without domain-specific tuning.
15 rows
Primary metric: score
Sampled: 2026-05-06
Metadata
Metrics: Score, Finished, Average Cost (lower is better), Average Steps (lower is better), Total Cost (lower is better), Tasks
| Rank | Subject | Score | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | OpenAI_Solo / claude-opus-4.5 | 0.61 | — | Imported | 2026-05-06 |
| 2 | Smolagent / claude-opus-4.5 | 0.61 | — | Imported | 2026-05-06 |
| 3 | Smolagent / gemini-3-pro | 0.57 | — | Imported | 2026-05-06 |
| 4 | Claude_Code / claude-opus-4.5 | 0.53 | — | Imported | 2026-05-06 |
| 5 | Claude_Code / gemini-3-pro | 0.51 | — | Imported | 2026-05-06 |
| 6 | React / claude-opus-4.5 | 0.49 | — | Imported | 2026-05-06 |
| 7 | React_+_Shortlisting / claude-opus-4.5 | 0.49 | — | Imported | 2026-05-06 |
| 8 | OpenAI_Solo / gpt-5.2 | 0.48 | — | Imported | 2026-05-06 |
| 9 | React / gemini-3-pro | 0.48 | — | Imported | 2026-05-06 |
| 10 | React_+_Shortlisting / gemini-3-pro | 0.48 | — | Imported | 2026-05-06 |
| 11 | React / gpt-5.2 | 0.46 | — | Imported | 2026-05-06 |
| 12 | React_+_Shortlisting / gpt-5.2 | 0.46 | — | Imported | 2026-05-06 |
| 13 | Claude_Code / gpt-5.2 | 0.43 | — | Imported | 2026-05-06 |
| 14 | OpenAI_Solo / gemini-3-pro | 0.33 | — | Imported | 2026-05-06 |
| 15 | Smolagent / gpt-5.2 | 0.26 | — | Imported | 2026-05-06 |
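
Each Subject pairs an agent framework with a model, so the table can also be read along either axis. The following is a minimal sketch of that aggregation, assuming the Subject column always has the form `agent / model`; the `ROWS` list and the `mean_score_by` helper are illustrative names, not part of the leaderboard, and the averages it produces are a simple unweighted mean rather than an official metric.

```python
from collections import defaultdict

# (subject, score) pairs copied from the table above.
ROWS = [
    ("OpenAI_Solo / claude-opus-4.5", 0.61),
    ("Smolagent / claude-opus-4.5", 0.61),
    ("Smolagent / gemini-3-pro", 0.57),
    ("Claude_Code / claude-opus-4.5", 0.53),
    ("Claude_Code / gemini-3-pro", 0.51),
    ("React / claude-opus-4.5", 0.49),
    ("React_+_Shortlisting / claude-opus-4.5", 0.49),
    ("OpenAI_Solo / gpt-5.2", 0.48),
    ("React / gemini-3-pro", 0.48),
    ("React_+_Shortlisting / gemini-3-pro", 0.48),
    ("React / gpt-5.2", 0.46),
    ("React_+_Shortlisting / gpt-5.2", 0.46),
    ("Claude_Code / gpt-5.2", 0.43),
    ("OpenAI_Solo / gemini-3-pro", 0.33),
    ("Smolagent / gpt-5.2", 0.26),
]


def mean_score_by(rows, part):
    """Average score grouped by one half of the subject.

    part=0 groups by agent, part=1 groups by model
    (hypothetical helper, assumes "agent / model" subjects).
    """
    groups = defaultdict(list)
    for subject, score in rows:
        key = subject.split(" / ")[part]
        groups[key].append(score)
    return {key: sum(vals) / len(vals) for key, vals in groups.items()}


by_agent = mean_score_by(ROWS, 0)
by_model = mean_score_by(ROWS, 1)

if __name__ == "__main__":
    # Print each grouping from highest to lowest mean score.
    for label, table in (("agent", by_agent), ("model", by_model)):
        print(f"Mean score by {label}:")
        for key, mean in sorted(table.items(), key=lambda kv: -kv[1]):
            print(f"  {key}: {mean:.3f}")
```

With only three to five rows per group, these means are a rough summary and should not be read as a statistically meaningful ranking.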