BrowseComp+

A deep-research benchmark variant in Exgentic's Open Agent Leaderboard that evaluates general-purpose agents on BrowseComp+ web research tasks without domain-specific tuning.

Rows: 15
Primary metric: score
Sampled: 2026-05-06

Metadata

Metrics

Score, Finished, Average Cost (lower is better), Average Steps (lower is better), Total Cost (lower is better), Tasks

Latest Results

Rows are ranked by score, highest first.

Rank  Subject                                 Score  Model Match  Provenance  Sampled
1     OpenAI_Solo / claude-opus-4.5           0.61                Imported    2026-05-06
2     Smolagent / claude-opus-4.5             0.61                Imported    2026-05-06
3     Smolagent / gemini-3-pro                0.57                Imported    2026-05-06
4     Claude_Code / claude-opus-4.5           0.53                Imported    2026-05-06
5     Claude_Code / gemini-3-pro              0.51                Imported    2026-05-06
6     React / claude-opus-4.5                 0.49                Imported    2026-05-06
7     React_+_Shortlisting / claude-opus-4.5  0.49                Imported    2026-05-06
8     OpenAI_Solo / gpt-5.2                   0.48                Imported    2026-05-06
9     React / gemini-3-pro                    0.48                Imported    2026-05-06
10    React_+_Shortlisting / gemini-3-pro     0.48                Imported    2026-05-06
11    React / gpt-5.2                         0.46                Imported    2026-05-06
12    React_+_Shortlisting / gpt-5.2          0.46                Imported    2026-05-06
13    Claude_Code / gpt-5.2                   0.43                Imported    2026-05-06
14    OpenAI_Solo / gemini-3-pro              0.33                Imported    2026-05-06
15    Smolagent / gpt-5.2                     0.26                Imported    2026-05-06
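
Several rows in the table above share a score (for example, ranks 1 and 2 both report 0.61), so it can help to recompute the ordering and group results by model. The following is a minimal sketch, not the leaderboard's own code: the subject and score values are copied from the table, while the helper names (`split_subject`, `rank_by_score`, `competition_ranks`, `mean_score_by_model`) are hypothetical, and the tie-sharing rank scheme is an alternative to the sequential ranks shown on the page.

```python
from collections import defaultdict

# (subject, score) pairs copied from the results table, in listed order.
ROWS = [
    ("OpenAI_Solo / claude-opus-4.5", 0.61),
    ("Smolagent / claude-opus-4.5", 0.61),
    ("Smolagent / gemini-3-pro", 0.57),
    ("Claude_Code / claude-opus-4.5", 0.53),
    ("Claude_Code / gemini-3-pro", 0.51),
    ("React / claude-opus-4.5", 0.49),
    ("React_+_Shortlisting / claude-opus-4.5", 0.49),
    ("OpenAI_Solo / gpt-5.2", 0.48),
    ("React / gemini-3-pro", 0.48),
    ("React_+_Shortlisting / gemini-3-pro", 0.48),
    ("React / gpt-5.2", 0.46),
    ("React_+_Shortlisting / gpt-5.2", 0.46),
    ("Claude_Code / gpt-5.2", 0.43),
    ("OpenAI_Solo / gemini-3-pro", 0.33),
    ("Smolagent / gpt-5.2", 0.26),
]


def split_subject(subject):
    """Split an 'Agent / model' subject string into (agent, model)."""
    agent, model = subject.split(" / ", 1)
    return agent, model


def rank_by_score(rows):
    """Sort rows by score, highest first. sorted() is stable, so tied
    rows keep their original relative order."""
    return sorted(rows, key=lambda row: row[1], reverse=True)


def competition_ranks(rows):
    """Assign 1-based ranks in which tied scores share a rank (1, 1, 3, ...).
    This is an assumed alternative scheme, not the page's numbering."""
    ranked = rank_by_score(rows)
    ranks = []
    for i, (_, score) in enumerate(ranked):
        if i > 0 and score == ranked[i - 1][1]:
            ranks.append(ranks[-1])  # same score as previous row: share its rank
        else:
            ranks.append(i + 1)
    return list(zip(ranks, ranked))


def mean_score_by_model(rows):
    """Average the score over all agents that ran a given model."""
    scores = defaultdict(list)
    for subject, score in rows:
        _, model = split_subject(subject)
        scores[model].append(score)
    return {model: sum(vals) / len(vals) for model, vals in scores.items()}


if __name__ == "__main__":
    for rank, (subject, score) in competition_ranks(ROWS):
        print(f"{rank:>2}  {subject:<40} {score:.2f}")
    for model, mean in sorted(mean_score_by_model(ROWS).items()):
        print(f"{model}: mean score {mean:.3f}")
```

Grouping by model rather than by agent is only one possible cut of the data; swapping the tuple element used in `mean_score_by_model` gives a per-agent average instead.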