BrowseComp Long Context 128k
A challenging benchmark for evaluating web browsing agents' ability to persistently navigate the internet and find hard-to-locate, entangled information. Comprises 1,266 questions requiring strategic reasoning, creative search, and interpretation of retrieved content, with short and easily verifiable answers.
5 rows
Primary metric: score
Sampled: 2026-05-06
Metadata
Metrics: Score, Normalized Score
| Rank | Subject | Score | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | GPT-5.2 | 0.92 | GPT-5.2 (`openai-gpt-5.2`) | Self-reported | 2026-05-06 |
| 2 | GPT-5.1 | 0.90 | GPT-5.1 (`openai-gpt-5.1`) | Self-reported | 2026-05-06 |
| 2 | GPT-5.1 Instant | 0.90 | GPT-5.1 (`openai-gpt-5.1`) | Self-reported | 2026-05-06 |
| 2 | GPT-5.1 Thinking | 0.90 | GPT-5.1 (`openai-gpt-5.1`) | Self-reported | 2026-05-06 |
| 2 | GPT-5 | 0.90 | GPT-5 (`openai-gpt-5`) | Self-reported | 2026-05-06 |