BrowseComp Long Context 256k

BrowseComp is a benchmark for measuring the ability of agents to browse the web. It comprises 1,266 questions that require persistently navigating the internet in search of hard-to-find, entangled information. Despite the difficulty of the questions, BrowseComp is simple to use, because predicted answers are short and easily verifiable against reference answers. The benchmark focuses on questions whose answers are obscure, time-invariant, and well supported by evidence scattered across the open web.
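Because predicted answers are short and checked against reference answers, a score can be pictured as the fraction of questions answered correctly. The sketch below is only an illustration under that assumption: the `normalize` and `score` helpers are hypothetical, and the normalized string match stands in for whatever grading procedure the benchmark actually uses, which this page does not describe.

```python
import re
import string


def normalize(answer: str) -> str:
    """Lowercase, strip ASCII punctuation, and collapse whitespace.

    Hypothetical normalization; the benchmark's real grader may differ.
    """
    answer = answer.lower().translate(str.maketrans("", "", string.punctuation))
    return re.sub(r"\s+", " ", answer).strip()


def score(predictions: list[str], references: list[str]) -> float:
    """Return the fraction of predictions matching their reference answer."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    correct = sum(
        normalize(pred) == normalize(ref)
        for pred, ref in zip(predictions, references)
    )
    return correct / len(references)
```

A score computed this way falls between 0 and 1, which matches the scale of the values reported in the results table.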

Rows: 2

Primary metric: score

Sampled: 2026-05-06

Metadata

ID: browsecomp_long_256k
Category: Search
Release: Unknown
Source: Source page
Snapshot: Snapshot source

Metrics

Score, Normalized Score

Rank	Subject	Score	Model Match	Provenance	Sampled
1	GPT-5.2	0.90	GPT-5.2 (openai-gpt-5.2)	Self-reported	2026-05-06
2	GPT-5	0.89	GPT-5 (openai-gpt-5)	Self-reported	2026-05-06
