MedBrowseComp

Medical browsing and search benchmark for multi-hop clinical research questions over live or web-grounded medical sources.

10 rows
real_accuracy (primary metric)
2026-05-27 (sampled)

Metadata

Metrics

Accuracy, Real Accuracy, Correct Count, Question Count

Latest Results

Rows aggregated from public per-run CSV files in the MedBrowseComp repository. Real accuracy excludes rows whose correct answer is NA-like, following the repository summary script.
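The distinction between accuracy and real accuracy can be sketched as follows. This is a minimal illustration, not the repository's summary script: the field names `correct_answer` and `is_correct`, the `NA_LIKE` token set, and the `summarize` helper are all assumptions made for the example, and the actual script may define "NA-like" differently.

```python
# Hypothetical sketch: compute the four listed metrics from per-run rows.
# Field names and the NA-like token set are assumptions, not taken from
# the MedBrowseComp repository.

NA_LIKE = {"", "na", "n/a", "nan", "none", "null"}  # assumed definition


def is_na_like(value):
    """Return True if a reference answer should be treated as NA-like."""
    return value is None or str(value).strip().lower() in NA_LIKE


def summarize(rows):
    """Aggregate one run's rows into counts and accuracies."""
    question_count = len(rows)
    correct_count = sum(1 for r in rows if r["is_correct"])

    # Real accuracy only scores rows whose reference answer is not NA-like.
    scored = [r for r in rows if not is_na_like(r["correct_answer"])]
    real_correct = sum(1 for r in scored if r["is_correct"])

    return {
        "question_count": question_count,
        "correct_count": correct_count,
        "accuracy": correct_count / question_count if question_count else 0.0,
        "real_correct": real_correct,
        "real_total": len(scored),
        "real_accuracy": real_correct / len(scored) if scored else 0.0,
    }
```

Under these assumptions, a run with four rows, two of which have NA-like reference answers, would report a real accuracy over a denominator of two rather than four.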

| Rank | Subject | Real Accuracy | Model Match | Provenance | Sampled |
| 1 | perplexity_deep_research | 14/48 | | Imported | 2026-05-27 |
| 2 | Gemini Pro | 12/48 | | Imported | 2026-05-27 |
| 3 | Gemini Pro + tools | 111/453 | | Imported | 2026-05-27 |
| 4 | perplexity_sonar_pro | 9/48 | | Imported | 2026-05-27 |
| 5 | Gemini 2.0 Flash + tools | 75/453 | | Imported | 2026-05-27 |
| 6 | Sonar Pro | 74/453 | | Imported | 2026-05-27 |
| 7 | Gemini Pro (param) | 4/48 | | Imported | 2026-05-27 |
| 8 | Gemini 2.0 Flash | 31/453 | | Imported | 2026-05-27 |
| 9 | Gemini Pro (param) | 23/453 | | Imported | 2026-05-27 |
| 10 | GPT-4.1 + tools | 19/453 | | Imported | 2026-05-27 |
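
Because the rows above use two different denominators (48 and 453), the fractions are easier to compare as percentages. The sketch below copies the counts from the table and orders them by exact ratio; the `rank` helper and the tuple layout are illustrative assumptions, not part of the MedBrowseComp repository.

```python
from fractions import Fraction

# (subject, correct, total) copied from the results table.
results = [
    ("perplexity_deep_research", 14, 48),
    ("Gemini Pro", 12, 48),
    ("Gemini Pro + tools", 111, 453),
    ("perplexity_sonar_pro", 9, 48),
    ("Gemini 2.0 Flash + tools", 75, 453),
    ("Sonar Pro", 74, 453),
    ("Gemini Pro (param)", 4, 48),
    ("Gemini 2.0 Flash", 31, 453),
    ("Gemini Pro (param)", 23, 453),
    ("GPT-4.1 + tools", 19, 453),
]


def rank(rows):
    """Sort by exact ratio (descending) and attach rank and percentage."""
    ordered = sorted(rows, key=lambda r: Fraction(r[1], r[2]), reverse=True)
    return [
        (i + 1, name, correct, total, round(100 * correct / total, 1))
        for i, (name, correct, total) in enumerate(ordered)
    ]


for position, name, correct, total, pct in rank(results):
    print(f"{position:>2}  {name:<26} {correct}/{total}  {pct}%")
```

Sorting by the exact ratio reproduces the rank order shown in the table, which suggests the ranking compares ratios directly rather than raw correct counts.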