CorpusQA 1M

CorpusQA 1M is a long-context question answering benchmark designed to evaluate models on contexts of approximately 1 million tokens. Models are scored on accuracy when retrieving and reasoning over information distributed across an extremely long input corpus.
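The page does not say how answers are graded. As a rough illustration of what an accuracy score such as 0.62 means, here is a minimal sketch that computes accuracy as the fraction of questions answered correctly. It assumes a hypothetical exact-match grader with case-insensitive, whitespace-trimmed comparison. The function name `accuracy` and the normalization rule are illustrative assumptions, not the benchmark's actual scoring code.

```python
def accuracy(predictions: list[str], references: list[str]) -> float:
    """Fraction of questions answered correctly.

    Hypothetical grading rule: exact match after trimming whitespace and
    lowercasing. CorpusQA 1M's real grader may differ.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("need at least one question to score")
    # Count answers that match the reference under the assumed normalization.
    correct = sum(
        p.strip().lower() == r.strip().lower()
        for p, r in zip(predictions, references)
    )
    return correct / len(references)


if __name__ == "__main__":
    # One of two answers matches, so accuracy is 0.5.
    print(accuracy(["Paris", "42"], ["paris", "41"]))
```

Under this reading, a score of 0.62 would mean that 62% of questions were answered correctly.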

Rows: 2
Primary metric: Score
Sampled: 2026-05-06

Metadata

Metrics

Score, Normalized Score

Latest Results

Rank  Subject                Score  Model Match                                     Provenance     Sampled
1     DeepSeek-V4-Pro-Max    0.62   DeepSeek V4 Pro (deepseek-deepseek-v4-pro)      Self-reported  2026-05-06
2     DeepSeek-V4-Flash-Max  0.60   DeepSeek V4 Flash (deepseek-deepseek-v4-flash)  Self-reported  2026-05-06