CorpusQA 1M
CorpusQA 1M is a long-context question answering benchmark designed to evaluate models on contexts of approximately 1 million tokens. Models are scored on accuracy when retrieving and reasoning over information distributed across an extremely long input corpus.
Rows: 2
Primary metric: score
Sampled: 2026-05-06
Metadata
Metrics: Score, Normalized Score
| Rank | Subject | Score | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | DeepSeek-V4-Pro-Max | 0.62 | DeepSeek V4 Pro (`deepseek-deepseek-v4-pro`) | Self-reported | 2026-05-06 |
| 2 | DeepSeek-V4-Flash-Max | 0.60 | DeepSeek V4 Flash (`deepseek-deepseek-v4-flash`) | Self-reported | 2026-05-06 |