MRCR 1M
MRCR 1M is a variant of the Multi-Round Coreference Resolution (MRCR) benchmark that tests long-context capability at approximately 1 million tokens of context. It evaluates whether a model can sustain reasoning and attention across ultra-long conversations.
Rows: 3
Primary metric: score
Sampled: 2026-05-06
Metadata
Metrics: Score, Normalized Score
| Rank | Subject | Score | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | DeepSeek-V4-Pro-Max | 0.83 | DeepSeek V4 Pro (`deepseek-deepseek-v4-pro`) | Self-reported | 2026-05-06 |
| 2 | DeepSeek-V4-Flash-Max | 0.79 | DeepSeek V4 Flash (`deepseek-deepseek-v4-flash`) | Self-reported | 2026-05-06 |
| 3 | Gemini 2.0 Flash-Lite | 0.58 | Gemini 2.0 Flash Lite (`google-gemini-2.0-flash-lite-001`) | Self-reported | 2026-05-06 |