MRCR 1M

MRCR 1M is a variant of the Multi-Round Coreference Resolution (MRCR) benchmark that tests long-context capability at roughly 1 million tokens of context. It evaluates whether a model can sustain reasoning and attention across ultra-long conversations.

Rows: 3
Primary metric: Score
Sampled: 2026-05-06

Metadata

Metrics

Score, Normalized Score

Latest Results

Rank  Subject                Score  Model Match                                               Provenance     Sampled
1     DeepSeek-V4-Pro-Max    0.83   DeepSeek V4 Pro (deepseek-deepseek-v4-pro)                Self-reported  2026-05-06
2     DeepSeek-V4-Flash-Max  0.79   DeepSeek V4 Flash (deepseek-deepseek-v4-flash)            Self-reported  2026-05-06
3     Gemini 2.0 Flash-Lite  0.58   Gemini 2.0 Flash Lite (google-gemini-2.0-flash-lite-001)  Self-reported  2026-05-06