Context Arena
Long-context benchmark leaderboard for multi-needle retrieval and reasoning across increasing context lengths, reported as GDM-MRCRv2 scores.
Rows: 70
Primary metric: avg_score_overall
Sampled: 2026-05-06
Metadata
Metrics: Average Score, AUC 128K, AUC 1M, Cumulative Average 128K, Cumulative Average 1M, Average Token Efficiency, Total Runs
| Rank | Subject | Average Score | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | openai/gpt-5.5 | 79.77 | GPT-5.5 openai-gpt-5.5 | Imported | 2026-05-06 |
| 2 | openai/gpt-5.5 | 78.96 | GPT-5.5 openai-gpt-5.5 | Imported | 2026-05-06 |
| 3 | openai/gpt-5.5 | 78.59 | GPT-5.5 openai-gpt-5.5 | Imported | 2026-05-06 |
| 4 | openai/gpt-5.5 | 75.03 | GPT-5.5 openai-gpt-5.5 | Imported | 2026-05-06 |
| 5 | anthropic/claude-opus-4.6 | 73.06 | Claude Opus 4.6 anthropic-claude-opus-4.6 | Imported | 2026-05-06 |
| 6 | anthropic/claude-opus-4.6 | 72.43 | Claude Opus 4.6 anthropic-claude-opus-4.6 | Imported | 2026-05-06 |
| 7 | anthropic/claude-opus-4.6 | 72.26 | Claude Opus 4.6 anthropic-claude-opus-4.6 | Imported | 2026-05-06 |
| 8 | anthropic/claude-sonnet-4.6 | 70.50 | Claude Sonnet 4.6 anthropic-claude-sonnet-4.6 | Imported | 2026-05-06 |
| 9 | anthropic/claude-sonnet-4.6 | 70.38 | Claude Sonnet 4.6 anthropic-claude-sonnet-4.6 | Imported | 2026-05-06 |
| 10 | anthropic/claude-sonnet-4.6 | 69.61 | Claude Sonnet 4.6 anthropic-claude-sonnet-4.6 | Imported | 2026-05-06 |
| 11 | openai/gpt-5.4 | 67.65 | GPT-5.4 openai-gpt-5.4 | Imported | 2026-05-06 |
| 12 | openai/gpt-5.4 | 66.15 | GPT-5.4 openai-gpt-5.4 | Imported | 2026-05-06 |
| 13 | moonshotai/kimi-k2.6 | 64.63 | MoonshotAI: Kimi K2.6 moonshotai-kimi-k2.6 | Imported | 2026-05-06 |
| 14 | openai/gpt-5.4 | 62.89 | GPT-5.4 openai-gpt-5.4 | Imported | 2026-05-06 |
| 15 | z-ai/glm-5.1 | 62.05 | GLM 5.1 z-ai-glm-5.1 | Imported | 2026-05-06 |
| 16 | openai/gpt-5.4 | 59.32 | GPT-5.4 openai-gpt-5.4 | Imported | 2026-05-06 |
| 17 | moonshotai/kimi-k2.5 | 59.22 | MoonshotAI: Kimi K2.5 moonshotai-kimi-k2.5 | Imported | 2026-05-06 |
| 18 | deepseek/deepseek-v4-pro | 55.99 | DeepSeek V4 Pro deepseek-deepseek-v4-pro | Imported | 2026-05-06 |
| 19 | google/gemini-3.1-pro-preview | 53.84 | Gemini 3.1 Pro Preview google-gemini-3.1-pro-preview | Imported | 2026-05-06 |
| 20 | moonshotai/kimi-k2.5 | 53.33 | MoonshotAI: Kimi K2.5 moonshotai-kimi-k2.5 | Imported | 2026-05-06 |
| 21 | moonshotai/kimi-k2.6 | 51.88 | MoonshotAI: Kimi K2.6 moonshotai-kimi-k2.6 | Imported | 2026-05-06 |
| 22 | deepseek/deepseek-v4-flash | 50.93 | DeepSeek V4 Flash deepseek-deepseek-v4-flash | Imported | 2026-05-06 |
| 23 | nvidia/nemotron-3-super-120b-a12b | 50.29 | Nemotron 3 Super nvidia-nemotron-3-super-120b-a12b | Imported | 2026-05-06 |
| 24 | openai/gpt-5.4-nano | 48.78 | GPT-5.4 Nano openai-gpt-5.4-nano | Imported | 2026-05-06 |
| 25 | google/gemini-3.1-pro-preview | 48.69 | Gemini 3.1 Pro Preview google-gemini-3.1-pro-preview | Imported | 2026-05-06 |
| 26 | anthropic/claude-opus-4.6 | 48.19 | Claude Opus 4.6 anthropic-claude-opus-4.6 | Imported | 2026-05-06 |
| 27 | google/gemini-3-flash-preview | 46.79 | Gemini 3 Flash Preview google-gemini-3-flash-preview | Imported | 2026-05-06 |
| 28 | anthropic/claude-sonnet-4.6 | 46.73 | Claude Sonnet 4.6 anthropic-claude-sonnet-4.6 | Imported | 2026-05-06 |
| 29 | google/gemini-3-flash-preview | 46.24 | Gemini 3 Flash Preview google-gemini-3-flash-preview | Imported | 2026-05-06 |
| 30 | openai/gpt-5.4-mini | 45.67 | GPT-5.4 Mini openai-gpt-5.4-mini | Imported | 2026-05-06 |
| 31 | openai/gpt-5.4-mini | 44.79 | GPT-5.4 Mini openai-gpt-5.4-mini | Imported | 2026-05-06 |
| 32 | openai/gpt-5.4-mini | 42.08 | GPT-5.4 Mini openai-gpt-5.4-mini | Imported | 2026-05-06 |
| 33 | google/gemini-3-flash-preview | 39.58 | Gemini 3 Flash Preview google-gemini-3-flash-preview | Imported | 2026-05-06 |
| 34 | xiaomi/mimo-v2-omni | 37.85 | MiMo-V2-Omni xiaomi-mimo-v2-omni | Imported | 2026-05-06 |
| 35 | anthropic/claude-haiku-4.5 | 36.27 | Claude Haiku 4.5 anthropic-claude-haiku-4.5 | Imported | 2026-05-06 |
| 36 | xiaomi/mimo-v2.5-pro | 36.22 | MiMo-V2.5-Pro xiaomi-mimo-v2.5-pro | Imported | 2026-05-06 |
| 37 | openai/gpt-5.4-nano | 36.17 | GPT-5.4 Nano openai-gpt-5.4-nano | Imported | 2026-05-06 |
| 38 | google/gemini-3.1-flash-lite-preview | 34.90 | Gemini 3.1 Flash Lite Preview google-gemini-3.1-flash-lite-preview | Imported | 2026-05-06 |
| 39 | openai/gpt-5.5 | 34.90 | GPT-5.5 openai-gpt-5.5 | Imported | 2026-05-06 |
| 40 | openai/gpt-5.4-mini | 34.47 | GPT-5.4 Mini openai-gpt-5.4-mini | Imported | 2026-05-06 |
| 41 | minimax/minimax-m2.7 | 33.29 | MiniMax M2.7 minimax-minimax-m2.7 | Imported | 2026-05-06 |
| 42 | google/gemini-3.1-flash-lite-preview | 31.82 | Gemini 3.1 Flash Lite Preview google-gemini-3.1-flash-lite-preview | Imported | 2026-05-06 |
| 43 | xiaomi/mimo-v2.5 | 31.55 | MiMo-V2.5 xiaomi-mimo-v2.5 | Imported | 2026-05-06 |
| 44 | z-ai/glm-5.1 | 30.29 | GLM 5.1 z-ai-glm-5.1 | Imported | 2026-05-06 |
| 45 | xiaomi/mimo-v2-pro | 29.99 | MiMo-V2-Pro xiaomi-mimo-v2-pro | Imported | 2026-05-06 |
| 46 | openai/gpt-5.4-nano | 29.90 | GPT-5.4 Nano openai-gpt-5.4-nano | Imported | 2026-05-06 |
| 47 | nvidia/nemotron-3-super-120b-a12b | 29.30 | Nemotron 3 Super nvidia-nemotron-3-super-120b-a12b | Imported | 2026-05-06 |
| 48 | anthropic/claude-opus-4.7 | 28.81 | Claude Opus 4.7 anthropic-claude-opus-4.7 | Imported | 2026-05-06 |
| 49 | x-ai/grok-4.20 | 28.75 | Grok 4.20 x-ai-grok-4.20 | Imported | 2026-05-06 |
| 50 | anthropic/claude-opus-4.7 | 28.63 | Claude Opus 4.7 anthropic-claude-opus-4.7 | Imported | 2026-05-06 |
| 51 | anthropic/claude-opus-4.7 | 28.54 | Claude Opus 4.7 anthropic-claude-opus-4.7 | Imported | 2026-05-06 |
| 52 | anthropic/claude-opus-4.7 | 27.96 | Claude Opus 4.7 anthropic-claude-opus-4.7 | Imported | 2026-05-06 |
| 53 | google/gemini-3.1-flash-lite-preview | 26.94 | Gemini 3.1 Flash Lite Preview google-gemini-3.1-flash-lite-preview | Imported | 2026-05-06 |
| 54 | openai/gpt-5.4 | 26.69 | GPT-5.4 openai-gpt-5.4 | Imported | 2026-05-06 |
| 55 | deepseek/deepseek-v4-pro | 26.31 | DeepSeek V4 Pro deepseek-deepseek-v4-pro | Imported | 2026-05-06 |
| 56 | google/gemini-3-flash-preview | 25.60 | Gemini 3 Flash Preview google-gemini-3-flash-preview | Imported | 2026-05-06 |
| 57 | mistralai/mistral-small-4 | 25.35 | — | Imported | 2026-05-06 |
| 58 | xiaomi/mimo-v2.5-pro | 24.25 | MiMo-V2.5-Pro xiaomi-mimo-v2.5-pro | Imported | 2026-05-06 |
| 59 | xiaomi/mimo-v2.5 | 23.99 | MiMo-V2.5 xiaomi-mimo-v2.5 | Imported | 2026-05-06 |
| 60 | deepseek/deepseek-v4-flash | 23.47 | DeepSeek V4 Flash deepseek-deepseek-v4-flash | Imported | 2026-05-06 |
| 61 | xiaomi/mimo-v2-pro | 22.79 | MiMo-V2-Pro xiaomi-mimo-v2-pro | Imported | 2026-05-06 |
| 62 | openai/gpt-5.4-nano | 21.87 | GPT-5.4 Nano openai-gpt-5.4-nano | Imported | 2026-05-06 |
| 63 | openai/gpt-5.4-mini | 20.83 | GPT-5.4 Mini openai-gpt-5.4-mini | Imported | 2026-05-06 |
| 64 | xiaomi/mimo-v2-omni | 20.56 | MiMo-V2-Omni xiaomi-mimo-v2-omni | Imported | 2026-05-06 |
| 65 | google/gemini-3.1-flash-lite-preview | 18.23 | Gemini 3.1 Flash Lite Preview google-gemini-3.1-flash-lite-preview | Imported | 2026-05-06 |
| 66 | anthropic/claude-haiku-4.5 | 17.68 | Claude Haiku 4.5 anthropic-claude-haiku-4.5 | Imported | 2026-05-06 |
| 67 | anthropic/claude-opus-4.7 | 15.12 | Claude Opus 4.7 anthropic-claude-opus-4.7 | Imported | 2026-05-06 |
| 68 | x-ai/grok-4.20 | 14.49 | Grok 4.20 x-ai-grok-4.20 | Imported | 2026-05-06 |
| 69 | mistralai/mistral-small-4 | 13.96 | — | Imported | 2026-05-06 |
| 70 | openai/gpt-5.4-nano | 12.31 | GPT-5.4 Nano openai-gpt-5.4-nano | Imported | 2026-05-06 |
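Several subjects appear on more than one row with very different Average Scores (for example, openai/gpt-5.5 at ranks 1 and 39). The table does not say what distinguishes these rows, so the following is only a minimal sketch of how a reader might group rows by subject and compare the best and worst score for each. The helper names `parse_rows` and `summarize` are hypothetical, and the sample uses four rows copied from the table.

```python
from collections import defaultdict

# Four rows copied verbatim from the leaderboard table above.
ROWS = """\
| 1 | openai/gpt-5.5 | 79.77 | GPT-5.5 openai-gpt-5.5 | Imported | 2026-05-06 |
| 5 | anthropic/claude-opus-4.6 | 73.06 | Claude Opus 4.6 anthropic-claude-opus-4.6 | Imported | 2026-05-06 |
| 26 | anthropic/claude-opus-4.6 | 48.19 | Claude Opus 4.6 anthropic-claude-opus-4.6 | Imported | 2026-05-06 |
| 39 | openai/gpt-5.5 | 34.90 | GPT-5.5 openai-gpt-5.5 | Imported | 2026-05-06 |
"""


def parse_rows(text):
    """Return (rank, subject, average_score) tuples from pipe-delimited rows."""
    records = []
    for line in text.splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        # Skip the header, the separator row, and anything malformed.
        if len(cells) < 3 or not cells[0].isdigit():
            continue
        records.append((int(cells[0]), cells[1], float(cells[2])))
    return records


def summarize(records):
    """Group scores by subject and report row count, best, and worst score."""
    scores = defaultdict(list)
    for _rank, subject, score in records:
        scores[subject].append(score)
    return {
        subject: {"rows": len(vals), "best": max(vals), "worst": min(vals)}
        for subject, vals in scores.items()
    }


summary = summarize(parse_rows(ROWS))
# Print subjects ordered by their best score, highest first.
for subject, stats in sorted(summary.items(), key=lambda kv: -kv[1]["best"]):
    print(f"{subject}: best={stats['best']:.2f} "
          f"worst={stats['worst']:.2f} rows={stats['rows']}")
```

Pasting the full table into `ROWS` works the same way. Note that the row count here is simply the number of table rows per subject and is not assumed to be the same thing as the page's "Total Runs" metric.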