needle-1M-bench
Centrally scored long-context needle-retrieval benchmark on dense scientific paper text, with haystacks from 50K through 1M tokens.
11 rows
Primary metric: overall_recall
Sampled: 2026-05-06
Metadata
Metrics
Overall Recall, Paper-Anchored Recall, Synthetic Codes Recall, Haystack Tokens, Max Output Tokens, Depth Points
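
The page does not say how the three recall metrics are computed. As a minimal sketch, assuming each recall is the percentage of planted needles whose expected string appears in the model's answer, the scoring might look like the following. The `NeedleResult` type, the `kind` labels, and the substring-match rule are all hypothetical and are not taken from the benchmark itself.

```python
from dataclasses import dataclass


@dataclass
class NeedleResult:
    kind: str      # hypothetical labels: "paper_anchored" or "synthetic_code"
    depth: float   # relative position of the needle in the haystack, 0.0 to 1.0
    expected: str  # the string planted in the haystack
    answer: str    # the model's response


def recall(results):
    """Percentage of needles whose expected string occurs in the answer.

    Assumption: case-insensitive substring match counts as a hit.
    """
    if not results:
        return 0.0
    hits = sum(r.expected.strip().lower() in r.answer.lower() for r in results)
    return 100.0 * hits / len(results)


def score(results):
    """Overall recall plus one recall value per (assumed) needle kind."""
    def of_kind(kind):
        return [r for r in results if r.kind == kind]

    return {
        "overall_recall": recall(results),
        "paper_anchored_recall": recall(of_kind("paper_anchored")),
        "synthetic_codes_recall": recall(of_kind("synthetic_code")),
    }
```

Under this assumed definition, a run with 10 needles and 9 hits would report an overall recall of 90, which matches the granularity of several rows in the table, though the actual needle count per run is not stated on the page.
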
| Rank | Subject | Overall Recall | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | deepseek-v4-pro | 100 | DeepSeek V4 Pro (`deepseek-deepseek-v4-pro`) | Imported | 2026-05-06 |
| 2 | deepseek-v4-pro | 100 | DeepSeek V4 Pro (`deepseek-deepseek-v4-pro`) | Imported | 2026-05-06 |
| 3 | gemini-2.5-pro | 100 | Gemini 2.5 Pro (`google-gemini-2.5-pro`) | Imported | 2026-05-06 |
| 4 | qwen2.5-coder-14b-instruct-awq-int4 | 100 | — | Imported | 2026-05-06 |
| 5 | qwen3-32b-awq-int4 | 100 | — | Imported | 2026-05-06 |
| 6 | deepseek-v4-pro | 100 | DeepSeek V4 Pro (`deepseek-deepseek-v4-pro`) | Imported | 2026-05-06 |
| 7 | deepseek-v4-pro | 94 | DeepSeek V4 Pro (`deepseek-deepseek-v4-pro`) | Imported | 2026-05-06 |
| 8 | nemotron-3-nano-omni-30b-a3b-w4a16 | 90 | — | Imported | 2026-05-06 |
| 9 | qwen3-14b-awq-int4 | 90 | — | Imported | 2026-05-06 |
| 10 | qwen3-8b-awq-int4 | 80 | — | Imported | 2026-05-06 |
| 11 | qwen3-4b-awq-int4 | 70 | — | Imported | 2026-05-06 |
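
Some subjects appear in more than one row (`deepseek-v4-pro` appears four times), and the table does not say what distinguishes those runs. A small sketch of collapsing the rows to one summary per subject is shown below; the `ROWS` list is copied from the table, and the `summarize` helper is illustrative only, not part of the benchmark tooling.

```python
from collections import defaultdict

# (subject, overall_recall) pairs copied from the leaderboard table.
ROWS = [
    ("deepseek-v4-pro", 100),
    ("deepseek-v4-pro", 100),
    ("gemini-2.5-pro", 100),
    ("qwen2.5-coder-14b-instruct-awq-int4", 100),
    ("qwen3-32b-awq-int4", 100),
    ("deepseek-v4-pro", 100),
    ("deepseek-v4-pro", 94),
    ("nemotron-3-nano-omni-30b-a3b-w4a16", 90),
    ("qwen3-14b-awq-int4", 90),
    ("qwen3-8b-awq-int4", 80),
    ("qwen3-4b-awq-int4", 70),
]


def summarize(rows):
    """Group rows by subject and report run count, min, max, and mean recall."""
    grouped = defaultdict(list)
    for subject, value in rows:
        grouped[subject].append(value)
    return {
        subject: {
            "runs": len(values),
            "min": min(values),
            "max": max(values),
            "mean": sum(values) / len(values),
        }
        for subject, values in grouped.items()
    }


summary = summarize(ROWS)
print(summary["deepseek-v4-pro"])
# {'runs': 4, 'min': 94, 'max': 100, 'mean': 98.5}
```

Reporting the minimum alongside the mean keeps a single weaker run (such as the 94 here) visible rather than averaged away.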