NeedleBench
NeedleBench measures long-context retrieval, needle finding, summarization, factual grounding, and retrieval-augmented generation quality.
Metadata
Rows: 12
Primary metric: overall_128k_score
Sampled: 2026-05-27
Metrics
Overall 128K score, Single-retrieval overall, Multi-retrieval overall, Multi-reasoning overall
| Rank | Subject | Overall 128K score | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | Qwen-2.5-72B | 81.02% | Qwen2.5 72B Instruct (`qwen-qwen-2.5-72b-instruct`) | Imported | 2026-05-27 |
| 2 | Gemma-3-27B | 80.38% | Gemma 3 27B (`google-gemma-3-27b-it`) | Imported | 2026-05-27 |
| 3 | Qwen-2.5-32B | 78.25% | — | Imported | 2026-05-27 |
| 4 | InternLM3-8B | 75.49% | — | Imported | 2026-05-27 |
| 5 | Gemma-3-12B | 75.31% | Gemma 3 12B (`google-gemma-3-12b-it`) | Imported | 2026-05-27 |
| 6 | Qwen-2.5-14B | 73.96% | — | Imported | 2026-05-27 |
| 7 | LLaMA-3.1-70B | 72.37% | — | Imported | 2026-05-27 |
| 8 | LLaMA-3.1-8B | 70.98% | — | Imported | 2026-05-27 |
| 9 | Qwen-2.5-7B | 70.75% | — | Imported | 2026-05-27 |
| 10 | InternLM2.5-7B-Chat-1M | 69.17% | — | Imported | 2026-05-27 |
| 11 | GLM-4-9B-Chat | 66.51% | — | Imported | 2026-05-27 |
| 12 | Gemma-3-4B | 64.42% | Gemma 3 4B (`google-gemma-3-4b-it`) | Imported | 2026-05-27 |
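The Rank column orders subjects by the primary metric, overall_128k_score, from highest to lowest. The Python below is a minimal sketch, not the leaderboard's own code. It copies the twelve Overall 128K scores from the table into a hypothetical `SCORES` dict and re-derives the rank column with a descending sort. The names `SCORES` and `rank_by_primary_metric` are illustrative only.

```python
# Overall 128K scores (percent), copied from the leaderboard table.
# SCORES is a hypothetical name used only for this sketch.
SCORES: dict[str, float] = {
    "Qwen-2.5-72B": 81.02,
    "Gemma-3-27B": 80.38,
    "Qwen-2.5-32B": 78.25,
    "InternLM3-8B": 75.49,
    "Gemma-3-12B": 75.31,
    "Qwen-2.5-14B": 73.96,
    "LLaMA-3.1-70B": 72.37,
    "LLaMA-3.1-8B": 70.98,
    "Qwen-2.5-7B": 70.75,
    "InternLM2.5-7B-Chat-1M": 69.17,
    "GLM-4-9B-Chat": 66.51,
    "Gemma-3-4B": 64.42,
}


def rank_by_primary_metric(scores: dict[str, float]) -> list[tuple[int, str, float]]:
    """Return (rank, subject, score) tuples, highest score first.

    Ranks are assigned 1..n in sorted order, so tied scores (none occur
    in this table) would receive distinct, consecutive ranks.
    """
    ordered = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [(rank, name, score) for rank, (name, score) in enumerate(ordered, start=1)]


if __name__ == "__main__":
    # Print the reconstructed ranking in a fixed-width layout.
    for rank, name, score in rank_by_primary_metric(SCORES):
        print(f"{rank:>2}  {name:<24} {score:.2f}%")
```

Python's `sorted` is stable, so tied scores would keep their insertion order. The leaderboard does not state how it breaks ties, so treat that behavior as an assumption of this sketch.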