RULER
RULER v1 is a synthetic long-context benchmark for measuring how model quality degrades as input length increases. This packaging follows the public standalone NVIDIA RULER implementation and includes its 13 official tasks, which span retrieval, multi-hop tracing, aggregation, and question answering (QA).
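To make the retrieval category concrete, here is a minimal sketch of a needle-in-a-haystack style example and a substring-match score. This is not the NVIDIA RULER code: the function names (`make_niah_example`, `string_match_score`, `mean_score`), the filler text, the needle wording, and the scoring rule are all assumptions chosen for illustration.

```python
import random


def make_niah_example(num_filler_sentences, seed=0):
    """Build one synthetic retrieval example (hypothetical template)."""
    rng = random.Random(seed)
    key = f"key-{rng.randrange(10_000):04d}"
    value = str(rng.randrange(1_000_000, 10_000_000))  # 7-digit "magic number"
    needle = f"The special magic number for {key} is {value}."
    filler = "The grass is green. The sky is blue. The sun is yellow."

    # The haystack is repeated filler; the needle goes in at a random position.
    sentences = [filler] * num_filler_sentences
    sentences.insert(rng.randrange(num_filler_sentences + 1), needle)

    return {
        "context": " ".join(sentences),
        "question": f"What is the special magic number for {key}?",
        "answer": value,
    }


def string_match_score(prediction, answer):
    """Return 1.0 if the reference answer appears in the prediction, else 0.0."""
    return 1.0 if answer in prediction else 0.0


def mean_score(examples, predict):
    """Average the per-example score of predict(context, question)."""
    scores = [
        string_match_score(predict(ex["context"], ex["question"]), ex["answer"])
        for ex in examples
    ]
    return sum(scores) / len(scores)
```

Raising `num_filler_sentences` lengthens the context while the question stays the same, so `mean_score` can be computed at several lengths to see how a given `predict` function degrades. For example, a toy predictor that only looks at the first few hundred characters, `lambda context, question: context[:300]`, would be expected to miss needles placed deeper in longer contexts.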
Rows: 3
Primary metric: score
Sampled: 2026-05-06
Metadata
Metrics: Score, Normalized Score
| Rank | Subject | Score | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | Nemotron 3 Super (120B A12B) | 0.92 | Nemotron 3 Super (nvidia-nemotron-3-super-120b-a12b) | Self-reported | 2026-05-06 |
| 2 | Phi-3.5-MoE-instruct | 0.87 | — | Self-reported | 2026-05-06 |
| 3 | Phi-3.5-mini-instruct | 0.84 | — | Self-reported | 2026-05-06 |