NanoGPT-Bench
A benchmark from Intology for autonomous coding and ML research agents, built on the NanoGPT Speedrun. It measures how much of the historical human speedrun progress an agent can recover, starting from a strong human baseline, under a fixed H100-hour compute budget.
Rows: 3
Primary metric: human_progress_recovered_percent
Sampled: 2026-05-20
Human Progress Recovered
| Rank | Subject | Human Progress Recovered | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | Autoresearch (Opus 4.6 Max) | 9.3% | — | Imported | 2026-05-20 |
| 2 | Codex (GPT-5.4 xhigh) | 8.6% | — | Imported | 2026-05-20 |
| 3 | Claude Code (Opus 4.6 Max) | 8.2% | — | Imported | 2026-05-20 |
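The page does not say how `human_progress_recovered_percent` is computed. One plausible reading, assumed here and not confirmed by the benchmark, treats the speedrun score as a time-to-target (lower is better). Under that reading the metric is the share of the gap between the starting point's time and the final human record that the agent closes. The function name and all inputs below are hypothetical.

```python
def human_progress_recovered_percent(start_time: float,
                                     human_best_time: float,
                                     agent_time: float) -> float:
    """Hypothetical metric: percent of the start-to-human-record gap closed by the agent.

    Assumes lower times are better. The result is 0 when the agent makes no
    improvement, 100 when it matches the human record, and negative when it
    regresses.
    """
    gap = start_time - human_best_time
    if gap <= 0:
        raise ValueError("human_best_time must be lower than start_time")
    return 100.0 * (start_time - agent_time) / gap


# Illustrative numbers only (not benchmark data): start at 200 s, human record
# at 100 s, agent reaches 190 s, so it closes 10% of the gap.
print(human_progress_recovered_percent(200.0, 100.0, 190.0))
```

Measuring against the full gap, rather than reporting raw time saved, is what allows the leaderboard percentages to be compared directly with the total progress human speedrunners achieved.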