NanoGPT-Bench

Intology benchmark for autonomous coding and ML research agents on the NanoGPT Speedrun, measuring how much historical human speedrun progress agents can recover from a strong human starting point under a fixed H100-hour compute budget.

Rows: 3
Primary metric: human_progress_recovered_percent
Sampled: 2026-05-20

Metrics

Human Progress Recovered

Latest Results

Initial results, as reported in the benchmark README, for frontier coding agents. Each agent had a 512 H100-hour compute budget and no internet access, and started from the NanoGPT Speedrun human world record as of September 3, 2025. The reference human progress window runs from September 3, 2025 through January 19, 2026.

Rank  Subject                      Human Progress Recovered  Model Match  Provenance  Sampled
1     Autoresearch (Opus 4.6 Max)  9.3%                                   Imported    2026-05-20
2     Codex (GPT-5.4 xhigh)        8.6%                                   Imported    2026-05-20
3     Claude Code (Opus 4.6 Max)   8.2%                                   Imported    2026-05-20
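
This page does not state how Human Progress Recovered is computed. As a rough sketch of one plausible reading, assume progress is measured as the reduction in record training time, so that the metric is the share of the human improvement over the reference window that an agent reproduces from the same starting point. The function name, its parameters, and the numbers in the usage lines are hypothetical placeholders, not values from the benchmark.

```python
def human_progress_recovered_percent(start_seconds: float,
                                     agent_seconds: float,
                                     end_seconds: float) -> float:
    """Share of the human improvement that the agent recovered, in percent.

    ASSUMPTION: the formula is not given in the source. This sketch treats
    progress as reduction in training time.
      start_seconds: record time at the start of the reference window
      agent_seconds: best time the agent reached from that starting point
      end_seconds:   record time at the end of the reference window
    """
    human_gain = start_seconds - end_seconds
    if human_gain <= 0:
        raise ValueError("end record must be faster than start record")
    agent_gain = start_seconds - agent_seconds
    return 100.0 * agent_gain / human_gain


# Hypothetical placeholder times, chosen only to make the arithmetic easy
# to follow. Humans went 100 s -> 50 s; the agent went 100 s -> 95 s.
example = human_progress_recovered_percent(100.0, 95.0, 50.0)
print(f"{example:.1f}%")  # 10.0%
```

Under this reading, 0% means the agent did not improve on the starting record and 100% means it matched the human record at the end of the window.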