Factorio Learning Environment
Interactive Factorio automation benchmark for LLM agents, tracking production score, milestones, automation milestones, and lab-task success rate.
Rows: 6
Primary metric: production_score
Sampled: 2026-05-28
Metadata
Metrics: Production Score, Milestones, Automation Milestones, Lab Tasks Success Rate
| Rank | Subject | Production Score | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | Claude 3.5-Sonnet | 293206 | — | Imported | 2026-05-28 |
| 2 | Gemini-2-Flash | 115782 | — | Imported | 2026-05-28 |
| 3 | GPT4o | 87599 | — | Imported | 2026-05-28 |
| 4 | Llama-3.3-70b | 54998 | — | Imported | 2026-05-28 |
| 5 | Deepseek-v3 | 48585 | — | Imported | 2026-05-28 |
| 6 | GPT4o-Mini | 26756 | — | Imported | 2026-05-28 |