SWE Atlas - Refactoring
SWE Atlas Refactoring evaluates coding agents on restructuring code while preserving behavior across real-world software repositories.
11 rows
Primary metric: score
Sampled: 2026-05-06
Metrics: Score, Confidence Interval Upper, Max Score
| Rank | Subject | Score | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | Opus-4.7 (Claude Code) | 48.57 | — | Imported | 2026-05-06 |
| 1 | Gpt-5.5 (Codex) | 44.79 | GPT-5.5 openai-gpt-5.5 | Imported | 2026-05-06 |
| 1 | Gpt-5.4 (Codex) | 44.29 | GPT-5.4 openai-gpt-5.4 | Imported | 2026-05-06 |
| 1 | Gpt-5.3 (Codex) | 42.38 | GPT-5.3-Codex openai-gpt-5.3-codex | Imported | 2026-05-06 |
| 1 | Opus-4.6 (Claude Code) | 35.58 | — | Imported | 2026-05-06 |
| 6 | Gemini-3.1-Pro (Gemini CLI) | 33.81 | Gemini 3.1 Pro Preview google-gemini-3.1-pro-preview | Imported | 2026-05-06 |
| 6 | Sonnet-4.6 (Claude Code) | 32.21 | — | Imported | 2026-05-06 |
| 8 | Glm-5 (Mini-SWE-Agent) | 24.24 | GLM 5 z-ai-glm-5 | Imported | 2026-05-06 |
| 9 | Kimi-K2.5 (Mini-SWE-Agent) | 20.95 | MoonshotAI: Kimi K2.5 moonshotai-kimi-k2.5 | Imported | 2026-05-06 |
| 10 | Minimax-M2.5 (Mini-SWE-Agent) | 19.52 | MiniMax M2.5 minimax-minimax-m2.5 | Imported | 2026-05-06 |
| 11 | Gemini-3-Flash (Mini-SWE-Agent) | 10.00 | Gemini 3 Flash Preview google-gemini-3-flash-preview | Imported | 2026-05-06 |