SWE Atlas - Refactoring

SWE Atlas Refactoring evaluates coding agents on their ability to restructure code in real-world software repositories while preserving its behavior.

Rows: 11
Primary metric: Score
Sampled: 2026-05-06

Metadata

Metrics: Score, Confidence Interval Upper, Max Score

Latest Results

Rank  Subject                          Score  Model Match                                             Provenance  Sampled
1     Opus-4.7 (Claude Code)           48.57  -                                                       Imported    2026-05-06
1     Gpt-5.5 (Codex)                  44.79  GPT-5.5 (openai-gpt-5.5)                                Imported    2026-05-06
1     Gpt-5.4 (Codex)                  44.29  GPT-5.4 (openai-gpt-5.4)                                Imported    2026-05-06
1     Gpt-5.3 (Codex)                  42.38  GPT-5.3-Codex (openai-gpt-5.3-codex)                    Imported    2026-05-06
1     Opus-4.6 (Claude Code)           35.58  -                                                       Imported    2026-05-06
6     Gemini-3.1-Pro (Gemini CLI)      33.81  Gemini 3.1 Pro Preview (google-gemini-3.1-pro-preview)  Imported    2026-05-06
6     Sonnet-4.6 (Claude Code)         32.21  -                                                       Imported    2026-05-06
8     Glm-5 (Mini-SWE-Agent)           24.24  GLM 5 (z-ai-glm-5)                                      Imported    2026-05-06
9     Kimi-K2.5 (Mini-SWE-Agent)       20.95  MoonshotAI: Kimi K2.5 (moonshotai-kimi-k2.5)            Imported    2026-05-06
10    Minimax-M2.5 (Mini-SWE-Agent)    19.52  MiniMax M2.5 (minimax-minimax-m2.5)                     Imported    2026-05-06
11    Gemini-3-Flash (Mini-SWE-Agent)  10     Gemini 3 Flash Preview (google-gemini-3-flash-preview)  Imported    2026-05-06