SWE Atlas - Codebase QnA

SWE Atlas Codebase QnA evaluates LLMs on deep code comprehension and question answering across real-world software repositories.

Rows: 12
Primary metric: Score
Sampled: 2026-05-06

Metadata

Metrics

Score, Confidence Interval Upper, Max Score

Latest Results

Rank  Subject                          Score  Model Match             Model ID                       Provenance  Sampled
1     Gpt 5.4 xHigh (Codex)            40.80  GPT-5.4                 openai-gpt-5.4                 Imported    2026-05-06
1     Gpt 5.4 xHigh (Mini-SWE-Agent)   36.30  GPT-5.4                 openai-gpt-5.4                 Imported    2026-05-06
1     Opus 4.6 (Claude Code)           33.30  -                       -                              Imported    2026-05-06
1     Gpt 5.3 (Codex)                  32.60  GPT-5.3-Codex           openai-gpt-5.3-codex           Imported    2026-05-06
1     Sonnet 4.6 (Claude Code)         31.20  -                       -                              Imported    2026-05-06
2     Opus 4.6 (Mini-SWE-Agent)        30.00  -                       -                              Imported    2026-05-06
3     Muse Spark                       24.20  -                       -                              Imported    2026-05-06
7     Glm 5 (Mini-SWE-Agent)           20.50  GLM 5                   z-ai-glm-5                     Imported    2026-05-06
8     Gemini 3.1 Pro (Mini-SWE-Agent)  13.50  Gemini 3.1 Pro Preview  google-gemini-3.1-pro-preview  Imported    2026-05-06
8     Kimi K2.5 (Mini-SWE-Agent)       13.10  MoonshotAI: Kimi K2.5   moonshotai-kimi-k2.5           Imported    2026-05-06
9     Minimax M2.5 (Mini-SWE-Agent)    10.30  MiniMax M2.5            minimax-minimax-m2.5           Imported    2026-05-06
9     Gemini 3 Flash (Mini-SWE-Agent)  8.20   Gemini 3 Flash Preview  google-gemini-3-flash-preview  Imported    2026-05-06
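Two base models appear under more than one agent harness in these results: GPT-5.4 xHigh (Codex and Mini-SWE-Agent) and Opus 4.6 (Claude Code and Mini-SWE-Agent). The minimal Python sketch below transcribes the subject/score pairs and measures the raw score spread per model across harnesses. It assumes every subject label follows a "Model (Harness)" pattern; the names RESULTS, split_subject and harness_gaps are hypothetical and not part of any SWE Atlas tooling.

```python
from collections import defaultdict
from typing import Optional

# (subject, score) pairs transcribed from the Latest Results table.
RESULTS = [
    ("Gpt 5.4 xHigh (Codex)", 40.80),
    ("Gpt 5.4 xHigh (Mini-SWE-Agent)", 36.30),
    ("Opus 4.6 (Claude Code)", 33.30),
    ("Gpt 5.3 (Codex)", 32.60),
    ("Sonnet 4.6 (Claude Code)", 31.20),
    ("Opus 4.6 (Mini-SWE-Agent)", 30.00),
    ("Muse Spark", 24.20),
    ("Glm 5 (Mini-SWE-Agent)", 20.50),
    ("Gemini 3.1 Pro (Mini-SWE-Agent)", 13.50),
    ("Kimi K2.5 (Mini-SWE-Agent)", 13.10),
    ("Minimax M2.5 (Mini-SWE-Agent)", 10.30),
    ("Gemini 3 Flash (Mini-SWE-Agent)", 8.20),
]


def split_subject(subject: str) -> tuple[str, Optional[str]]:
    """Split an assumed 'Model (Harness)' label into (model, harness).

    Returns harness=None when the label has no trailing parenthesised part.
    """
    if subject.endswith(")") and " (" in subject:
        model, harness = subject[:-1].rsplit(" (", 1)
        return model, harness
    return subject, None


def harness_gaps(results):
    """For each model scored under 2+ harnesses, return best minus worst score."""
    by_model = defaultdict(dict)
    for subject, score in results:
        model, harness = split_subject(subject)
        by_model[model][harness] = score
    gaps = {}
    for model, scores in by_model.items():
        if len(scores) > 1:
            # Round to 2 decimals to hide floating-point noise in the subtraction.
            gaps[model] = round(max(scores.values()) - min(scores.values()), 2)
    return gaps


if __name__ == "__main__":
    for model, gap in harness_gaps(RESULTS).items():
        print(f"{model}: {gap:.2f} points between harnesses")
```

Run on the table above, this reports a 4.50-point spread for Gpt 5.4 xHigh and a 3.30-point spread for Opus 4.6. These are raw score differences only; the page does not state whether they fall outside the reported confidence intervals.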