SWE Atlas - Codebase QnA
SWE Atlas Codebase QnA evaluates LLMs on deep code comprehension and question answering across real-world software repositories.
Rows: 12
Primary metric: Score
Sampled: 2026-05-06
Metrics: Score, Confidence Interval Upper, Max Score
| Rank | Subject | Score | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | Gpt 5.4 xHigh (Codex) | 40.80 | GPT-5.4 openai-gpt-5.4 | Imported | 2026-05-06 |
| 1 | Gpt 5.4 xHigh (Mini-SWE-Agent) | 36.30 | GPT-5.4 openai-gpt-5.4 | Imported | 2026-05-06 |
| 1 | Opus 4.6 (Claude Code) | 33.30 | — | Imported | 2026-05-06 |
| 1 | Gpt 5.3 (Codex) | 32.60 | GPT-5.3-Codex openai-gpt-5.3-codex | Imported | 2026-05-06 |
| 1 | Sonnet 4.6 (Claude Code) | 31.20 | — | Imported | 2026-05-06 |
| 2 | Opus 4.6 (Mini-SWE-Agent) | 30.00 | — | Imported | 2026-05-06 |
| 3 | Muse Spark | 24.20 | — | Imported | 2026-05-06 |
| 7 | Glm 5 (Mini-SWE-Agent) | 20.50 | GLM 5 z-ai-glm-5 | Imported | 2026-05-06 |
| 8 | Gemini 3.1 Pro (Mini-SWE-Agent) | 13.50 | Gemini 3.1 Pro Preview google-gemini-3.1-pro-preview | Imported | 2026-05-06 |
| 8 | Kimi K2.5 (Mini-SWE-Agent) | 13.10 | MoonshotAI: Kimi K2.5 moonshotai-kimi-k2.5 | Imported | 2026-05-06 |
| 9 | Minimax M2.5 (Mini-SWE-Agent) | 10.30 | MiniMax M2.5 minimax-minimax-m2.5 | Imported | 2026-05-06 |
| 9 | Gemini 3 Flash (Mini-SWE-Agent) | 8.20 | Gemini 3 Flash Preview google-gemini-3-flash-preview | Imported | 2026-05-06 |