SWE-rebench
Continuously evolving, decontaminated software engineering benchmark built from real GitHub pull requests for evaluating coding agents.
34 rows
Primary metric: resolved_rate
Sampled: 2026-05-06
Metadata
Metrics
Resolved Rate, Resolved Rate SEM (lower is better), Pass@5, Cost per Problem (lower is better), Tokens per Problem (lower is better), Cached Tokens
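As a rough illustration of how the first three metrics relate to one another, the sketch below computes them from per-run outcomes. It rests on assumptions the listing above does not confirm: that each subject is run several times over the same problem set, that Resolved Rate is the mean of the per-run resolved percentages, that the SEM is the sample standard deviation of those per-run percentages divided by the square root of the number of runs, and that Pass@5 is the share of problems resolved in at least one of five runs. The function name `summarize` and the toy data are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def summarize(runs):
    """Summarize repeated runs of one subject (hypothetical helper).

    runs: list of runs; each run is a list of booleans, one per problem,
          where True means the problem was resolved in that run.
    Returns (resolved_rate, sem, pass_at_k), all in percent.
    """
    n_problems = len(runs[0])
    # Percentage of problems resolved in each individual run.
    per_run = [100.0 * sum(run) / n_problems for run in runs]
    # Assumed definition: mean of the per-run percentages.
    resolved_rate = mean(per_run)
    # Assumed definition: sample standard deviation over runs / sqrt(#runs).
    sem = stdev(per_run) / sqrt(len(per_run))
    # Assumed definition: a problem counts if any run resolved it (k = #runs).
    pass_at_k = 100.0 * sum(any(col) for col in zip(*runs)) / n_problems
    return resolved_rate, sem, pass_at_k


# Toy data: 5 runs over 4 problems (not taken from the leaderboard).
toy_runs = [
    [True, True, False, False],
    [True, False, False, False],
    [True, True, False, False],
    [True, False, True, False],
    [True, True, False, False],
]

rate, sem, pass_at_5 = summarize(toy_runs)
print(f"resolved_rate={rate:.2f} sem={sem:.2f} pass@5={pass_at_5:.2f}")
```

Under these assumptions, Pass@5 can never be lower than Resolved Rate, and a small SEM indicates that the per-run percentages agree closely with each other.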
| Rank | Subject | Resolved Rate (%) | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | Claude Opus 4.6 | 65.30 | — | Imported | 2026-05-06 |
| 2 | gpt-5.2-2025-12-11-medium | 64.40 | — | Imported | 2026-05-06 |
| 3 | GLM-5 | 62.80 | — | Imported | 2026-05-06 |
| 4 | Junie | 62.80 | — | Imported | 2026-05-06 |
| 5 | gpt-5.4-2026-03-05-medium | 62.80 | — | Imported | 2026-05-06 |
| 6 | GLM-5.1 | 62.70 | — | Imported | 2026-05-06 |
| 7 | Gemini 3.1 Pro Preview | 62.30 | — | Imported | 2026-05-06 |
| 8 | DeepSeek-V3.2 | 60.90 | — | Imported | 2026-05-06 |
| 9 | Claude Sonnet 4.6 | 60.70 | — | Imported | 2026-05-06 |
| 10 | Claude Sonnet 4.5 | 60.00 | — | Imported | 2026-05-06 |
| 11 | Qwen3.5-397B-A17B | 59.90 | — | Imported | 2026-05-06 |
| 12 | Step-3.5-Flash | 59.60 | — | Imported | 2026-05-06 |
| 13 | Qwen3.5-27B | 58.90 | — | Imported | 2026-05-06 |
| 14 | GLM-4.7 | 58.70 | — | Imported | 2026-05-06 |
| 15 | gpt-5.3-codex-xhigh | 58.60 | — | Imported | 2026-05-06 |
| 16 | Kimi K2.5 | 58.50 | — | Imported | 2026-05-06 |
| 17 | Claude Code | 58.40 | — | Imported | 2026-05-06 |
| 18 | Codex | 58.30 | — | Imported | 2026-05-06 |
| 19 | gpt-5.3-codex | 58.20 | — | Imported | 2026-05-06 |
| 20 | Cursor | 58.00 | — | Imported | 2026-05-06 |
| 21 | Kimi K2 Thinking | 57.40 | — | Imported | 2026-05-06 |
| 22 | gpt-5.2-codex | 56.80 | — | Imported | 2026-05-06 |
| 23 | MiniMax M2.5 | 54.60 | — | Imported | 2026-05-06 |
| 24 | Qwen3-Coder-Next | 54.40 | — | Imported | 2026-05-06 |
| 25 | Qwen3.5-35B-A3B | 53.70 | — | Imported | 2026-05-06 |
| 26 | Gemini 3 Flash Preview | 52.50 | — | Imported | 2026-05-06 |
| 27 | MiniMax M2.7 | 51.90 | — | Imported | 2026-05-06 |
| 28 | Devstral-2-123B-Instruct-2512 | 48.80 | — | Imported | 2026-05-06 |
| 29 | Qwen3-Coder-480B-A35B-Instruct | 44.70 | — | Imported | 2026-05-06 |
| 30 | Gemma 4 31B | 41.60 | — | Imported | 2026-05-06 |
| 31 | Devstral-Small-2-24B-Instruct-2512 | 38.90 | — | Imported | 2026-05-06 |
| 32 | GLM-4.5 Air | 38.30 | — | Imported | 2026-05-06 |
| 33 | GLM-4.7 Flash | 34.00 | — | Imported | 2026-05-06 |
| 34 | gpt-oss-120b | 33.30 | — | Imported | 2026-05-06 |