SWE-bench Pro
Scale AI's professional software engineering benchmark extending SWE-bench-style issue resolution tasks.
32 rows
Primary metric: score
Sampled: 2026-05-28
Showing the 5 latest source slices. The Rank column restarts at 1 within each slice.
| Rank | Subject | Score | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | Claude Opus 4.8 | 69.2% | Claude Opus 4.8 anthropic-claude-opus-4.8 | Self-reported | 2026-05-28 |
| 2 | Claude Opus 4.7 | 64.3% | Claude Opus 4.7 anthropic-claude-opus-4.7 | Self-reported | 2026-05-28 |
| 3 | GPT-5.5 | 58.6% | GPT-5.5 openai-gpt-5.5 | Self-reported | 2026-05-28 |
| 4 | Gemini 3.1 Pro Preview | 54.2% | Gemini 3.1 Pro Preview google-gemini-3.1-pro-preview | Self-reported | 2026-05-28 |
| 1 | Qwen3.7 Max | 60.6% | Qwen3.7 Max qwen-qwen3.7-max | Self-reported | 2026-05-28 |
| 2 | Kimi K2.6 Thinking | 59.5% | MoonshotAI: Kimi K2.6 moonshotai-kimi-k2.6 | Self-reported | 2026-05-28 |
| 3 | DeepSeek V4 Pro Max | 59% | DeepSeek V4 Pro deepseek-deepseek-v4-pro | Self-reported | 2026-05-28 |
| 4 | GLM-5.1 Thinking | 58.8% | GLM 5.1 z-ai-glm-5.1 | Self-reported | 2026-05-28 |
| 5 | Claude Opus 4.6 Max | 57.3% | Claude Opus 4.6 anthropic-claude-opus-4.6 | Self-reported | 2026-05-28 |
| 6 | Qwen3.6 Plus | 56.6% | Qwen3.6 Plus qwen-qwen3.6-plus | Self-reported | 2026-05-28 |
| 1 | MiniMaxAI/MiniMax-M2.5 | 55.40% | — | Imported | 2026-05-06 |
| 2 | moonshotai/Kimi-K2.5 | 50.70% | MoonshotAI: Kimi K2.5 moonshotai-kimi-k2.5 | Imported | 2026-05-06 |
| 3 | Qwen/Qwen3-Coder-Next | 44.30% | Qwen3 Coder Next qwen-qwen3-coder-next | Imported | 2026-05-06 |
| 4 | Qwen/Qwen3-Coder-480B-A35B-Instruct | 38.70% | Qwen3 Coder 480B A35B qwen-qwen3-coder | Imported | 2026-05-06 |
| 5 | MiniMaxAI/MiniMax-M2.1 | 36.81% | — | Imported | 2026-05-06 |
| 6 | moonshotai/Kimi-K2-Instruct | 27.67% | MoonshotAI: Kimi K2 0711 moonshotai-kimi-k2 | Imported | 2026-05-06 |
| 7 | Qwen/Qwen3-235B-A22B | 21.41% | Qwen3 235B A22B qwen-qwen3-235b-a22b | Imported | 2026-05-06 |
| 8 | openai/gpt-oss-120b | 16.20% | gpt-oss-120b openai-gpt-oss-120b | Imported | 2026-05-06 |
| 9 | deepseek-ai/DeepSeek-V3.2 | 15.56% | DeepSeek V3.2 deepseek-deepseek-v3.2 | Imported | 2026-05-06 |
| 10 | google/gemma-3-27b-it | 11.38% | Gemma 3 27B google-gemma-3-27b-it | Imported | 2026-05-06 |
| 11 | meta-llama/Llama-3.1-405B-Instruct | 11.18% | — | Imported | 2026-05-06 |
| 12 | zai-org/GLM-4.6 | 9.67% | GLM 4.6 z-ai-glm-4.6 | Imported | 2026-05-06 |
| 13 | meta-llama/Llama-4-Maverick-17B-128E-Instruct | 5.24% | Llama 4 Maverick meta-llama-4-maverick | Imported | 2026-05-06 |
| 1 | Claude Opus 4.7 | 64.3% | Claude Opus 4.7 anthropic-claude-opus-4.7 | Launch post | 2026-04-23 |
| 2 | GPT-5.5 | 58.6% | GPT-5.5 openai-gpt-5.5 | Launch post | 2026-04-23 |
| 3 | GPT-5.4 | 57.7% | GPT-5.4 openai-gpt-5.4 | Launch post | 2026-04-23 |
| 4 | Gemini 3.1 Pro Preview | 54.2% | Gemini 3.1 Pro Preview google-gemini-3.1-pro-preview | Launch post | 2026-04-23 |
| 1 | Claude Mythos Preview | 77.8% | Claude Mythos Preview anthropic-claude-mythos-preview | Launch post | 2026-04-16 |
| 2 | Claude Opus 4.7 | 64.3% | Claude Opus 4.7 anthropic-claude-opus-4.7 | Launch post | 2026-04-16 |
| 3 | GPT-5.4 | 57.7% | GPT-5.4 openai-gpt-5.4 | Launch post | 2026-04-16 |
| 4 | Gemini 3.1 Pro Preview | 54.2% | Gemini 3.1 Pro Preview google-gemini-3.1-pro-preview | Launch post | 2026-04-16 |
| 5 | Claude Opus 4.6 | 53.4% | Claude Opus 4.6 anthropic-claude-opus-4.6 | Launch post | 2026-04-16 |
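
The table above merges several source slices into one listing, and its Rank column drops back to 1 wherever a new slice begins. A minimal sketch of how a reader might recover the slices from the rows is shown below. The helper names `split_slices` and `parse_score` are hypothetical, the row dictionaries are an assumed representation, and treating bare scores such as `55.40` as percentages is an assumption rather than something the page states.

```python
from typing import Iterable


def split_slices(rows: Iterable[dict]) -> list[list[dict]]:
    """Group rows into slices, opening a new slice whenever the rank is 1."""
    slices: list[list[dict]] = []
    for row in rows:
        if row["rank"] == 1 or not slices:
            slices.append([])
        slices[-1].append(row)
    return slices


def parse_score(text: str) -> float:
    """Parse '69.2%' or '55.40' into a float (assumes both are percentages)."""
    return float(text.strip().rstrip("%"))


# A few rows copied from the table above: rank, subject, score.
rows = [
    {"rank": 1, "subject": "Claude Opus 4.8", "score": "69.2%"},
    {"rank": 2, "subject": "Claude Opus 4.7", "score": "64.3%"},
    {"rank": 1, "subject": "Qwen3.7 Max", "score": "60.6%"},
    {"rank": 2, "subject": "Kimi K2.6 Thinking", "score": "59.5%"},
    {"rank": 1, "subject": "MiniMaxAI/MiniMax-M2.5", "score": "55.40"},
]

slices = split_slices(rows)
# Top-scoring row of each slice, compared on the parsed numeric score.
best = [max(s, key=lambda r: parse_score(r["score"])) for s in slices]
```

Splitting on the rank reset avoids relying on the Provenance and Sampled columns, which are identical for the first two slices and so cannot separate them on their own.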