SWE-bench Pro

Scale AI's professional software engineering benchmark, which extends SWE-bench-style issue-resolution tasks.

32 rows
Primary metric: score
Sampled: 2026-05-28


The five most recent source slices are shown.

Latest Results

Provider-published system-card benchmark scores parsed from Anthropic's Claude Opus 4.8 capability evaluation tables. Rows are marked self-reported and should be interpreted as source claims unless independently reproduced.

Source slice 1 (self-reported, sampled 2026-05-28)
Rank | Subject | Score | Model match
1 | Claude Opus 4.8 | 69.2% | Claude Opus 4.8 (anthropic-claude-opus-4.8)
2 | Claude Opus 4.7 | 64.3% | Claude Opus 4.7 (anthropic-claude-opus-4.7)
3 | GPT-5.5 | 58.6% | GPT-5.5 (openai-gpt-5.5)
4 | Gemini 3.1 Pro Preview | 54.2% | Gemini 3.1 Pro Preview (google-gemini-3.1-pro-preview)

Source slice 2 (self-reported, sampled 2026-05-28)
Rank | Subject | Score | Model match
1 | Qwen3.7 Max | 60.6% | Qwen3.7 Max (qwen-qwen3.7-max)
2 | Kimi K2.6 Thinking | 59.5% | MoonshotAI: Kimi K2.6 (moonshotai-kimi-k2.6)
3 | DeepSeek V4 Pro Max | 59% | DeepSeek V4 Pro (deepseek-deepseek-v4-pro)
4 | GLM-5.1 Thinking | 58.8% | GLM 5.1 (z-ai-glm-5.1)
5 | Claude Opus 4.6 Max | 57.3% | Claude Opus 4.6 (anthropic-claude-opus-4.6)
6 | Qwen3.6 Plus | 56.6% | Qwen3.6 Plus (qwen-qwen3.6-plus)

Source slice 3 (imported, sampled 2026-05-06)
Rank | Subject | Score | Model match
1 | MiniMaxAI/MiniMax-M2.5 | 55.40 | (none)
2 | moonshotai/Kimi-K2.5 | 50.70 | MoonshotAI: Kimi K2.5 (moonshotai-kimi-k2.5)
3 | Qwen/Qwen3-Coder-Next | 44.30 | Qwen3 Coder Next (qwen-qwen3-coder-next)
4 | Qwen/Qwen3-Coder-480B-A35B-Instruct | 38.70 | Qwen3 Coder 480B A35B (qwen-qwen3-coder)
5 | MiniMaxAI/MiniMax-M2.1 | 36.81 | (none)
6 | moonshotai/Kimi-K2-Instruct | 27.67 | MoonshotAI: Kimi K2 0711 (moonshotai-kimi-k2)
7 | Qwen/Qwen3-235B-A22B | 21.41 | Qwen3 235B A22B (qwen-qwen3-235b-a22b)
8 | openai/gpt-oss-120b | 16.20 | gpt-oss-120b (openai-gpt-oss-120b)
9 | deepseek-ai/DeepSeek-V3.2 | 15.56 | DeepSeek V3.2 (deepseek-deepseek-v3.2)
10 | google/gemma-3-27b-it | 11.38 | Gemma 3 27B (google-gemma-3-27b-it)
11 | meta-llama/Llama-3.1-405B-Instruct | 11.18 | (none)
12 | zai-org/GLM-4.6 | 9.67 | GLM 4.6 (z-ai-glm-4.6)
13 | meta-llama/Llama-4-Maverick-17B-128E-Instruct | 5.24 | Llama 4 Maverick (meta-llama-4-maverick)

Source slice 4 (launch post, sampled 2026-04-23)
Rank | Subject | Score | Model match
1 | Claude Opus 4.7 | 64.3% | Claude Opus 4.7 (anthropic-claude-opus-4.7)
2 | GPT-5.5 | 58.6% | GPT-5.5 (openai-gpt-5.5)
3 | GPT-5.4 | 57.7% | GPT-5.4 (openai-gpt-5.4)
4 | Gemini 3.1 Pro Preview | 54.2% | Gemini 3.1 Pro Preview (google-gemini-3.1-pro-preview)

Source slice 5 (launch post, sampled 2026-04-16)
Rank | Subject | Score | Model match
1 | Claude Mythos Preview | 77.8% | Claude Mythos Preview (anthropic-claude-mythos-preview)
2 | Claude Opus 4.7 | 64.3% | Claude Opus 4.7 (anthropic-claude-opus-4.7)
3 | GPT-5.4 | 57.7% | GPT-5.4 (openai-gpt-5.4)
4 | Gemini 3.1 Pro Preview | 54.2% | Gemini 3.1 Pro Preview (google-gemini-3.1-pro-preview)
5 | Claude Opus 4.6 | 53.4% | Claude Opus 4.6 (anthropic-claude-opus-4.6)