SWE-rebench

Continuously evolving, decontaminated software engineering benchmark built from real GitHub pull requests for evaluating coding agents.

Rows: 34
Primary metric: resolved_rate
Sampled: 2026-05-06

Metadata

Metrics

Resolved Rate, Resolved Rate SEM (lower is better), Pass@5, Cost per Problem (lower is better), Tokens per Problem (lower is better), Cached Tokens
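
The page lists these metrics without defining them. As a rough illustration only, the sketch below computes the first three from per-run outcomes under commonly used definitions, which are assumptions here: resolved rate as the mean of per-run resolved percentages, SEM as the sample standard deviation of those per-run percentages divided by the square root of the number of runs, and Pass@5 as the share of problems resolved in at least one of five runs. The function name `summarize_runs`, the data layout, and the example outcomes are hypothetical and are not taken from the benchmark.

```python
from math import sqrt
from statistics import mean, stdev


def summarize_runs(outcomes):
    """Summarize repeated runs of one subject.

    outcomes[r][p] is True when run r resolved problem p.
    This layout is an assumption made for illustration.
    All returned values are in percentage points.
    """
    n_runs = len(outcomes)
    n_problems = len(outcomes[0])

    # Percentage of problems resolved in each individual run.
    per_run_rates = [100.0 * sum(run) / n_problems for run in outcomes]

    # Assumed definition: resolved rate is the mean over runs.
    resolved_rate = mean(per_run_rates)

    # Assumed definition: standard error of that mean across runs.
    sem = stdev(per_run_rates) / sqrt(n_runs) if n_runs > 1 else 0.0

    # Assumed definition: a problem counts if any run resolved it.
    solved_any = sum(
        any(run[p] for run in outcomes) for p in range(n_problems)
    )
    pass_at_k = 100.0 * solved_any / n_problems

    return {
        "resolved_rate": resolved_rate,
        "resolved_rate_sem": sem,
        "pass_at_k": pass_at_k,
    }


# Hypothetical outcomes: 5 runs over 4 problems.
example = [
    [True, False, True, False],
    [True, True, False, False],
    [True, False, False, False],
    [True, True, True, False],
    [True, False, True, False],
]
summary = summarize_runs(example)
print(summary)
```

With five runs, `pass_at_k` corresponds to Pass@5. It can only be equal to or higher than the resolved rate, since a problem resolved in any single run counts toward it.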

Latest Results

Rows are ordered by their rank in the public table. Percentage fields are stored as percentage points, so 65.30 means 65.30%.

| Rank | Subject | Resolved Rate (%) | Model Match | Provenance | Sampled |
| --- | --- | --- | --- | --- | --- |
| 1 | Claude Opus 4.6 | 65.30 | | Imported | 2026-05-06 |
| 2 | gpt-5.2-2025-12-11-medium | 64.40 | | Imported | 2026-05-06 |
| 3 | GLM-5 | 62.80 | | Imported | 2026-05-06 |
| 4 | Junie | 62.80 | | Imported | 2026-05-06 |
| 5 | gpt-5.4-2026-03-05-medium | 62.80 | | Imported | 2026-05-06 |
| 6 | GLM-5.1 | 62.70 | | Imported | 2026-05-06 |
| 7 | Gemini 3.1 Pro Preview | 62.30 | | Imported | 2026-05-06 |
| 8 | DeepSeek-V3.2 | 60.90 | | Imported | 2026-05-06 |
| 9 | Claude Sonnet 4.6 | 60.70 | | Imported | 2026-05-06 |
| 10 | Claude Sonnet 4.5 | 60.00 | | Imported | 2026-05-06 |
| 11 | Qwen3.5-397B-A17B | 59.90 | | Imported | 2026-05-06 |
| 12 | Step-3.5-Flash | 59.60 | | Imported | 2026-05-06 |
| 13 | Qwen3.5-27B | 58.90 | | Imported | 2026-05-06 |
| 14 | GLM-4.7 | 58.70 | | Imported | 2026-05-06 |
| 15 | gpt-5.3-codex-xhigh | 58.60 | | Imported | 2026-05-06 |
| 16 | Kimi K2.5 | 58.50 | | Imported | 2026-05-06 |
| 17 | Claude Code | 58.40 | | Imported | 2026-05-06 |
| 18 | Codex | 58.30 | | Imported | 2026-05-06 |
| 19 | gpt-5.3-codex | 58.20 | | Imported | 2026-05-06 |
| 20 | Cursor | 58.00 | | Imported | 2026-05-06 |
| 21 | Kimi K2 Thinking | 57.40 | | Imported | 2026-05-06 |
| 22 | gpt-5.2-codex | 56.80 | | Imported | 2026-05-06 |
| 23 | MiniMax M2.5 | 54.60 | | Imported | 2026-05-06 |
| 24 | Qwen3-Coder-Next | 54.40 | | Imported | 2026-05-06 |
| 25 | Qwen3.5-35B-A3B | 53.70 | | Imported | 2026-05-06 |
| 26 | Gemini 3 Flash Preview | 52.50 | | Imported | 2026-05-06 |
| 27 | MiniMax M2.7 | 51.90 | | Imported | 2026-05-06 |
| 28 | Devstral-2-123B-Instruct-2512 | 48.80 | | Imported | 2026-05-06 |
| 29 | Qwen3-Coder-480B-A35B-Instruct | 44.70 | | Imported | 2026-05-06 |
| 30 | Gemma 4 31B | 41.60 | | Imported | 2026-05-06 |
| 31 | Devstral-Small-2-24B-Instruct-2512 | 38.90 | | Imported | 2026-05-06 |
| 32 | GLM-4.5 Air | 38.30 | | Imported | 2026-05-06 |
| 33 | GLM-4.7 Flash | 34.00 | | Imported | 2026-05-06 |
| 34 | gpt-oss-120b | 33.30 | | Imported | 2026-05-06 |
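
Because the resolved rates above are stored as percentage points, and several subjects share the same value, a small sketch may help when working with these rows. It converts percentage points to fractions and finds subjects whose scores are tied, whose relative order therefore comes from the public table rank rather than from the score. The helper names `to_fraction` and `tied_groups` are hypothetical, and only the first six rows are copied in.

```python
from collections import defaultdict

# (rank, subject, resolved rate in percentage points), copied from the table.
rows = [
    (1, "Claude Opus 4.6", 65.30),
    (2, "gpt-5.2-2025-12-11-medium", 64.40),
    (3, "GLM-5", 62.80),
    (4, "Junie", 62.80),
    (5, "gpt-5.4-2026-03-05-medium", 62.80),
    (6, "GLM-5.1", 62.70),
]


def to_fraction(points):
    """Convert percentage points (65.30) to a fraction (0.653)."""
    return points / 100.0


def tied_groups(rows):
    """Return {rate: [subjects]} for rates shared by more than one subject."""
    groups = defaultdict(list)
    for _rank, subject, rate in rows:
        groups[rate].append(subject)
    return {rate: subs for rate, subs in groups.items() if len(subs) > 1}


ties = tied_groups(rows)
for rate, subjects in ties.items():
    print(f"{rate:.2f} points ({to_fraction(rate):.3f}): {', '.join(subjects)}")

# Gap between rank 1 and rank 2, in percentage points.
gap = round(rows[0][2] - rows[1][2], 2)
print(f"Gap between rank 1 and rank 2: {gap:.2f} points")
```

Differences between rows are best reported in percentage points, as in the last line, so they are not mistaken for relative changes.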