SWE-bench Live

Microsoft's live, SWE-bench-style benchmark for real-world issue resolution. It is updated with recent GitHub tasks and provides frozen lite and verified splits for evaluation.

Rows: 91
Primary metric: resolved_percentage
Sampled: 2026-05-27

Metadata

Metrics

Resolved percentage, Resolved count, Total tasks
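The three metrics are presumably related in the usual way: resolved percentage is the resolved count divided by the total number of tasks, times 100. A minimal sketch, assuming that relationship and the six-decimal rounding seen in the results table (the function name is hypothetical, and the counts in the example are illustrative rather than taken from the feed):

```python
def resolved_percentage(resolved_count: int, total_tasks: int) -> float:
    """Percentage of tasks resolved, rounded to six decimal places."""
    if total_tasks <= 0:
        raise ValueError("total_tasks must be positive")
    return round(100.0 * resolved_count / total_tasks, 6)


# Illustrative counts only: 20 of 48 tasks resolved.
print(resolved_percentage(20, 48))  # 41.666667
```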

Latest Results

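Rows are parsed from the official SWE-bench-Live GitHub Pages JSONL report feed used by the leaderboard. Ranks are assigned within each split, in descending order of resolved percentage; rows with equal percentages receive consecutive ranks rather than a shared rank. The split name appears in parentheses after each subject.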

Columns: Rank, Subject (split), Resolved percentage, Model Match (blank in every row), Provenance, Sampled
1 OpenHands + Claude-4.5-Sonnet (ccpp) 43.75% Imported 2026-05-27
2 SWE-agent + Claude-4.5-Sonnet (ccpp) 43.75% Imported 2026-05-27
3 SWE-agent + GPT5.2-Thinking (Medium) (ccpp) 41.666667% Imported 2026-05-27
4 Claude Code + GPT5.2-Thinking (Medium) (ccpp) 37.5% Imported 2026-05-27
5 OpenHands + GPT5.2-Thinking (Medium) (ccpp) 37.5% Imported 2026-05-27
6 Claude Code + Claude-4.5-Sonnet (ccpp) 35.416667% Imported 2026-05-27
7 OpenHands + Gemini3-Flash (ccpp) 35.416667% Imported 2026-05-27
8 SWE-agent + Gemini3-Flash (ccpp) 35.416667% Imported 2026-05-27
9 SWE-agent + DeepSeek-V3.1-Terminus (ccpp) 27.083333% Imported 2026-05-27
10 OpenHands + DeepSeek-V3.1-Terminus (ccpp) 25.0% Imported 2026-05-27
11 Claude Code + DeepSeek-V3.1-Terminus (ccpp) 10.416667% Imported 2026-05-27
1 OpenHands + GPT5.2-Thinking (Medium) (csharp) 21.428571% Imported 2026-05-27
2 SWE-agent + Claude-4.5-Sonnet (csharp) 21.428571% Imported 2026-05-27
3 SWE-agent + DeepSeek-V3.1-Terminus (csharp) 21.428571% Imported 2026-05-27
4 SWE-agent + GPT5.2-Thinking (Medium) (csharp) 21.428571% Imported 2026-05-27
5 OpenHands + Claude-4.5-Sonnet (csharp) 17.857143% Imported 2026-05-27
6 OpenHands + DeepSeek-V3.1-Terminus (csharp) 17.857143% Imported 2026-05-27
7 OpenHands + Gemini3-Flash (csharp) 17.857143% Imported 2026-05-27
8 Claude Code + Claude-4.5-Sonnet (csharp) 14.285714% Imported 2026-05-27
9 Claude Code + DeepSeek-V3.1-Terminus (csharp) 14.285714% Imported 2026-05-27
10 Claude Code + GPT5.2-Thinking (Medium) (csharp) 14.285714% Imported 2026-05-27
11 SWE-agent + Gemini3-Flash (csharp) 14.285714% Imported 2026-05-27
1 OpenHands + Claude 3.7 Sonnet (full) 19.257013% Imported 2026-05-27
2 SWE-agent + GPT 4.1 (full) 18.574678% Imported 2026-05-27
3 SWE-agent + Claude 3.7 Sonnet (full) 17.134193% Imported 2026-05-27
4 SWE-agent + GPT 4o (full) 9.097801% Imported 2026-05-27
1 Claude Code + Claude-4.5-Sonnet (go) 44.117647% Imported 2026-05-27
2 OpenHands + Claude-4.5-Sonnet (go) 39.705882% Imported 2026-05-27
3 SWE-agent + Claude-4.5-Sonnet (go) 39.705882% Imported 2026-05-27
4 SWE-agent + Gemini3-Flash (go) 32.352941% Imported 2026-05-27
5 OpenHands + Gemini3-Flash (go) 30.882353% Imported 2026-05-27
6 Claude Code + DeepSeek-V3.1-Terminus (go) 29.411765% Imported 2026-05-27
7 Claude Code + GPT5.2-Thinking (Medium) (go) 27.941176% Imported 2026-05-27
8 SWE-agent + GPT5.2-Thinking (Medium) (go) 25.0% Imported 2026-05-27
9 OpenHands + DeepSeek-V3.1-Terminus (go) 22.058824% Imported 2026-05-27
10 OpenHands + GPT5.2-Thinking (Medium) (go) 20.588235% Imported 2026-05-27
11 SWE-agent + DeepSeek-V3.1-Terminus (go) 20.588235% Imported 2026-05-27
1 Claude Code + GPT5.2-Thinking (Medium) (java) 37.096774% Imported 2026-05-27
2 Claude Code + Claude-4.5-Sonnet (java) 27.419355% Imported 2026-05-27
3 SWE-agent + DeepSeek-V3.1-Terminus (java) 27.419355% Imported 2026-05-27
4 SWE-agent + Gemini3-Flash (java) 25.806452% Imported 2026-05-27
5 OpenHands + DeepSeek-V3.1-Terminus (java) 24.193548% Imported 2026-05-27
6 OpenHands + Gemini3-Flash (java) 24.193548% Imported 2026-05-27
7 Claude Code + DeepSeek-V3.1-Terminus (java) 22.580645% Imported 2026-05-27
8 OpenHands + GPT5.2-Thinking (Medium) (java) 22.580645% Imported 2026-05-27
9 SWE-agent + GPT5.2-Thinking (Medium) (java) 22.580645% Imported 2026-05-27
10 OpenHands + Claude-4.5-Sonnet (java) 19.354839% Imported 2026-05-27
11 SWE-agent + Claude-4.5-Sonnet (java) 16.129032% Imported 2026-05-27
1 SWE-agent + Claude-4.5-Sonnet (lite) 36.0% Imported 2026-05-27
2 OpenHands + Qwen3-Coder-480B-A35B (lite) 24.666667% Imported 2026-05-27
3 SWE-agent + GPT5-Thinking (Medium) (lite) 24.333333% Imported 2026-05-27
4 OpenHands + Claude 3.7 Sonnet (lite) 17.666667% Imported 2026-05-27
5 SWE-agent + Claude 3.7 Sonnet (lite) 17.666667% Imported 2026-05-27
6 SWE-agent + GPT 4.1 (lite) 16.333333% Imported 2026-05-27
7 SWE-agent + DeepSeek V3 (lite) 15.333333% Imported 2026-05-27
8 Agentless + DeepSeek V3 (lite) 13.333333% Imported 2026-05-27
9 OpenHands + DeepSeek V3 (lite) 13.0% Imported 2026-05-27
10 Agentless + GPT 4.1 (lite) 12.0% Imported 2026-05-27
11 Agentless + GPT 4o (lite) 11.666667% Imported 2026-05-27
12 Agentless + Claude 3.7 Sonnet (lite) 11.333333% Imported 2026-05-27
13 OpenHands + GPT 4.1 (lite) 11.333333% Imported 2026-05-27
14 SWE-agent + GPT 4o (lite) 10.0% Imported 2026-05-27
15 OpenHands + GPT 4o (lite) 7.0% Imported 2026-05-27
1 SWE-agent + Gemini3-Flash (rust) 37.777778% Imported 2026-05-27
2 OpenHands + Gemini3-Flash (rust) 35.555556% Imported 2026-05-27
3 Claude Code + GPT5.2-Thinking (Medium) (rust) 33.333333% Imported 2026-05-27
4 SWE-agent + Claude-4.5-Sonnet (rust) 28.888889% Imported 2026-05-27
5 SWE-agent + DeepSeek-V3.1-Terminus (rust) 28.888889% Imported 2026-05-27
6 SWE-agent + GPT5.2-Thinking (Medium) (rust) 28.888889% Imported 2026-05-27
7 OpenHands + Claude-4.5-Sonnet (rust) 26.666667% Imported 2026-05-27
8 OpenHands + DeepSeek-V3.1-Terminus (rust) 26.666667% Imported 2026-05-27
9 OpenHands + GPT5.2-Thinking (Medium) (rust) 26.666667% Imported 2026-05-27
10 Claude Code + Claude-4.5-Sonnet (rust) 22.222222% Imported 2026-05-27
11 Claude Code + DeepSeek-V3.1-Terminus (rust) 17.777778% Imported 2026-05-27
1 Claude Code + Claude-4.5-Sonnet (tsjs) 27.160494% Imported 2026-05-27
2 Claude Code + GPT5.2-Thinking (Medium) (tsjs) 20.37037% Imported 2026-05-27
3 SWE-agent + Claude-4.5-Sonnet (tsjs) 20.37037% Imported 2026-05-27
4 OpenHands + Claude-4.5-Sonnet (tsjs) 19.135802% Imported 2026-05-27
5 SWE-agent + GPT5.2-Thinking (Medium) (tsjs) 16.049383% Imported 2026-05-27
6 Claude Code + DeepSeek-V3.1-Terminus (tsjs) 15.432099% Imported 2026-05-27
7 OpenHands + GPT5.2-Thinking (Medium) (tsjs) 12.962963% Imported 2026-05-27
8 OpenHands + DeepSeek-V3.1-Terminus (tsjs) 11.111111% Imported 2026-05-27
9 SWE-agent + DeepSeek-V3.1-Terminus (tsjs) 10.493827% Imported 2026-05-27
10 SWE-agent + Gemini3-Flash (tsjs) 9.876543% Imported 2026-05-27
11 OpenHands + Gemini3-Flash (tsjs) 8.641975% Imported 2026-05-27
1 SWE-agent + Claude-4.5-Sonnet (verified) 40.0% Imported 2026-05-27
2 SWE-agent + GPT5-Thinking (Medium) (verified) 30.4% Imported 2026-05-27
1 Win-agent + Claude-4.5-Sonnet (windows) 30.0% Imported 2026-05-27
2 Win-agent + DeepSeek-V3.1-Terminus (windows) 20.0% Imported 2026-05-27
3 Win-agent + GPT-5.2-Thinking-Medium (windows) 20.0% Imported 2026-05-27
4 Win-agent + Gemini-3-Flash (windows) 16.0% Imported 2026-05-27
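
The per-split ranking used for the table can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the field names `subject`, `split`, and `resolved_percentage` are hypothetical stand-ins for whatever the JSONL feed actually uses, and the alphabetical tie-break on subject name is inferred from the order of tied rows in the table, not from any documented rule.

```python
import json
from collections import defaultdict


def rank_within_splits(jsonl_text):
    """Parse JSONL rows and rank them inside each split, best first."""
    by_split = defaultdict(list)
    for line in jsonl_text.splitlines():
        if line.strip():  # skip blank lines
            row = json.loads(line)
            by_split[row["split"]].append(row)

    ranked = []
    for split in sorted(by_split):
        # Sort by percentage descending; the subject-name tie-break is an
        # assumption inferred from the table order, not a documented rule.
        rows = sorted(
            by_split[split],
            key=lambda r: (-r["resolved_percentage"], r["subject"]),
        )
        for rank, row in enumerate(rows, start=1):
            ranked.append({**row, "rank": rank})
    return ranked


# Hypothetical feed lines; the values are copied from the table above.
sample = "\n".join(json.dumps(r) for r in [
    {"subject": "Win-agent + Gemini-3-Flash", "split": "windows", "resolved_percentage": 16.0},
    {"subject": "SWE-agent + GPT5-Thinking (Medium)", "split": "verified", "resolved_percentage": 30.4},
    {"subject": "Win-agent + GPT-5.2-Thinking-Medium", "split": "windows", "resolved_percentage": 20.0},
    {"subject": "Win-agent + Claude-4.5-Sonnet", "split": "windows", "resolved_percentage": 30.0},
    {"subject": "SWE-agent + Claude-4.5-Sonnet", "split": "verified", "resolved_percentage": 40.0},
    {"subject": "Win-agent + DeepSeek-V3.1-Terminus", "split": "windows", "resolved_percentage": 20.0},
])

for row in rank_within_splits(sample):
    print(row["rank"], f'{row["subject"]} ({row["split"]})', f'{row["resolved_percentage"]}%')
```

Run on the shuffled sample, the function restores the verified and windows orderings shown in the table, including the consecutive ranks 2 and 3 for the two windows rows tied at 20.0%.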