SWE-bench Live
Microsoft's live, SWE-bench-style benchmark for real-world issue resolution, updated with recent GitHub tasks and offering frozen lite and verified splits for evaluation.
91 rows
Primary metric: resolved_percentage
Sampled: 2026-05-27
Metadata
Metrics: Resolved percentage, Resolved count, Total tasks
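The three metrics are presumably tied together by a simple ratio. The sketch below assumes that the resolved percentage is 100 × resolved count / total tasks, which this page does not state explicitly. The function name `resolved_percentage` and the counts in the usage line are illustrative and are not taken from the leaderboard.

```python
def resolved_percentage(resolved_count: int, total_tasks: int) -> float:
    """Return the share of resolved tasks as a percentage.

    Assumption: percentage = 100 * resolved / total. Rounding to six
    decimals mirrors the precision shown in the table below.
    """
    if total_tasks <= 0:
        raise ValueError("total_tasks must be positive")
    if not 0 <= resolved_count <= total_tasks:
        raise ValueError("resolved_count must be between 0 and total_tasks")
    return round(100.0 * resolved_count / total_tasks, 6)


# Hypothetical counts, chosen only for illustration.
print(resolved_percentage(5, 12))  # prints 41.666667
```

Under this assumption, long decimals such as 41.666667% come from dividing by a split size that is not a factor of 100.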
| Rank (within split) | Subject | Resolved percentage | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | OpenHands + Claude-4.5-Sonnet (ccpp) | 43.75% | — | Imported | 2026-05-27 |
| 2 | SWE-agent + Claude-4.5-Sonnet (ccpp) | 43.75% | — | Imported | 2026-05-27 |
| 3 | SWE-agent + GPT5.2-Thinking (Medium) (ccpp) | 41.666667% | — | Imported | 2026-05-27 |
| 4 | Claude Code + GPT5.2-Thinking (Medium) (ccpp) | 37.5% | — | Imported | 2026-05-27 |
| 5 | OpenHands + GPT5.2-Thinking (Medium) (ccpp) | 37.5% | — | Imported | 2026-05-27 |
| 6 | Claude Code + Claude-4.5-Sonnet (ccpp) | 35.416667% | — | Imported | 2026-05-27 |
| 7 | OpenHands + Gemini3-Flash (ccpp) | 35.416667% | — | Imported | 2026-05-27 |
| 8 | SWE-agent + Gemini3-Flash (ccpp) | 35.416667% | — | Imported | 2026-05-27 |
| 9 | SWE-agent + DeepSeek-V3.1-Terminus (ccpp) | 27.083333% | — | Imported | 2026-05-27 |
| 10 | OpenHands + DeepSeek-V3.1-Terminus (ccpp) | 25.0% | — | Imported | 2026-05-27 |
| 11 | Claude Code + DeepSeek-V3.1-Terminus (ccpp) | 10.416667% | — | Imported | 2026-05-27 |
| 1 | OpenHands + GPT5.2-Thinking (Medium) (csharp) | 21.428571% | — | Imported | 2026-05-27 |
| 2 | SWE-agent + Claude-4.5-Sonnet (csharp) | 21.428571% | — | Imported | 2026-05-27 |
| 3 | SWE-agent + DeepSeek-V3.1-Terminus (csharp) | 21.428571% | — | Imported | 2026-05-27 |
| 4 | SWE-agent + GPT5.2-Thinking (Medium) (csharp) | 21.428571% | — | Imported | 2026-05-27 |
| 5 | OpenHands + Claude-4.5-Sonnet (csharp) | 17.857143% | — | Imported | 2026-05-27 |
| 6 | OpenHands + DeepSeek-V3.1-Terminus (csharp) | 17.857143% | — | Imported | 2026-05-27 |
| 7 | OpenHands + Gemini3-Flash (csharp) | 17.857143% | — | Imported | 2026-05-27 |
| 8 | Claude Code + Claude-4.5-Sonnet (csharp) | 14.285714% | — | Imported | 2026-05-27 |
| 9 | Claude Code + DeepSeek-V3.1-Terminus (csharp) | 14.285714% | — | Imported | 2026-05-27 |
| 10 | Claude Code + GPT5.2-Thinking (Medium) (csharp) | 14.285714% | — | Imported | 2026-05-27 |
| 11 | SWE-agent + Gemini3-Flash (csharp) | 14.285714% | — | Imported | 2026-05-27 |
| 1 | OpenHands + Claude 3.7 Sonnet (full) | 19.257013% | — | Imported | 2026-05-27 |
| 2 | SWE-agent + GPT 4.1 (full) | 18.574678% | — | Imported | 2026-05-27 |
| 3 | SWE-agent + Claude 3.7 Sonnet (full) | 17.134193% | — | Imported | 2026-05-27 |
| 4 | SWE-agent + GPT 4o (full) | 9.097801% | — | Imported | 2026-05-27 |
| 1 | Claude Code + Claude-4.5-Sonnet (go) | 44.117647% | — | Imported | 2026-05-27 |
| 2 | OpenHands + Claude-4.5-Sonnet (go) | 39.705882% | — | Imported | 2026-05-27 |
| 3 | SWE-agent + Claude-4.5-Sonnet (go) | 39.705882% | — | Imported | 2026-05-27 |
| 4 | SWE-agent + Gemini3-Flash (go) | 32.352941% | — | Imported | 2026-05-27 |
| 5 | OpenHands + Gemini3-Flash (go) | 30.882353% | — | Imported | 2026-05-27 |
| 6 | Claude Code + DeepSeek-V3.1-Terminus (go) | 29.411765% | — | Imported | 2026-05-27 |
| 7 | Claude Code + GPT5.2-Thinking (Medium) (go) | 27.941176% | — | Imported | 2026-05-27 |
| 8 | SWE-agent + GPT5.2-Thinking (Medium) (go) | 25.0% | — | Imported | 2026-05-27 |
| 9 | OpenHands + DeepSeek-V3.1-Terminus (go) | 22.058824% | — | Imported | 2026-05-27 |
| 10 | OpenHands + GPT5.2-Thinking (Medium) (go) | 20.588235% | — | Imported | 2026-05-27 |
| 11 | SWE-agent + DeepSeek-V3.1-Terminus (go) | 20.588235% | — | Imported | 2026-05-27 |
| 1 | Claude Code + GPT5.2-Thinking (Medium) (java) | 37.096774% | — | Imported | 2026-05-27 |
| 2 | Claude Code + Claude-4.5-Sonnet (java) | 27.419355% | — | Imported | 2026-05-27 |
| 3 | SWE-agent + DeepSeek-V3.1-Terminus (java) | 27.419355% | — | Imported | 2026-05-27 |
| 4 | SWE-agent + Gemini3-Flash (java) | 25.806452% | — | Imported | 2026-05-27 |
| 5 | OpenHands + DeepSeek-V3.1-Terminus (java) | 24.193548% | — | Imported | 2026-05-27 |
| 6 | OpenHands + Gemini3-Flash (java) | 24.193548% | — | Imported | 2026-05-27 |
| 7 | Claude Code + DeepSeek-V3.1-Terminus (java) | 22.580645% | — | Imported | 2026-05-27 |
| 8 | OpenHands + GPT5.2-Thinking (Medium) (java) | 22.580645% | — | Imported | 2026-05-27 |
| 9 | SWE-agent + GPT5.2-Thinking (Medium) (java) | 22.580645% | — | Imported | 2026-05-27 |
| 10 | OpenHands + Claude-4.5-Sonnet (java) | 19.354839% | — | Imported | 2026-05-27 |
| 11 | SWE-agent + Claude-4.5-Sonnet (java) | 16.129032% | — | Imported | 2026-05-27 |
| 1 | SWE-agent + Claude-4.5-Sonnet (lite) | 36.0% | — | Imported | 2026-05-27 |
| 2 | OpenHands + Qwen3-Coder-480B-A35B (lite) | 24.666667% | — | Imported | 2026-05-27 |
| 3 | SWE-agent + GPT5-Thinking (Medium) (lite) | 24.333333% | — | Imported | 2026-05-27 |
| 4 | OpenHands + Claude 3.7 Sonnet (lite) | 17.666667% | — | Imported | 2026-05-27 |
| 5 | SWE-agent + Claude 3.7 Sonnet (lite) | 17.666667% | — | Imported | 2026-05-27 |
| 6 | SWE-agent + GPT 4.1 (lite) | 16.333333% | — | Imported | 2026-05-27 |
| 7 | SWE-agent + DeepSeek V3 (lite) | 15.333333% | — | Imported | 2026-05-27 |
| 8 | Agentless + DeepSeek V3 (lite) | 13.333333% | — | Imported | 2026-05-27 |
| 9 | OpenHands + DeepSeek V3 (lite) | 13.0% | — | Imported | 2026-05-27 |
| 10 | Agentless + GPT 4.1 (lite) | 12.0% | — | Imported | 2026-05-27 |
| 11 | Agentless + GPT 4o (lite) | 11.666667% | — | Imported | 2026-05-27 |
| 12 | Agentless + Claude 3.7 Sonnet (lite) | 11.333333% | — | Imported | 2026-05-27 |
| 13 | OpenHands + GPT 4.1 (lite) | 11.333333% | — | Imported | 2026-05-27 |
| 14 | SWE-agent + GPT 4o (lite) | 10.0% | — | Imported | 2026-05-27 |
| 15 | OpenHands + GPT 4o (lite) | 7.0% | — | Imported | 2026-05-27 |
| 1 | SWE-agent + Gemini3-Flash (rust) | 37.777778% | — | Imported | 2026-05-27 |
| 2 | OpenHands + Gemini3-Flash (rust) | 35.555556% | — | Imported | 2026-05-27 |
| 3 | Claude Code + GPT5.2-Thinking (Medium) (rust) | 33.333333% | — | Imported | 2026-05-27 |
| 4 | SWE-agent + Claude-4.5-Sonnet (rust) | 28.888889% | — | Imported | 2026-05-27 |
| 5 | SWE-agent + DeepSeek-V3.1-Terminus (rust) | 28.888889% | — | Imported | 2026-05-27 |
| 6 | SWE-agent + GPT5.2-Thinking (Medium) (rust) | 28.888889% | — | Imported | 2026-05-27 |
| 7 | OpenHands + Claude-4.5-Sonnet (rust) | 26.666667% | — | Imported | 2026-05-27 |
| 8 | OpenHands + DeepSeek-V3.1-Terminus (rust) | 26.666667% | — | Imported | 2026-05-27 |
| 9 | OpenHands + GPT5.2-Thinking (Medium) (rust) | 26.666667% | — | Imported | 2026-05-27 |
| 10 | Claude Code + Claude-4.5-Sonnet (rust) | 22.222222% | — | Imported | 2026-05-27 |
| 11 | Claude Code + DeepSeek-V3.1-Terminus (rust) | 17.777778% | — | Imported | 2026-05-27 |
| 1 | Claude Code + Claude-4.5-Sonnet (tsjs) | 27.160494% | — | Imported | 2026-05-27 |
| 2 | Claude Code + GPT5.2-Thinking (Medium) (tsjs) | 20.37037% | — | Imported | 2026-05-27 |
| 3 | SWE-agent + Claude-4.5-Sonnet (tsjs) | 20.37037% | — | Imported | 2026-05-27 |
| 4 | OpenHands + Claude-4.5-Sonnet (tsjs) | 19.135802% | — | Imported | 2026-05-27 |
| 5 | SWE-agent + GPT5.2-Thinking (Medium) (tsjs) | 16.049383% | — | Imported | 2026-05-27 |
| 6 | Claude Code + DeepSeek-V3.1-Terminus (tsjs) | 15.432099% | — | Imported | 2026-05-27 |
| 7 | OpenHands + GPT5.2-Thinking (Medium) (tsjs) | 12.962963% | — | Imported | 2026-05-27 |
| 8 | OpenHands + DeepSeek-V3.1-Terminus (tsjs) | 11.111111% | — | Imported | 2026-05-27 |
| 9 | SWE-agent + DeepSeek-V3.1-Terminus (tsjs) | 10.493827% | — | Imported | 2026-05-27 |
| 10 | SWE-agent + Gemini3-Flash (tsjs) | 9.876543% | — | Imported | 2026-05-27 |
| 11 | OpenHands + Gemini3-Flash (tsjs) | 8.641975% | — | Imported | 2026-05-27 |
| 1 | SWE-agent + Claude-4.5-Sonnet (verified) | 40.0% | — | Imported | 2026-05-27 |
| 2 | SWE-agent + GPT5-Thinking (Medium) (verified) | 30.4% | — | Imported | 2026-05-27 |
| 1 | Win-agent + Claude-4.5-Sonnet (windows) | 30.0% | — | Imported | 2026-05-27 |
| 2 | Win-agent + DeepSeek-V3.1-Terminus (windows) | 20.0% | — | Imported | 2026-05-27 |
| 3 | Win-agent + GPT-5.2-Thinking-Medium (windows) | 20.0% | — | Imported | 2026-05-27 |
| 4 | Win-agent + Gemini-3-Flash (windows) | 16.0% | — | Imported | 2026-05-27 |
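Because the table stacks several splits and the rank restarts at 1 for each, it can help to split each Subject string into its parts before comparing rows. The sketch below assumes every subject follows the pattern `<agent> + <model> (<split>)`, with the split as the last parenthesised group. The regular expression and the helper names `parse_subject` and `best_per_split` are hypothetical and are not part of any benchmark tooling. The sample rows are copied from the table above.

```python
import re

# Assumed pattern: "<agent> + <model> (<split>)". The model name may itself
# contain parentheses, e.g. "GPT5.2-Thinking (Medium)".
SUBJECT_RE = re.compile(
    r"^(?P<agent>.+?) \+ (?P<model>.+) \((?P<split>[^()]+)\)$"
)


def parse_subject(subject: str) -> dict:
    """Split a Subject cell into agent, model, and split."""
    match = SUBJECT_RE.match(subject.strip())
    if match is None:
        raise ValueError(f"unrecognised subject format: {subject!r}")
    return match.groupdict()


def best_per_split(rows):
    """Return {split: (subject, percentage)} with the top row per split.

    On ties, the row that appears first is kept.
    """
    best = {}
    for subject, pct in rows:
        split = parse_subject(subject)["split"]
        if split not in best or pct > best[split][1]:
            best[split] = (subject, pct)
    return best


rows = [
    ("SWE-agent + Claude-4.5-Sonnet (verified)", 40.0),
    ("SWE-agent + GPT5-Thinking (Medium) (verified)", 30.4),
    ("Win-agent + Claude-4.5-Sonnet (windows)", 30.0),
    ("Win-agent + Gemini-3-Flash (windows)", 16.0),
]

for split, (subject, pct) in best_per_split(rows).items():
    print(f"{split}: {subject} ({pct}%)")
```

Percentages from different splits are computed over different task sets, so the sketch compares rows only within a split.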