SWE-bench Full

Original SWE-bench leaderboard over 2,294 real GitHub issue resolution tasks.

Rows: 24
Primary metric: resolved
Sampled: 2025-12-19

Metadata

Metrics

Resolved

Latest Results

Official SWE-bench Full leaderboard rows. Each entry reports the percentage of the 2,294 real GitHub issue resolution tasks that were resolved. Rows are agent systems or scaffolds, not scores for a base model alone.

| Rank | Subject | Resolved | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | Sonar Foundation Agent + Claude 4.5 Opus | 52.62% | | Imported | 2025-12-19 |
| 2 | Salesforce AI Research SAGE (bash-only) | 44.25% | | Imported | 2025-12-19 |
| 3 | Atlassian Rovo Dev (2025-06-05) | 41.98% | | Imported | 2025-12-19 |
| 4 | Amazon Q Developer Agent (v20250405-dev) | 37.1% | | Imported | 2025-12-19 |
| 5 | SWE-agent 1.0 (Claude 3.7 Sonnet) | 33.83% | | Imported | 2025-12-19 |
| 6 | Amazon Q Developer Agent (v20241202-dev) | 29.99% | | Imported | 2025-12-19 |
| 7 | OpenHands + CodeAct v2.1 (claude-3-5-sonnet-20241022) | 29.38% | | Imported | 2025-12-19 |
| 8 | AutoCodeRover-v2.0 (Claude-3.5-Sonnet-20241022) | 24.89% | | Imported | 2025-12-19 |
| 9 | Honeycomb | 22.06% | | Imported | 2025-12-19 |
| 10 | Amazon Q Developer Agent (v20240719-dev) | 19.75% | | Imported | 2025-12-19 |
| 11 | Factory Code Droid | 19.27% | | Imported | 2025-12-19 |
| 12 | AutoCodeRover (v20240620) + GPT 4o (2024-05-13) | 18.83% | | Imported | 2025-12-19 |
| 13 | SWE-agent + Claude 3.5 Sonnet | 18.13% | | Imported | 2025-12-19 |
| 14 | AppMap Navie + GPT 4o (2024-05-13) | 14.6% | | Imported | 2025-12-19 |
| 15 | Amazon Q Developer Agent (v20240430-dev) | 13.82% | | Imported | 2025-12-19 |
| 16 | SWE-agent + GPT 4 (1106) | 12.47% | | Imported | 2025-12-19 |
| 17 | SWE-agent + GPT 4o (2024-05-13) | 11.99% | | Imported | 2025-12-19 |
| 18 | SWE-agent + Claude 3 Opus | 10.51% | | Imported | 2025-12-19 |
| 19 | RAG + Claude 3 Opus | 3.79% | | Imported | 2025-12-19 |
| 20 | RAG + Claude 2 | 1.96% | | Imported | 2025-12-19 |
| 21 | RAG + GPT 4 (1106) | 1.31% | | Imported | 2025-12-19 |
| 22 | RAG + SWE-Llama 13B | 0.7% | | Imported | 2025-12-19 |
| 23 | RAG + SWE-Llama 7B | 0.7% | | Imported | 2025-12-19 |
| 24 | RAG + ChatGPT 3.5 | 0.17% | | Imported | 2025-12-19 |
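
The resolved percentages above can be related back to task counts with a little arithmetic. The following is a minimal sketch, not the leaderboard's own code: the function names are hypothetical, and it assumes percentages are rounded to two decimals over the full 2,294 tasks. The page does not report raw counts, so the counts it produces are estimates.

```python
TOTAL_TASKS = 2294  # task count stated in the leaderboard description


def resolved_percent(resolved_count: int, total: int = TOTAL_TASKS) -> float:
    """Percent of tasks resolved, rounded to two decimals (assumed rounding)."""
    if not 0 <= resolved_count <= total:
        raise ValueError("resolved_count must be between 0 and total")
    return round(100 * resolved_count / total, 2)


def approx_resolved_count(percent: float, total: int = TOTAL_TASKS) -> int:
    """Estimate how many tasks a reported percentage corresponds to."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return round(percent / 100 * total)


if __name__ == "__main__":
    # Top row of the table: 52.62% resolved.
    count = approx_resolved_count(52.62)
    print(count, resolved_percent(count))
```

Under these assumptions, the top entry's 52.62% corresponds to roughly 1,207 of the 2,294 tasks, and converting that count back gives 52.62% again.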