SWE-bench Full | BenchmarkList

Resolved

Rank	Subject	Resolved	Model Match	Provenance	Sampled
1	Sonar Foundation Agent + Claude 4.5 Opus	52.62%	—	Imported	2025-12-19
2	Salesforce AI Research SAGE (bash-only)	44.25%	—	Imported	2025-12-19
3	Atlassian Rovo Dev (2025-06-05)	41.98%	—	Imported	2025-12-19
4	Amazon Q Developer Agent (v20250405-dev)	37.1%	—	Imported	2025-12-19
5	SWE-agent 1.0 (Claude 3.7 Sonnet)	33.83%	—	Imported	2025-12-19
6	Amazon Q Developer Agent (v20241202-dev)	29.99%	—	Imported	2025-12-19
7	OpenHands + CodeAct v2.1 (claude-3-5-sonnet-20241022)	29.38%	—	Imported	2025-12-19
8	AutoCodeRover-v2.0 (Claude-3.5-Sonnet-20241022)	24.89%	—	Imported	2025-12-19
9	Honeycomb	22.06%	—	Imported	2025-12-19
10	Amazon Q Developer Agent (v20240719-dev)	19.75%	—	Imported	2025-12-19
11	Factory Code Droid	19.27%	—	Imported	2025-12-19
12	AutoCodeRover (v20240620) + GPT 4o (2024-05-13)	18.83%	—	Imported	2025-12-19
13	SWE-agent + Claude 3.5 Sonnet	18.13%	—	Imported	2025-12-19
14	AppMap Navie + GPT 4o (2024-05-13)	14.6%	—	Imported	2025-12-19
15	Amazon Q Developer Agent (v20240430-dev)	13.82%	—	Imported	2025-12-19
16	SWE-agent + GPT 4 (1106)	12.47%	—	Imported	2025-12-19
17	SWE-agent + GPT 4o (2024-05-13)	11.99%	—	Imported	2025-12-19
18	SWE-agent + Claude 3 Opus	10.51%	—	Imported	2025-12-19
19	RAG + Claude 3 Opus	3.79%	—	Imported	2025-12-19
20	RAG + Claude 2	1.96%	—	Imported	2025-12-19
21	RAG + GPT 4 (1106)	1.31%	—	Imported	2025-12-19
22	RAG + SWE-Llama 13B	0.7%	—	Imported	2025-12-19
23	RAG + SWE-Llama 7B	0.7%	—	Imported	2025-12-19
24	RAG + ChatGPT 3.5	0.17%	—	Imported	2025-12-19