SWT-Bench

SWT-Bench evaluates software-engineering agents on realistic issue resolution, repository navigation, testing, or maintenance workflows.

Rows: 33
Primary metric: success_rate
Sampled: 2026-05-27

Metadata

Metrics

Success Rate, Coverage Increase

Latest Results

Rows are parsed from the public SWT-Bench Lite and Verified leaderboard tables. Primary score is success rate; coverage increase is preserved as a secondary metric.

Rank | Subject | Success Rate | Model Match | Provenance | Sampled
1 | DevstralTestGen Devstral 2 | 89.1% | - | Imported | 2026-05-27
2 | TEX-T Claude 4 Sonnet | 87% | - | Imported | 2026-05-27
3 | LogicStar AI L*Agent v1 | 84% | - | Imported | 2026-05-27
4 | OpenHands GPT-5 | 79.8% | GPT-5 (openai-gpt-5) | Imported | 2026-05-27
5 | ReProAgent GPT-5-mini | 69.7% | GPT-5 Mini (openai-gpt-5-mini) | Imported | 2026-05-27
6 | OpenHands GPT-5-mini | 62.4% | GPT-5 Mini (openai-gpt-5-mini) | Imported | 2026-05-27
7 | e-Otter++ Claude 3.7 Sonnet | 62.1% | - | Imported | 2026-05-27
8 | ReProAgent GPT-5-mini | 56.2% | GPT-5 Mini (openai-gpt-5-mini) | Imported | 2026-05-27
9 | e-Otter++ Claude 3.7 Sonnet | 52.5% | - | Imported | 2026-05-27
10 | Amazon Q Developer Agent v20250405-dev | 51% | - | Imported | 2026-05-27
11 | AEGIS | 47.8% | - | Imported | 2026-05-27
12 | AssertFlip GPT-4o | 45.5% | GPT-4o (openai-gpt-4o) | Imported | 2026-05-27
13 | Amazon Q Developer Agent v20250405-dev | 39.9% | - | Imported | 2026-05-27
14 | AssertFlip GPT-4o | 38% | GPT-4o (openai-gpt-4o) | Imported | 2026-05-27
15 | Otter++ GPT-4o | 37.4% | GPT-4o (openai-gpt-4o) | Imported | 2026-05-27
16 | Otter GPT-4o | 31.6% | GPT-4o (openai-gpt-4o) | Imported | 2026-05-27
17 | OpenHands Cl. Sonnet 3.5, CI setup | 28.3% | - | Imported | 2026-05-27
18 | OpenHands Cl. Sonnet 3.5 | 27.7% | - | Imported | 2026-05-27
19 | OpenHands Cl. Sonnet 3.5, vanilla | 22.8% | - | Imported | 2026-05-27
20 | SWE-Agent+ GPT-4 | 18.5% | GPT-4 (openai-gpt-4) | Imported | 2026-05-27
21 | LIBRO GPT-4o | 17.8% | GPT-4o (openai-gpt-4o) | Imported | 2026-05-27
22 | SWE-Agent Mistral Large 2 | 16.3% | - | Imported | 2026-05-27
23 | SWE-Agent GPT-4 | 15.9% | GPT-4 (openai-gpt-4) | Imported | 2026-05-27
24 | Zero-Shot Plus GPT-4o + BM25 | 14.3% | GPT-4o (openai-gpt-4o) | Imported | 2026-05-27
25 | LIBRO GPT-4 | 14.1% | GPT-4 (openai-gpt-4) | Imported | 2026-05-27
26 | Aider GPT-4 | 12.7% | GPT-4 (openai-gpt-4) | Imported | 2026-05-27
27 | SWE-Agent Cl. 3.5 Sonnet | 12.3% | - | Imported | 2026-05-27
28 | SWE-Agent GPT-4o mini | 9.8% | GPT-4o-mini (openai-gpt-4o-mini) | Imported | 2026-05-27
29 | Zero-Shot Plus GPT-4 + BM25 | 9.4% | GPT-4 (openai-gpt-4) | Imported | 2026-05-27
30 | AutoCodeRover GPT-4 | 9.1% | GPT-4 (openai-gpt-4) | Imported | 2026-05-27
31 | Zero-Shot Base GPT-4 + BM25 | 3.6% | GPT-4 (openai-gpt-4) | Imported | 2026-05-27
32 | SWE-Agent Claude 3 Haiku | 2.5% | - | Imported | 2026-05-27
33 | SWE-Agent Mixtral 8x22B | 0.7% | - | Imported | 2026-05-27
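
The ranking in the table above follows from sorting rows by the primary metric, success rate, in descending order. A minimal Python sketch of that step is given here, using three rows copied from the table. The names `LeaderboardRow`, `parse_percent`, and `rank_rows` are hypothetical and not part of any SWT-Bench tooling. The secondary metric, coverage increase, is modelled as an optional field because its values do not appear in the table.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LeaderboardRow:
    # Hypothetical record type, for illustration only.
    subject: str
    success_rate: float  # primary metric, in percent
    coverage_increase: Optional[float] = None  # secondary metric; values not listed in the table


def parse_percent(text: str) -> float:
    """Convert a cell such as '89.1%' or '87%' to a float."""
    return float(text.strip().rstrip("%"))


def rank_rows(rows):
    """Return (rank, row) pairs ordered by success rate, highest first."""
    ordered = sorted(rows, key=lambda r: r.success_rate, reverse=True)
    return list(enumerate(ordered, start=1))


# Three rows taken from the table, deliberately out of order.
rows = [
    LeaderboardRow("OpenHands GPT-5", parse_percent("79.8%")),
    LeaderboardRow("DevstralTestGen Devstral 2", parse_percent("89.1%")),
    LeaderboardRow("TEX-T Claude 4 Sonnet", parse_percent("87%")),
]

for rank, row in rank_rows(rows):
    print(rank, row.subject, f"{row.success_rate}%")
```

Python's `sorted` is stable, so rows with equal success rates would keep their input order. How the leaderboard itself breaks ties is not stated in the source.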