SWT-Bench
SWT-Bench evaluates software-engineering agents on realistic issue resolution, repository navigation, testing, and maintenance workflows.
Rows: 33
Primary metric: success_rate
Sampled: 2026-05-27
Metadata
Metrics: Success Rate, Coverage Increase
| Rank | Subject | Success Rate | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | DevstralTestGen Devstral 2 | 89.1% | — | Imported | 2026-05-27 |
| 2 | TEX-T Claude 4 Sonnet | 87% | — | Imported | 2026-05-27 |
| 3 | LogicStar AI L*Agent v1 | 84% | — | Imported | 2026-05-27 |
| 4 | OpenHands GPT-5 | 79.8% | GPT-5 (`openai-gpt-5`) | Imported | 2026-05-27 |
| 5 | ReProAgent GPT-5-mini | 69.7% | GPT-5 Mini (`openai-gpt-5-mini`) | Imported | 2026-05-27 |
| 6 | OpenHands GPT-5-mini | 62.4% | GPT-5 Mini (`openai-gpt-5-mini`) | Imported | 2026-05-27 |
| 7 | e-Otter++ Claude 3.7 Sonnet | 62.1% | — | Imported | 2026-05-27 |
| 8 | ReProAgent GPT-5-mini | 56.2% | GPT-5 Mini (`openai-gpt-5-mini`) | Imported | 2026-05-27 |
| 9 | e-Otter++ Claude 3.7 Sonnet | 52.5% | — | Imported | 2026-05-27 |
| 10 | Amazon Q Developer Agent v20250405-dev | 51% | — | Imported | 2026-05-27 |
| 11 | AEGIS | 47.8% | — | Imported | 2026-05-27 |
| 12 | AssertFlip GPT-4o | 45.5% | GPT-4o (`openai-gpt-4o`) | Imported | 2026-05-27 |
| 13 | Amazon Q Developer Agent v20250405-dev | 39.9% | — | Imported | 2026-05-27 |
| 14 | AssertFlip GPT-4o | 38% | GPT-4o (`openai-gpt-4o`) | Imported | 2026-05-27 |
| 15 | Otter++ GPT-4o | 37.4% | GPT-4o (`openai-gpt-4o`) | Imported | 2026-05-27 |
| 16 | Otter GPT-4o | 31.6% | GPT-4o (`openai-gpt-4o`) | Imported | 2026-05-27 |
| 17 | OpenHands Cl. Sonnet 3.5, CI setup | 28.3% | — | Imported | 2026-05-27 |
| 18 | OpenHands Cl. Sonnet 3.5 | 27.7% | — | Imported | 2026-05-27 |
| 19 | OpenHands Cl. Sonnet 3.5, vanilla | 22.8% | — | Imported | 2026-05-27 |
| 20 | SWE-Agent+ GPT-4 | 18.5% | GPT-4 (`openai-gpt-4`) | Imported | 2026-05-27 |
| 21 | LIBRO GPT-4o | 17.8% | GPT-4o (`openai-gpt-4o`) | Imported | 2026-05-27 |
| 22 | SWE-Agent Mistral Large 2 | 16.3% | — | Imported | 2026-05-27 |
| 23 | SWE-Agent GPT-4 | 15.9% | GPT-4 (`openai-gpt-4`) | Imported | 2026-05-27 |
| 24 | Zero-Shot Plus GPT-4o + BM25 | 14.3% | GPT-4o (`openai-gpt-4o`) | Imported | 2026-05-27 |
| 25 | LIBRO GPT-4 | 14.1% | GPT-4 (`openai-gpt-4`) | Imported | 2026-05-27 |
| 26 | Aider GPT-4 | 12.7% | GPT-4 (`openai-gpt-4`) | Imported | 2026-05-27 |
| 27 | SWE-Agent Cl. 3.5 Sonnet | 12.3% | — | Imported | 2026-05-27 |
| 28 | SWE-Agent GPT-4o mini | 9.8% | GPT-4o-mini (`openai-gpt-4o-mini`) | Imported | 2026-05-27 |
| 29 | Zero-Shot Plus GPT-4 + BM25 | 9.4% | GPT-4 (`openai-gpt-4`) | Imported | 2026-05-27 |
| 30 | AutoCodeRover GPT-4 | 9.1% | GPT-4 (`openai-gpt-4`) | Imported | 2026-05-27 |
| 31 | Zero-Shot Base GPT-4 + BM25 | 3.6% | GPT-4 (`openai-gpt-4`) | Imported | 2026-05-27 |
| 32 | SWE-Agent Claude 3 Haiku | 2.5% | — | Imported | 2026-05-27 |
| 33 | SWE-Agent Mixtral 8x22B | 0.7% | — | Imported | 2026-05-27 |