SWE-Gym
SWE-Gym: Evaluates software-engineering agents on realistic issue resolution, repository navigation, testing, and maintenance workflows.
Rows: 10
Primary metric: success_rate
Sampled: 2026-05-27
Metrics
Success rate, Success trajectories, Max turns (lower is better), Sampling temperature (lower is better)
| Rank | Subject | Success rate | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | claude-3-5-sonnet-20241022 on SWE-Gym Lite (t=0, max_turns=50) | 29.1% | — | Imported | 2026-05-27 |
| 2 | gpt-4o-2024-08-06 on SWE-Gym Lite (t=0.4, max_turns=30) | 9.13% | GPT-4o openai-gpt-4o | Imported | 2026-05-27 |
| 3 | gpt-4o-2024-08-06 on SWE-Gym Lite (t=0.8, max_turns=30) | 8.7% | GPT-4o openai-gpt-4o | Imported | 2026-05-27 |
| 4 | gpt-4o-2024-08-06 on SWE-Gym Lite (t=0, max_turns=30) | 8.26% | GPT-4o openai-gpt-4o | Imported | 2026-05-27 |
| 5 | gpt-4o-2024-08-06 on SWE-Gym Lite (t=0, max_turns=50) | 8.26% | GPT-4o openai-gpt-4o | Imported | 2026-05-27 |
| 6 | gpt-4o-2024-08-06 on SWE-Gym Lite (t=0.5, max_turns=30) | 7.83% | GPT-4o openai-gpt-4o | Imported | 2026-05-27 |
| 7 | gpt-4o-2024-08-06 on SWE-Gym Full (t=1, max_turns=50) | 7.71% | GPT-4o openai-gpt-4o | Imported | 2026-05-27 |
| 8 | gpt-4o-2024-08-06 on SWE-Gym Lite (t=0.3, max_turns=30) | 7.39% | GPT-4o openai-gpt-4o | Imported | 2026-05-27 |
| 9 | gpt-4o-2024-08-06 on SWE-Gym Lite (t=0.2, max_turns=30) | 4.78% | GPT-4o openai-gpt-4o | Imported | 2026-05-27 |
| 10 | gpt-4o-2024-08-06 on SWE-Gym Full (t=0, max_turns=50) | 4.55% | GPT-4o openai-gpt-4o | Imported | 2026-05-27 |
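
Each Subject cell in the table packs the model name, the SWE-Gym split, the sampling temperature, and the turn limit into one string. A minimal sketch of splitting that string into fields follows, assuming every subject follows the `<model> on SWE-Gym <split> (t=<temperature>, max_turns=<n>)` pattern seen in the rows above. The names `Run`, `SUBJECT_RE`, and `parse_subject` are hypothetical helpers for illustration, not part of SWE-Gym.

```python
import re
from typing import NamedTuple


class Run(NamedTuple):
    """One leaderboard configuration (hypothetical container for illustration)."""
    model: str
    split: str
    temperature: float
    max_turns: int


# Assumed pattern, inferred only from the ten subject strings in the table.
SUBJECT_RE = re.compile(
    r"^(?P<model>\S+) on SWE-Gym (?P<split>\w+) "
    r"\(t=(?P<t>[0-9.]+), max_turns=(?P<turns>\d+)\)$"
)


def parse_subject(subject: str) -> Run:
    """Split a Subject cell into model, split, temperature, and max_turns."""
    match = SUBJECT_RE.match(subject.strip())
    if match is None:
        raise ValueError(f"unrecognized subject format: {subject!r}")
    return Run(
        model=match["model"],
        split=match["split"],
        temperature=float(match["t"]),
        max_turns=int(match["turns"]),
    )


run = parse_subject("gpt-4o-2024-08-06 on SWE-Gym Lite (t=0.4, max_turns=30)")
print(run.model, run.split, run.temperature, run.max_turns)
```

With the fields separated, rows can be grouped by split or turn limit before comparing success rates, which avoids comparing Lite and Full results directly.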