vexp-swe-bench
Open coding-agent benchmark harness comparing agent resolution rate, cost, and unique wins on a curated 100-task subset of SWE-bench Verified.
Rows: 4
Primary metric: pass_at_1
Sampled: 2026-05-06
Metadata
Metrics: Pass@1, Cost per Task (lower is better), Unique Wins Lower Bound
| Rank | Subject | Pass@1 | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | vexp + Claude Code | 73 | — | Imported | 2026-05-06 |
| 2 | Live-SWE-Agent | 72 | — | Imported | 2026-05-06 |
| 3 | OpenHands | 70 | — | Imported | 2026-05-06 |
| 4 | Sonar Foundation | 70 | — | Imported | 2026-05-06 |
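
The page does not spell out how the listed metrics are computed, so the following is only a minimal sketch of one plausible reading, not the harness's actual implementation. It assumes Pass@1 is the percentage of tasks resolved on a single attempt, Cost per Task is the mean per-task spend, and a unique win is a task resolved by exactly one of the compared agents. The "lower bound" qualifier is not modeled here. `TaskResult`, its fields, the agent labels, and the sample rows are all hypothetical and are not taken from the benchmark data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskResult:
    """One agent's single attempt at one task (hypothetical record shape)."""
    agent: str       # hypothetical agent label
    task_id: str     # hypothetical task identifier
    resolved: bool   # True if the single attempt resolved the task
    cost_usd: float  # assumed per-task cost in US dollars


def _rows_for(results, agent):
    rows = [r for r in results if r.agent == agent]
    if not rows:
        raise ValueError(f"no results for agent {agent!r}")
    return rows


def pass_at_1(results, agent):
    """Percentage of the agent's tasks resolved on the first (only) attempt."""
    rows = _rows_for(results, agent)
    return 100.0 * sum(r.resolved for r in rows) / len(rows)


def cost_per_task(results, agent):
    """Mean cost per attempted task for the agent (lower is better)."""
    rows = _rows_for(results, agent)
    return sum(r.cost_usd for r in rows) / len(rows)


def unique_wins(results, agent):
    """Sorted task ids resolved by this agent and by no other agent."""
    solved_by = {}
    for r in results:
        if r.resolved:
            solved_by.setdefault(r.task_id, set()).add(r.agent)
    return sorted(t for t, agents in solved_by.items() if agents == {agent})


# Illustrative data only; these rows are made up for the example.
results = [
    TaskResult("agent-a", "t1", True, 0.50),
    TaskResult("agent-a", "t2", False, 0.30),
    TaskResult("agent-b", "t1", True, 0.20),
    TaskResult("agent-b", "t2", True, 0.40),
]

for name in ("agent-a", "agent-b"):
    print(name, pass_at_1(results, name), unique_wins(results, name))
```

On a 100-task subset, a Pass@1 percentage and a raw count of resolved tasks are the same number, which is consistent with the integer scores in the table above.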