SWE-bench Lite
Curated 300-instance SWE-bench subset for lower-cost evaluation of issue-resolving agents.
84 rows
Primary metric: resolved
Sampled: 2025-09-11
Resolved
| Rank | Subject | Resolved | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | ExpeRepair-v1.0 + Claude 4 Sonnet | 60.33% | — | Imported | 2025-09-11 |
| 2 | Refact.ai Agent | 60% | — | Imported | 2025-09-11 |
| 3 | KGCompass + Claude 4 Sonnet (20250514) | 58.33% | — | Imported | 2025-09-11 |
| 4 | SWE-agent + Claude 4 Sonnet | 56.67% | — | Imported | 2025-09-11 |
| 5 | Isoform | 55% | — | Imported | 2025-09-11 |
| 6 | SemAgent_Multi-v1.0 | 51.67% | — | Imported | 2025-09-11 |
| 7 | Isea | 51.33% | — | Imported | 2025-09-11 |
| 8 | EntroPO + R2E + Qwen3-Coder-30B-A3B-Instruct | 49.67% | — | Imported | 2025-09-11 |
| 9 | Blackbox AI Agent | 49% | — | Imported | 2025-09-11 |
| 10 | Codev | 49% | — | Imported | 2025-09-11 |
| 11 | Gru(2024-12-08) | 48.67% | — | Imported | 2025-09-11 |
| 12 | ExpeRepair-v1.0 | 48.33% | — | Imported | 2025-09-11 |
| 13 | Globant Code Fixer Agent | 48.33% | — | Imported | 2025-09-11 |
| 14 | SWE-agent + Claude 3.7 Sonnet | 48% | — | Imported | 2025-09-11 |
| 15 | devlo | 47.33% | — | Imported | 2025-09-11 |
| 16 | DARS Agent | 47% | — | Imported | 2025-09-11 |
| 17 | KGCompass + Claude 3.5 Sonnet (20241022) | 46% | — | Imported | 2025-09-11 |
| 18 | EntroPO + R2E + Qwen3-Coder-30B-A3B-Instruct | 45% | — | Imported | 2025-09-11 |
| 19 | Kodu-v1 + Claude-3.5 Sonnet (20241022) | 44.67% | — | Imported | 2025-09-11 |
| 20 | CodeFuse-CGM | 44% | — | Imported | 2025-09-11 |
| 21 | CodeStory Aide + Mixed Models | 43% | — | Imported | 2025-09-11 |
| 22 | Lingxi | 42.67% | — | Imported | 2025-09-11 |
| 23 | OpenHands + CodeAct v2.1 (claude-3-5-sonnet-20241022) | 41.67% | — | Imported | 2025-09-11 |
| 24 | Codart AI | 41.67% | — | Imported | 2025-09-11 |
| 25 | PatchKitty-0.9 + Claude-3.5 Sonnet (20241022) | 41.33% | — | Imported | 2025-09-11 |
| 26 | OrcaLoca + Agentless-1.5 + Claude-3.5 Sonnet (20241022) | 41% | — | Imported | 2025-09-11 |
| 27 | Composio SWE-Kit (2024-10-30) | 41% | — | Imported | 2025-09-11 |
| 28 | Agentless-1.5 + Claude-3.5 Sonnet (20241022) | 40.67% | — | Imported | 2025-09-11 |
| 29 | OpenCSG Starship Agentic Coder + GPT 4 (0806) | 39.67% | — | Imported | 2025-09-11 |
| 30 | Bytedance MarsCode Agent | 39.33% | — | Imported | 2025-09-11 |
| 31 | Moatless Tools + Claude 3.5 Sonnet (20241022) | 39% | — | Imported | 2025-09-11 |
| 32 | Moatless Tools + Claude 3.5 Sonnet (20241022) | 38.33% | — | Imported | 2025-09-11 |
| 33 | Honeycomb | 38.33% | — | Imported | 2025-09-11 |
| 34 | AbanteAI MentatBot + GPT 4o (2024-05-13) | 38% | — | Imported | 2025-09-11 |
| 35 | Patched.Codes Patchwork | 37% | — | Imported | 2025-09-11 |
| 36 | KGCompass + DeepSeek V3 | 36.67% | — | Imported | 2025-09-11 |
| 37 | AppMap Navie v2 | 36% | — | Imported | 2025-09-11 |
| 38 | CodeFuse-AAIS | 35.67% | — | Imported | 2025-09-11 |
| 39 | Gru(2024-08-11) | 35.67% | — | Imported | 2025-09-11 |
| 40 | Isoform | 35% | — | Imported | 2025-09-11 |
| 41 | SuperCoder2.0 | 34% | — | Imported | 2025-09-11 |
| 42 | Bytedance MarsCode Agent + GPT 4o (2024-05-13) | 34% | — | Imported | 2025-09-11 |
| 43 | Alibaba Lingma Agent | 33% | — | Imported | 2025-09-11 |
| 44 | Agentless Lite + O3 Mini (20250214) | 32.33% | — | Imported | 2025-09-11 |
| 45 | Agentless-1.5 + GPT 4o (2024-05-13) | 32% | — | Imported | 2025-09-11 |
| 46 | Factory Code Droid | 31.33% | — | Imported | 2025-09-11 |
| 47 | CodeShellTester + GPT 4o (2024-05-13) | 31.33% | — | Imported | 2025-09-11 |
| 48 | Moatless Tools + Deepseek V3 | 30.67% | — | Imported | 2025-09-11 |
| 49 | AutoCodeRover (v20240620) + GPT 4o (2024-05-13) | 30.67% | — | Imported | 2025-09-11 |
| 50 | Aegis - o3-mini_1.0 | 30.33% | — | Imported | 2025-09-11 |
| 51 | AIGCode Infant-Coder(2024-08-30) | 30% | — | Imported | 2025-09-11 |
| 52 | Kortix AI (claude-3-5-sonnet-20241022) | 30% | — | Imported | 2025-09-11 |
| 53 | Amazon Q Developer Agent (v20240719-dev) | 29.67% | — | Imported | 2025-09-11 |
| 54 | Agentless + RepoGraph + GPT-4o | 29.67% | — | Imported | 2025-09-11 |
| 55 | CodeR + GPT 4 (1106) | 28.33% | — | Imported | 2025-09-11 |
| 56 | reproducedRG | 28% | — | Imported | 2025-09-11 |
| 57 | SIMA + GPT 4o (2024-05-13) | 27.67% | — | Imported | 2025-09-11 |
| 58 | MASAI + GPT 4o (2024-05-13) | 27.33% | — | Imported | 2025-09-11 |
| 59 | Agentless + GPT 4o (2024-05-13) | 27.33% | — | Imported | 2025-09-11 |
| 60 | Moatless Tools + Claude 3.5 Sonnet | 26.67% | — | Imported | 2025-09-11 |
| 61 | OpenHands + CodeAct v1.8 | 26.67% | — | Imported | 2025-09-11 |
| 62 | IBM Research Agent-101 | 26.67% | — | Imported | 2025-09-11 |
| 63 | Aider + GPT 4o & Claude 3 Opus | 26.33% | — | Imported | 2025-09-11 |
| 64 | HyperAgent | 25.33% | — | Imported | 2025-09-11 |
| 65 | SWE-Fixer (Qwen2.5-7b retriever + Qwen2.5-72b editor) | 24.67% | — | Imported | 2025-09-11 |
| 66 | Moatless Tools + GPT 4o (2024-05-13) | 24.67% | — | Imported | 2025-09-11 |
| 67 | IBM AI Agent SWE-1.0 (with open LLMs) | 23.67% | — | Imported | 2025-09-11 |
| 68 | OpenCSG StarShip CodeGenAgent + GPT 4 (0613) | 23.67% | — | Imported | 2025-09-11 |
| 69 | SWE-Fixer (Qwen2.5-7b retriever + Qwen2.5-72b editor) 20241128 | 23.33% | — | Imported | 2025-09-11 |
| 70 | SWE-agent + Claude 3.5 Sonnet | 23% | — | Imported | 2025-09-11 |
| 71 | AppMap Navie + GPT 4o (2024-05-13) | 21.67% | — | Imported | 2025-09-11 |
| 72 | Bytedance AutoSE (based on SWE-Agent) + GPT4/GPT4o Mixed (20240828) | 21.67% | — | Imported | 2025-09-11 |
| 73 | Amazon Q Developer Agent (v20240430-dev) | 20.33% | — | Imported | 2025-09-11 |
| 74 | AutoCodeRover (v20240408) + GPT 4 (0125) | 19% | — | Imported | 2025-09-11 |
| 75 | SWE-agent + GPT 4o (2024-05-13) | 18.33% | — | Imported | 2025-09-11 |
| 76 | SWE-agent + GPT 4 (1106) | 18% | — | Imported | 2025-09-11 |
| 77 | MCTS-Refine-7B | 16.33% | — | Imported | 2025-09-11 |
| 78 | SWE-agent + Claude 3 Opus | 11.67% | — | Imported | 2025-09-11 |
| 79 | RAG + Claude 3 Opus | 4.33% | — | Imported | 2025-09-11 |
| 80 | RAG + Claude 2 | 3% | — | Imported | 2025-09-11 |
| 81 | RAG + GPT 4 (1106) | 2.67% | — | Imported | 2025-09-11 |
| 82 | RAG + SWE-Llama 7B | 1.33% | — | Imported | 2025-09-11 |
| 83 | RAG + SWE-Llama 13B | 1% | — | Imported | 2025-09-11 |
| 84 | RAG + ChatGPT 3.5 | 0.33% | — | Imported | 2025-09-11 |
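The percentages in the table can be read back as whole-instance counts. The sketch below assumes each Resolved value is the number of resolved instances divided by the 300 instances in the subset, rounded to two decimals; the helper names `resolved_count` and `resolved_percent` are hypothetical and only serve this illustration.

```python
TOTAL_INSTANCES = 300  # subset size stated in the description above


def resolved_count(percent: float, total: int = TOTAL_INSTANCES) -> int:
    """Recover the whole number of resolved instances from a rounded percentage."""
    return round(percent / 100 * total)


def resolved_percent(count: int, total: int = TOTAL_INSTANCES) -> float:
    """Express a resolved-instance count as a percentage rounded to two decimals."""
    return round(count / total * 100, 2)


# A few rows copied from the table (subject, resolved percentage).
rows = [
    ("ExpeRepair-v1.0 + Claude 4 Sonnet", 60.33),
    ("Refact.ai Agent", 60.0),
    ("RAG + ChatGPT 3.5", 0.33),
]

for subject, pct in rows:
    print(f"{subject}: {resolved_count(pct)} of {TOTAL_INSTANCES} instances")
```

Under that assumption, one instance is worth about 0.33 percentage points, so adjacent entries that differ by 0.33 or 0.34 points are separated by a single resolved issue.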