SWE-PolyBench
Polyglot software-engineering benchmark with Java, Python, JavaScript, and TypeScript tasks plus retrieval/localization diagnostics.
Metadata
Metrics
Resolve Rate: overall; per language (Java, Python, JavaScript, TypeScript); per category (Single Function, Function Only, Single Class, Class Only, No Local Context, Mixed Context)
File Precision and File Recall: All, Python, Java, JavaScript, TypeScript
Node Precision and Node Recall: All, Python, Java, JavaScript, TypeScript
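The file- and node-level retrieval diagnostics compare the locations an agent's patch touches against those touched by the reference patch. The sketch below assumes the usual set-based definitions: precision is the share of predicted locations that appear in the gold set, and recall is the share of gold locations that were predicted. The helper name `set_precision_recall`, the `path::symbol` node naming, and the zero-score convention for empty sets are all hypothetical choices made for illustration. They are not the benchmark's own code.

```python
from typing import Iterable, Tuple


def set_precision_recall(predicted: Iterable[str], gold: Iterable[str]) -> Tuple[float, float]:
    """Set-based precision and recall for a single task instance.

    Hypothetical convention: an empty prediction scores 0.0 precision and an
    empty gold set scores 0.0 recall, so that we never divide by zero.
    """
    pred_set = set(predicted)
    gold_set = set(gold)
    hits = len(pred_set & gold_set)  # locations both predicted and actually changed
    precision = hits / len(pred_set) if pred_set else 0.0
    recall = hits / len(gold_set) if gold_set else 0.0
    return precision, recall


# File level (hypothetical paths): files edited by the agent vs. the reference patch.
agent_files = ["src/app.py", "src/utils.py", "tests/test_app.py"]
gold_files = ["src/app.py", "src/config.py"]
file_p, file_r = set_precision_recall(agent_files, gold_files)
# file_p == 1/3, file_r == 0.5

# Node level: the same formula over function/class identifiers. The
# "path::symbol" scheme is an illustrative assumption.
agent_nodes = ["src/app.py::App.run"]
gold_nodes = ["src/app.py::App.run", "src/config.py::load"]
node_p, node_r = set_precision_recall(agent_nodes, gold_nodes)
# node_p == 1.0, node_r == 0.5
```

The per-language variants (for example, File Precision - Java) presumably apply the same computation to the subset of tasks in that language. How per-instance scores are aggregated is not stated here.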
| Rank | Subject | Resolve Rate (%) | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | Atlassian Rovo Dev (Verified) | 48.9529 | — | Imported | 2026-05-27 |
| 2 | PrometheusV1.2 + GPT-5 (Verified) | 33.7696 | — | Imported | 2026-05-27 |
| 3 | Amazon Q Developer Agent (v20240402) (Verified) | 28.7958 | — | Imported | 2026-05-27 |
| 4 | Kodah (gpt-5-mini) (Verified) | 28.2723 | — | Imported | 2026-05-27 |
| 5 | Amazon Q Developer Agent (v20250402) (PolyBench500) | 25.0000 | — | Imported | 2026-05-27 |
| 6 | Amazon Q Developer Agent (v20250402) (Full) | 22.6066 | — | Imported | 2026-05-27 |
| 7 | Aider-PB (Sonnet 3.5) (PolyBench500) | 16.4000 | — | Imported | 2026-05-27 |
| 8 | Aider-PB (Sonnet 3.5) (Verified) | 16.2304 | — | Imported | 2026-05-27 |
| 9 | SWE-agent-PB (Sonnet 3.5) (PolyBench500) | 15.4000 | — | Imported | 2026-05-27 |
| 10 | SWE-agent-PB (Sonnet 3.5) (Verified) | 14.3979 | — | Imported | 2026-05-27 |
| 11 | Aider-PB (Sonnet 3.5) (Full) | 14.0758 | — | Imported | 2026-05-27 |
| 12 | Aider-PB (Deepseek R1) (Verified) | 13.8743 | — | Imported | 2026-05-27 |
| 13 | Agentless-PB (Sonnet 3.5) (Verified) | 13.3508 | — | Imported | 2026-05-27 |
| 14 | Aider-PB (Deepseek R1) (PolyBench500) | 13.2000 | — | Imported | 2026-05-27 |
| 15 | Aider-PB (Haiku) (Verified) | 13.0890 | — | Imported | 2026-05-27 |
| 16 | Aider-PB (Deepseek R1) (Full) | 11.5166 | — | Imported | 2026-05-27 |
| 17 | Aider-PB (Haiku) (PolyBench500) | 11.2000 | — | Imported | 2026-05-27 |
| 18 | Agentless-PB (Sonnet 3.5) (PolyBench500) | 10.8000 | — | Imported | 2026-05-27 |
| 19 | SWE-agent-PB (Sonnet 3.5) (Full) | 10.1896 | — | Imported | 2026-05-27 |
| 20 | Aider-PB (Haiku) (Full) | 9.8578 | — | Imported | 2026-05-27 |
| 21 | Aider-PB (Llama3.3 70B) (Verified) | 8.6387 | — | Imported | 2026-05-27 |
| 22 | Aider-PB (Mistral-Large) (Verified) | 8.3770 | — | Imported | 2026-05-27 |
| 23 | Agentless-PB (Sonnet 3.5) (Full) | 7.8199 | — | Imported | 2026-05-27 |
| 24 | Aider-PB (Deepseek-R1-Distill Llama 70B) (Verified) | 7.5916 | — | Imported | 2026-05-27 |
| 25 | Aider-PB (Llama3.3 70B) (PolyBench500) | 7.4000 | — | Imported | 2026-05-27 |
| 26 | Aider-PB (Mistral-Large) (PolyBench500) | 6.8000 | — | Imported | 2026-05-27 |
| 27 | Aider-PB (Llama3.3 70B) (Full) | 6.0190 | — | Imported | 2026-05-27 |
| 28 | Aider-PB (Deepseek-R1-Distill Llama 70B) (PolyBench500) | 6.0000 | — | Imported | 2026-05-27 |
| 29 | Aider-PB (Mistral-Large) (Full) | 5.8768 | — | Imported | 2026-05-27 |
| 30 | Aider-PB (Deepseek-R1-Distill Llama 70B) (Full) | 5.3081 | — | Imported | 2026-05-27 |
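The ranked rows come from three different task subsets (Verified, PolyBench500, Full), so each resolve rate uses a different denominator. The sketch below treats resolve rate as resolved tasks divided by total tasks, times 100. It assumes subset sizes of 382 (Verified), 500 (PolyBench500), and 2110 (Full) tasks. These sizes are consistent with the reported values; for example, 187/382 rounds to 48.9529. The dictionary `SUBSET_SIZES` and both helper functions are hypothetical names used only for illustration.

```python
# Assumed subset sizes; they reproduce the percentages in the table above.
SUBSET_SIZES = {"Verified": 382, "PolyBench500": 500, "Full": 2110}


def resolve_rate(resolved: int, total: int) -> float:
    """Percentage of task instances counted as resolved by the test harness."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * resolved / total


def implied_resolved(rate_pct: float, subset: str) -> int:
    """Back out the resolved-task count implied by a rounded percentage."""
    return round(rate_pct * SUBSET_SIZES[subset] / 100.0)


# Amazon Q Developer Agent (v20250402) appears twice, on different subsets.
pb500_count = implied_resolved(25.0, "PolyBench500")  # 125 of 500 tasks
full_count = implied_resolved(22.6066, "Full")        # 477 of 2110 tasks
verified_count = implied_resolved(48.9529, "Verified")  # 187 of 382 tasks
```

Under these assumptions, a row's position in the combined ranking partly reflects which subset it was evaluated on. Comparing rates within a single subset is the safer reading.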