SWE-PolyBench

Polyglot software-engineering benchmark with Java, Python, JavaScript, and TypeScript tasks plus retrieval/localization diagnostics.

Rows: 30
Primary metric: resolved
Sampled: 2026-05-27

Metadata

Metrics

Resolve Rate, Resolve Rate - Java, Resolve Rate - Python, Resolve Rate - JavaScript, Resolve Rate - TypeScript, Resolve Rate - Single Function, Resolve Rate - Function Only, Resolve Rate - Single Class, Resolve Rate - Class Only, Resolve Rate - No Local Context, Resolve Rate - Mixed Context, File Precision - All, File Recall - All, File Precision - Python, File Recall - Python, File Precision - Java, File Recall - Java, File Precision - JavaScript, File Recall - JavaScript, File Precision - TypeScript, File Recall - TypeScript, Node Precision - All, Node Recall - All, Node Precision - Python, Node Recall - Python, Node Precision - Java, Node Recall - Java, Node Precision - JavaScript, Node Recall - JavaScript, Node Precision - TypeScript, Node Recall - TypeScript
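The metric names above fall into two families: resolve rates and precision/recall pairs at file and node level. As a rough illustration only, the sketch below assumes the resolve rate is the percentage of tasks resolved and that precision and recall are computed set-wise between the items a patch touches and the items the reference patch touches. This page does not state the benchmark's exact definitions, so the function names and the set-based formulation are assumptions.

```python
from typing import Iterable, Tuple


def resolve_rate(resolved: int, total: int) -> float:
    """Percentage of tasks resolved (assumed definition)."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * resolved / total


def precision_recall(predicted: Iterable[str], reference: Iterable[str]) -> Tuple[float, float]:
    """Set-based precision and recall (assumed definition).

    predicted: files (or nodes) changed by the agent's patch.
    reference: files (or nodes) changed by the reference patch.
    """
    predicted_set, reference_set = set(predicted), set(reference)
    hits = len(predicted_set & reference_set)
    # Guard against empty sets so that an empty patch scores 0 instead of raising.
    precision = hits / len(predicted_set) if predicted_set else 0.0
    recall = hits / len(reference_set) if reference_set else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    p, r = precision_recall(["src/a.py", "src/b.py"], ["src/b.py", "src/c.py", "src/d.py"])
    print(f"precision={p:.2f} recall={r:.2f}")
```

The same helper would apply to node-level metrics by passing function or class identifiers instead of file paths, under the same assumption.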

Latest Results

Rows parsed from SWE-PolyBench public static leaderboard tables for Full, PolyBench500, and Verified splits. Each agent-split pair is retained as a separate row.

| Rank | Subject | Resolve Rate (%) | Model Match | Provenance | Sampled |
|------|---------|------------------|-------------|------------|---------|
| 1 | Atlassian Rovo Dev (Verified) | 48.9529 | | Imported | 2026-05-27 |
| 2 | PrometheusV1.2 + GPT-5 (Verified) | 33.7696 | | Imported | 2026-05-27 |
| 3 | Amazon Q Developer Agent (v20240402) (Verified) | 28.7958 | | Imported | 2026-05-27 |
| 4 | Kodah (gpt-5-mini) (Verified) | 28.2723 | | Imported | 2026-05-27 |
| 5 | Amazon Q Developer Agent (v20250402) (PolyBench500) | 25 | | Imported | 2026-05-27 |
| 6 | Amazon Q Developer Agent (v20250402) (Full) | 22.6066 | | Imported | 2026-05-27 |
| 7 | Aider-PB (Sonnet 3.5) (PolyBench500) | 16.4 | | Imported | 2026-05-27 |
| 8 | Aider-PB (Sonnet 3.5) (Verified) | 16.2304 | | Imported | 2026-05-27 |
| 9 | SWE-agent-PB (Sonnet 3.5) (PolyBench500) | 15.4 | | Imported | 2026-05-27 |
| 10 | SWE-agent-PB (Sonnet 3.5) (Verified) | 14.3979 | | Imported | 2026-05-27 |
| 11 | Aider-PB (Sonnet 3.5) (Full) | 14.0758 | | Imported | 2026-05-27 |
| 12 | Aider-PB (Deepseek R1) (Verified) | 13.8743 | | Imported | 2026-05-27 |
| 13 | Agentless-PB (Sonnet 3.5) (Verified) | 13.3508 | | Imported | 2026-05-27 |
| 14 | Aider-PB (Deepseek R1) (PolyBench500) | 13.2 | | Imported | 2026-05-27 |
| 15 | Aider-PB (Haiku) (Verified) | 13.089 | | Imported | 2026-05-27 |
| 16 | Aider-PB (Deepseek R1) (Full) | 11.5166 | | Imported | 2026-05-27 |
| 17 | Aider-PB (Haiku) (PolyBench500) | 11.2 | | Imported | 2026-05-27 |
| 18 | Agentless-PB (Sonnet 3.5) (PolyBench500) | 10.8 | | Imported | 2026-05-27 |
| 19 | SWE-agent-PB (Sonnet 3.5) (Full) | 10.1896 | | Imported | 2026-05-27 |
| 20 | Aider-PB (Haiku) (Full) | 9.8578 | | Imported | 2026-05-27 |
| 21 | Aider-PB (Llama3.3 70B) (Verified) | 8.6387 | | Imported | 2026-05-27 |
| 22 | Aider-PB (Mistral-Large) (Verified) | 8.377 | | Imported | 2026-05-27 |
| 23 | Agentless-PB (Sonnet 3.5) (Full) | 7.8199 | | Imported | 2026-05-27 |
| 24 | Aider-PB (Deepseek-R1-Distill Llama 70B) (Verified) | 7.5916 | | Imported | 2026-05-27 |
| 25 | Aider-PB (Llama3.3 70B) (PolyBench500) | 7.4 | | Imported | 2026-05-27 |
| 26 | Aider-PB (Mistral-Large) (PolyBench500) | 6.8 | | Imported | 2026-05-27 |
| 27 | Aider-PB (Llama3.3 70B) (Full) | 6.019 | | Imported | 2026-05-27 |
| 28 | Aider-PB (Deepseek-R1-Distill Llama 70B) (PolyBench500) | 6 | | Imported | 2026-05-27 |
| 29 | Aider-PB (Mistral-Large) (Full) | 5.8768 | | Imported | 2026-05-27 |
| 30 | Aider-PB (Deepseek-R1-Distill Llama 70B) (Full) | 5.3081 | | Imported | 2026-05-27 |
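
Because each agent-split pair is kept as its own row, the ranking above mixes three splits that may not be directly comparable. The sketch below shows one possible way to separate the trailing split label from the Subject string and pick the best row per split. The `Row` type, `parse_row`, and `top_per_split` are hypothetical names introduced for illustration. The only thing taken from the page is the convention that the split name appears in the final parentheses.

```python
import re
from typing import Dict, Iterable, NamedTuple


class Row(NamedTuple):
    agent: str
    split: str
    resolve_rate: float


# The split label is assumed to be the last parenthesized token of the Subject.
_SUBJECT = re.compile(r"^(?P<agent>.+) \((?P<split>Full|PolyBench500|Verified)\)$")


def parse_row(subject: str, resolve_rate: str) -> Row:
    """Split a Subject such as 'Aider-PB (Haiku) (Full)' into agent and split."""
    match = _SUBJECT.match(subject)
    if match is None:
        raise ValueError(f"no recognized split label in subject: {subject!r}")
    return Row(match["agent"], match["split"], float(resolve_rate))


def top_per_split(rows: Iterable[Row]) -> Dict[str, Row]:
    """Return the highest-scoring row for each split."""
    best: Dict[str, Row] = {}
    for row in rows:
        current = best.get(row.split)
        if current is None or row.resolve_rate > current.resolve_rate:
            best[row.split] = row
    return best


if __name__ == "__main__":
    # Three rows copied from the table above.
    rows = [
        parse_row("Aider-PB (Sonnet 3.5) (PolyBench500)", "16.4"),
        parse_row("Aider-PB (Sonnet 3.5) (Verified)", "16.2304"),
        parse_row("Aider-PB (Sonnet 3.5) (Full)", "14.0758"),
    ]
    for split, row in top_per_split(rows).items():
        print(split, row.agent, row.resolve_rate)
```

Grouping by split before comparing agents avoids ranking a Verified score against a Full score for the same agent.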