Aider Polyglot
Aider polyglot coding-agent leaderboard over 225 Exercism tasks across C++, Go, Java, JavaScript, Python, and Rust.
69 rows
Primary metric: percent_correct
Sampled: 2026-05-27
Metadata
Metrics
Percent Correct, Pass Rate 1, Pass Rate 2, Seconds Per Case (lower is better), Total Cost (lower is better)
| Rank | Subject | Percent Correct | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | gpt-5 (high) (diff, 2025-08-23) | 88 | — | Imported | 2026-05-27 |
| 2 | gpt-5 (medium) (diff, 2025-08-25) | 86.7 | — | Imported | 2026-05-27 |
| 3 | o3-pro (high) (diff, 2025-06-28) | 84.9 | — | Imported | 2026-05-27 |
| 4 | gemini-2.5-pro-preview-06-05 (32k think) (diff-fenced, 2025-06-06) | 83.1 | — | Imported | 2026-05-27 |
| 5 | o3 (high) (diff, 2025-06-25) | 81.3 | — | Imported | 2026-05-27 |
| 6 | gpt-5 (low) (diff, 2025-08-25) | 81.3 | — | Imported | 2026-05-27 |
| 7 | grok-4 (high) (diff, 2025-07-11) | 79.6 | — | Imported | 2026-05-27 |
| 8 | gemini-2.5-pro-preview-06-05 (default think) (diff-fenced, 2025-06-06) | 79.1 | — | Imported | 2026-05-27 |
| 9 | o3 (high) + gpt-4.1 (architect, 2025-06-27) | 78.2 | — | Imported | 2026-05-27 |
| 10 | Gemini 2.5 Pro Preview 05-06 (diff-fenced, 2025-05-07) | 76.9 | — | Imported | 2026-05-27 |
| 11 | o3 (diff, 2025-06-25) | 76.9 | — | Imported | 2026-05-27 |
| 12 | DeepSeek-V3.2-Exp (Reasoner) (diff, 2025-10-03) | 74.2 | — | Imported | 2026-05-27 |
| 13 | Gemini 2.5 Pro Preview 03-25 (diff-fenced, 2025-04-12) | 72.9 | — | Imported | 2026-05-27 |
| 14 | o4-mini (high) (diff, 2025-04-16) | 72 | — | Imported | 2026-05-27 |
| 15 | claude-opus-4-20250514 (32k thinking) (diff, 2025-05-25) | 72 | — | Imported | 2026-05-27 |
| 16 | DeepSeek R1 (0528) (diff, 2025-06-06) | 71.4 | — | Imported | 2026-05-27 |
| 17 | claude-opus-4-20250514 (no think) (diff, 2025-05-25) | 70.7 | — | Imported | 2026-05-27 |
| 18 | DeepSeek-V3.2-Exp (Chat) (diff, 2025-10-03) | 70.2 | — | Imported | 2026-05-27 |
| 19 | claude-3-7-sonnet-20250219 (32k thinking tokens) (diff, 2025-02-24) | 64.9 | — | Imported | 2026-05-27 |
| 20 | DeepSeek R1 + claude-3-5-sonnet-20241022 (architect, 2025-01-23) | 64 | — | Imported | 2026-05-27 |
| 21 | o1-2024-12-17 (high) (diff, 2024-12-21) | 61.7 | — | Imported | 2026-05-27 |
| 22 | claude-sonnet-4-20250514 (32k thinking) (diff, 2025-05-24) | 61.3 | — | Imported | 2026-05-27 |
| 23 | o3-mini (high) (diff, 2025-01-31) | 60.4 | — | Imported | 2026-05-27 |
| 24 | claude-3-7-sonnet-20250219 (no thinking) (diff, 2025-02-24) | 60.4 | — | Imported | 2026-05-27 |
| 25 | Qwen3 235B A22B diff, no think, Alibaba API (diff, 2025-05-09) | 59.6 | — | Imported | 2026-05-27 |
| 26 | Kimi K2 (diff, 2025-07-17) | 59.1 | — | Imported | 2026-05-27 |
| 27 | DeepSeek R1 (diff, 2025-01-20) | 56.9 | — | Imported | 2026-05-27 |
| 28 | claude-sonnet-4-20250514 (no thinking) (diff, 2025-05-24) | 56.4 | — | Imported | 2026-05-27 |
| 29 | DeepSeek V3 (0324) (diff, 2025-03-24) | 55.1 | — | Imported | 2026-05-27 |
| 30 | gemini-2.5-flash-preview-05-20 (24k think) (diff, 2025-05-25) | 55.1 | — | Imported | 2026-05-27 |
| 31 | Quasar Alpha (diff, 2025-04-04) | 54.7 | — | Imported | 2026-05-27 |
| 32 | o3-mini (medium) (diff, 2025-01-31) | 53.8 | — | Imported | 2026-05-27 |
| 33 | Grok 3 Beta (diff, 2025-04-10) | 53.3 | — | Imported | 2026-05-27 |
| 34 | Optimus Alpha (diff, 2025-04-10) | 52.9 | — | Imported | 2026-05-27 |
| 35 | gpt-4.1 (diff, 2025-04-14) | 52.4 | — | Imported | 2026-05-27 |
| 36 | claude-3-5-sonnet-20241022 (diff, 2025-01-17) | 51.6 | — | Imported | 2026-05-27 |
| 37 | Grok 3 Mini Beta (high) (whole, 2025-04-10) | 49.3 | — | Imported | 2026-05-27 |
| 38 | DeepSeek Chat V3 (prev) (diff, 2024-12-25) | 48.4 | — | Imported | 2026-05-27 |
| 39 | gemini-2.5-flash-preview-04-17 (default) (diff, 2025-04-20) | 47.1 | — | Imported | 2026-05-27 |
| 40 | chatgpt-4o-latest (2025-03-29) (diff, 2025-03-29) | 45.3 | — | Imported | 2026-05-27 |
| 41 | gpt-4.5-preview (diff, 2025-02-27) | 44.9 | — | Imported | 2026-05-27 |
| 42 | gemini-2.5-flash-preview-05-20 (no think) (diff, 2025-05-26) | 44 | — | Imported | 2026-05-27 |
| 43 | gpt-oss-120b (high) (diff, 2025-08-06) | 41.8 | — | Imported | 2026-05-27 |
| 44 | Qwen3 32B (diff, 2025-05-08) | 40 | — | Imported | 2026-05-27 |
| 45 | gemini-exp-1206 (whole, 2024-12-22) | 38.2 | — | Imported | 2026-05-27 |
| 46 | Gemini 2.0 Pro exp-02-05 (whole, 2025-02-25) | 35.6 | — | Imported | 2026-05-27 |
| 47 | Grok 3 Mini Beta (low) (whole, 2025-04-10) | 34.7 | — | Imported | 2026-05-27 |
| 48 | o1-mini-2024-09-12 (whole, 2024-12-22) | 32.9 | — | Imported | 2026-05-27 |
| 49 | gpt-4.1-mini (diff, 2025-04-14) | 32.4 | — | Imported | 2026-05-27 |
| 50 | claude-3-5-haiku-20241022 (diff, 2024-12-21) | 28 | — | Imported | 2026-05-27 |
| 51 | chatgpt-4o-latest (2025-02-15) (diff, 2025-02-15) | 27.1 | — | Imported | 2026-05-27 |
| 52 | QwQ-32B + Qwen 2.5 Coder Instruct (architect, 2025-03-07) | 26.2 | — | Imported | 2026-05-27 |
| 53 | gpt-4o-2024-08-06 (diff, 2024-12-30) | 23.1 | — | Imported | 2026-05-27 |
| 54 | gemini-2.0-flash-exp (whole, 2024-12-22) | 22.2 | — | Imported | 2026-05-27 |
| 55 | qwen-max-2025-01-25 (diff, 2025-01-28) | 21.8 | — | Imported | 2026-05-27 |
| 56 | QwQ-32B (diff, 2025-03-06) | 20.9 | — | Imported | 2026-05-27 |
| 57 | gpt-4o-2024-11-20 (diff, 2024-12-30) | 18.2 | — | Imported | 2026-05-27 |
| 58 | gemini-2.0-flash-thinking-exp-01-21 (diff, 2025-01-21) | 18.2 | — | Imported | 2026-05-27 |
| 59 | DeepSeek Chat V2.5 (diff, 2024-12-21) | 17.8 | — | Imported | 2026-05-27 |
| 60 | Qwen2.5-Coder-32B-Instruct (whole, 2024-12-26) | 16.4 | — | Imported | 2026-05-27 |
| 61 | Llama 4 Maverick (whole, 2025-04-06) | 15.6 | — | Imported | 2026-05-27 |
| 62 | yi-lightning (whole, 2024-12-23) | 12.9 | — | Imported | 2026-05-27 |
| 63 | command-a-03-2025-quality (whole, 2025-03-14) | 12 | — | Imported | 2026-05-27 |
| 64 | Codestral 25.01 (whole, 2025-01-13) | 11.1 | — | Imported | 2026-05-27 |
| 65 | openhands-lm-32b-v0.1 (whole, 2025-04-19) | 10.2 | — | Imported | 2026-05-27 |
| 66 | gpt-4.1-nano (whole, 2025-04-14) | 8.9 | — | Imported | 2026-05-27 |
| 67 | Qwen2.5-Coder-32B-Instruct (diff, 2024-12-22) | 8 | — | Imported | 2026-05-27 |
| 68 | gemma-3-27b-it (whole, 2025-03-15) | 4.9 | — | Imported | 2026-05-27 |
| 69 | gpt-4o-mini-2024-07-18 (whole, 2024-12-21) | 3.6 | — | Imported | 2026-05-27 |
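The Percent Correct values above can be read back as approximate task counts, since the description gives a 225-task suite. The following is a minimal sketch, assuming percent_correct is simply the share of the 225 tasks that ended up solved. The function name `estimated_solved` and the sample rows are illustrative, not part of the leaderboard's own tooling.

```python
# Task count taken from the leaderboard description (225 Exercism tasks).
TOTAL_TASKS = 225


def estimated_solved(percent_correct: float, total: int = TOTAL_TASKS) -> int:
    """Estimate the number of solved tasks behind a Percent Correct score.

    Assumption: percent_correct == solved / total * 100, rounded to one decimal.
    """
    return round(percent_correct / 100 * total)


# A few (subject, percent_correct) pairs copied from the table above.
sample_rows = [
    ("gpt-5 (high)", 88.0),
    ("gpt-5 (medium)", 86.7),
    ("gpt-4o-mini-2024-07-18", 3.6),
]

for subject, pct in sample_rows:
    print(f"{subject}: about {estimated_solved(pct)} of {TOTAL_TASKS} tasks")
```

Under that assumption, one task is worth roughly 0.44 percentage points, so rows that differ by less than that (for example the ties at 81.3 or 76.9) most likely solved the same number of tasks.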