Aider Polyglot

Aider polyglot coding-agent leaderboard over 225 Exercism tasks across C++, Go, Java, JavaScript, Python, and Rust.

69 rows
Primary metric: percent_correct
Sampled: 2026-05-27

Metadata

Metrics

Percent Correct, Pass Rate 1, Pass Rate 2, Seconds Per Case (lower is better), Total Cost (lower is better)
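Because two of these metrics are marked "lower is better", any comparison across runs has to know which direction each metric sorts in. The sketch below shows one way to encode that. The snake_case field names, the `best` helper, and the two sample runs are all hypothetical placeholders, not values from the leaderboard, and it assumes the unmarked metrics are higher-is-better.

```python
# Sort direction per metric. The source marks only seconds-per-case and
# total cost as "lower is better"; treating the rest as higher-is-better
# is an assumption.
HIGHER_IS_BETTER = {
    "percent_correct": True,
    "pass_rate_1": True,
    "pass_rate_2": True,
    "seconds_per_case": False,
    "total_cost": False,
}


def best(runs, metric):
    """Return the run that wins on `metric`, respecting its direction."""
    pick = max if HIGHER_IS_BETTER[metric] else min
    return pick(runs, key=lambda run: run[metric])


# Placeholder runs with made-up numbers, for illustration only.
runs = [
    {"name": "run-a", "percent_correct": 70.0, "seconds_per_case": 40.0, "total_cost": 12.5},
    {"name": "run-b", "percent_correct": 65.0, "seconds_per_case": 25.0, "total_cost": 3.0},
]

print(best(runs, "percent_correct")["name"])
print(best(runs, "total_cost")["name"])
```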

Latest Results

Rows are parsed from Aider's public polyglot leaderboard YAML. Each row is one benchmark run over the 225 Exercism coding exercises; the primary score is pass_rate_2, reported as Percent Correct.
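As a rough illustration of how rows like these could be ranked, the sketch below sorts already-parsed records by `pass_rate_2` and builds a subject label in the "model (edit format, date)" shape used in the table. The record field names (`model`, `edit_format`, `date`) and the `rank_rows` helper are assumptions about the parsed data, not a confirmed schema. The solved-count estimate assumes every run covered all 225 exercises.

```python
TOTAL_EXERCISES = 225  # exercise count stated in the leaderboard description

# Records shaped like parsed YAML rows. Field names are assumed; the scores
# are copied from the top three rows of the table.
rows = [
    {"model": "o3-pro (high)", "edit_format": "diff", "date": "2025-06-28", "pass_rate_2": 84.9},
    {"model": "gpt-5 (high)", "edit_format": "diff", "date": "2025-08-23", "pass_rate_2": 88.0},
    {"model": "gpt-5 (medium)", "edit_format": "diff", "date": "2025-08-25", "pass_rate_2": 86.7},
]


def rank_rows(rows):
    """Sort by pass_rate_2 (descending) and attach rank and display fields."""
    ordered = sorted(rows, key=lambda r: r["pass_rate_2"], reverse=True)
    return [
        {
            "rank": i,
            "subject": f'{r["model"]} ({r["edit_format"]}, {r["date"]})',
            "percent_correct": r["pass_rate_2"],
            # Approximate number of solved exercises implied by the percentage.
            "solved_estimate": round(r["pass_rate_2"] / 100 * TOTAL_EXERCISES),
        }
        for i, r in enumerate(ordered, start=1)
    ]


ranked = rank_rows(rows)
for entry in ranked:
    print(entry["rank"], entry["subject"], entry["percent_correct"], entry["solved_estimate"])
```

Python's `sorted` is stable, so runs with equal scores keep their input order. The leaderboard's own tie-breaking rule is not stated in the source.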

Rank | Subject | Percent Correct | Model Match | Provenance | Sampled
1 | gpt-5 (high) (diff, 2025-08-23) | 88 | | Imported | 2026-05-27
2 | gpt-5 (medium) (diff, 2025-08-25) | 86.7 | | Imported | 2026-05-27
3 | o3-pro (high) (diff, 2025-06-28) | 84.9 | | Imported | 2026-05-27
4 | gemini-2.5-pro-preview-06-05 (32k think) (diff-fenced, 2025-06-06) | 83.1 | | Imported | 2026-05-27
5 | o3 (high) (diff, 2025-06-25) | 81.3 | | Imported | 2026-05-27
6 | gpt-5 (low) (diff, 2025-08-25) | 81.3 | | Imported | 2026-05-27
7 | grok-4 (high) (diff, 2025-07-11) | 79.6 | | Imported | 2026-05-27
8 | gemini-2.5-pro-preview-06-05 (default think) (diff-fenced, 2025-06-06) | 79.1 | | Imported | 2026-05-27
9 | o3 (high) + gpt-4.1 (architect, 2025-06-27) | 78.2 | | Imported | 2026-05-27
10 | Gemini 2.5 Pro Preview 05-06 (diff-fenced, 2025-05-07) | 76.9 | | Imported | 2026-05-27
11 | o3 (diff, 2025-06-25) | 76.9 | | Imported | 2026-05-27
12 | DeepSeek-V3.2-Exp (Reasoner) (diff, 2025-10-03) | 74.2 | | Imported | 2026-05-27
13 | Gemini 2.5 Pro Preview 03-25 (diff-fenced, 2025-04-12) | 72.9 | | Imported | 2026-05-27
14 | o4-mini (high) (diff, 2025-04-16) | 72 | | Imported | 2026-05-27
15 | claude-opus-4-20250514 (32k thinking) (diff, 2025-05-25) | 72 | | Imported | 2026-05-27
16 | DeepSeek R1 (0528) (diff, 2025-06-06) | 71.4 | | Imported | 2026-05-27
17 | claude-opus-4-20250514 (no think) (diff, 2025-05-25) | 70.7 | | Imported | 2026-05-27
18 | DeepSeek-V3.2-Exp (Chat) (diff, 2025-10-03) | 70.2 | | Imported | 2026-05-27
19 | claude-3-7-sonnet-20250219 (32k thinking tokens) (diff, 2025-02-24) | 64.9 | | Imported | 2026-05-27
20 | DeepSeek R1 + claude-3-5-sonnet-20241022 (architect, 2025-01-23) | 64 | | Imported | 2026-05-27
21 | o1-2024-12-17 (high) (diff, 2024-12-21) | 61.7 | | Imported | 2026-05-27
22 | claude-sonnet-4-20250514 (32k thinking) (diff, 2025-05-24) | 61.3 | | Imported | 2026-05-27
23 | o3-mini (high) (diff, 2025-01-31) | 60.4 | | Imported | 2026-05-27
24 | claude-3-7-sonnet-20250219 (no thinking) (diff, 2025-02-24) | 60.4 | | Imported | 2026-05-27
25 | Qwen3 235B A22B diff, no think, Alibaba API (diff, 2025-05-09) | 59.6 | | Imported | 2026-05-27
26 | Kimi K2 (diff, 2025-07-17) | 59.1 | | Imported | 2026-05-27
27 | DeepSeek R1 (diff, 2025-01-20) | 56.9 | | Imported | 2026-05-27
28 | claude-sonnet-4-20250514 (no thinking) (diff, 2025-05-24) | 56.4 | | Imported | 2026-05-27
29 | DeepSeek V3 (0324) (diff, 2025-03-24) | 55.1 | | Imported | 2026-05-27
30 | gemini-2.5-flash-preview-05-20 (24k think) (diff, 2025-05-25) | 55.1 | | Imported | 2026-05-27
31 | Quasar Alpha (diff, 2025-04-04) | 54.7 | | Imported | 2026-05-27
32 | o3-mini (medium) (diff, 2025-01-31) | 53.8 | | Imported | 2026-05-27
33 | Grok 3 Beta (diff, 2025-04-10) | 53.3 | | Imported | 2026-05-27
34 | Optimus Alpha (diff, 2025-04-10) | 52.9 | | Imported | 2026-05-27
35 | gpt-4.1 (diff, 2025-04-14) | 52.4 | | Imported | 2026-05-27
36 | claude-3-5-sonnet-20241022 (diff, 2025-01-17) | 51.6 | | Imported | 2026-05-27
37 | Grok 3 Mini Beta (high) (whole, 2025-04-10) | 49.3 | | Imported | 2026-05-27
38 | DeepSeek Chat V3 (prev) (diff, 2024-12-25) | 48.4 | | Imported | 2026-05-27
39 | gemini-2.5-flash-preview-04-17 (default) (diff, 2025-04-20) | 47.1 | | Imported | 2026-05-27
40 | chatgpt-4o-latest (2025-03-29) (diff, 2025-03-29) | 45.3 | | Imported | 2026-05-27
41 | gpt-4.5-preview (diff, 2025-02-27) | 44.9 | | Imported | 2026-05-27
42 | gemini-2.5-flash-preview-05-20 (no think) (diff, 2025-05-26) | 44 | | Imported | 2026-05-27
43 | gpt-oss-120b (high) (diff, 2025-08-06) | 41.8 | | Imported | 2026-05-27
44 | Qwen3 32B (diff, 2025-05-08) | 40 | | Imported | 2026-05-27
45 | gemini-exp-1206 (whole, 2024-12-22) | 38.2 | | Imported | 2026-05-27
46 | Gemini 2.0 Pro exp-02-05 (whole, 2025-02-25) | 35.6 | | Imported | 2026-05-27
47 | Grok 3 Mini Beta (low) (whole, 2025-04-10) | 34.7 | | Imported | 2026-05-27
48 | o1-mini-2024-09-12 (whole, 2024-12-22) | 32.9 | | Imported | 2026-05-27
49 | gpt-4.1-mini (diff, 2025-04-14) | 32.4 | | Imported | 2026-05-27
50 | claude-3-5-haiku-20241022 (diff, 2024-12-21) | 28 | | Imported | 2026-05-27
51 | chatgpt-4o-latest (2025-02-15) (diff, 2025-02-15) | 27.1 | | Imported | 2026-05-27
52 | QwQ-32B + Qwen 2.5 Coder Instruct (architect, 2025-03-07) | 26.2 | | Imported | 2026-05-27
53 | gpt-4o-2024-08-06 (diff, 2024-12-30) | 23.1 | | Imported | 2026-05-27
54 | gemini-2.0-flash-exp (whole, 2024-12-22) | 22.2 | | Imported | 2026-05-27
55 | qwen-max-2025-01-25 (diff, 2025-01-28) | 21.8 | | Imported | 2026-05-27
56 | QwQ-32B (diff, 2025-03-06) | 20.9 | | Imported | 2026-05-27
57 | gpt-4o-2024-11-20 (diff, 2024-12-30) | 18.2 | | Imported | 2026-05-27
58 | gemini-2.0-flash-thinking-exp-01-21 (diff, 2025-01-21) | 18.2 | | Imported | 2026-05-27
59 | DeepSeek Chat V2.5 (diff, 2024-12-21) | 17.8 | | Imported | 2026-05-27
60 | Qwen2.5-Coder-32B-Instruct (whole, 2024-12-26) | 16.4 | | Imported | 2026-05-27
61 | Llama 4 Maverick (whole, 2025-04-06) | 15.6 | | Imported | 2026-05-27
62 | yi-lightning (whole, 2024-12-23) | 12.9 | | Imported | 2026-05-27
63 | command-a-03-2025-quality (whole, 2025-03-14) | 12 | | Imported | 2026-05-27
64 | Codestral 25.01 (whole, 2025-01-13) | 11.1 | | Imported | 2026-05-27
65 | openhands-lm-32b-v0.1 (whole, 2025-04-19) | 10.2 | | Imported | 2026-05-27
66 | gpt-4.1-nano (whole, 2025-04-14) | 8.9 | | Imported | 2026-05-27
67 | Qwen2.5-Coder-32B-Instruct (diff, 2024-12-22) | 8 | | Imported | 2026-05-27
68 | gemma-3-27b-it (whole, 2025-03-15) | 4.9 | | Imported | 2026-05-27
69 | gpt-4o-mini-2024-07-18 (whole, 2024-12-21) | 3.6 | | Imported | 2026-05-27