CodeEditorBench

Code editing benchmark covering debugging, code translation, requirement switching, and code polishing across primary and plus splits.

Rows: 86
Primary metric: win_rate
Sampled: 2026-05-27

Metadata

Metrics

Win Rate, Debug Pass Rate, Translation Pass Rate, Requirement Switch Pass Rate, Polishment Score

Latest Results

Rows are parsed from the public static CodeEditorBench leaderboard tables for the primary and plus splits. The primary score is the win rate reported by the source.

Rank | Subject | Win Rate | Model Match | Provenance | Sampled
1 | gpt-4-0613 (plus, Three-shot) | 0.882 | GPT-4 (openai-gpt-4) | Imported | 2026-05-27
2 | gpt-4-0613 (plus, Zero-shot) | 0.868 | GPT-4 (openai-gpt-4) | Imported | 2026-05-27
3 | gpt-4-0613 (primary, Zero-shot) | 0.855 | GPT-4 (openai-gpt-4) | Imported | 2026-05-27
4 | gemini-ultra (primary, Three-shot) | 0.855 | - | Imported | 2026-05-27
5 | gpt-4-0613 (primary, CoT) | 0.85 | GPT-4 (openai-gpt-4) | Imported | 2026-05-27
6 | gpt-4-0613 (primary, Three-shot) | 0.816 | GPT-4 (openai-gpt-4) | Imported | 2026-05-27
7 | OpenCodeInterpreter-DS-33B (plus, Zero-shot) | 0.816 | - | Imported | 2026-05-27
8 | gpt-3.5-turbo-1106 (plus, Three-shot) | 0.803 | GPT-3.5 Turbo (openai-gpt-3.5-turbo) | Imported | 2026-05-27
9 | OpenCodeInterpreter-DS-33B (plus, Three-shot) | 0.803 | - | Imported | 2026-05-27
10 | gpt-4-0613 (plus, CoT) | 0.8 | GPT-4 (openai-gpt-4) | Imported | 2026-05-27
11 | OpenCodeInterpreter-DS-33B (primary, Zero-shot) | 0.776 | - | Imported | 2026-05-27
12 | gpt-3.5-turbo-1106 (plus, Zero-shot) | 0.776 | GPT-3.5 Turbo (openai-gpt-3.5-turbo) | Imported | 2026-05-27
13 | OpenCodeInterpreter-DS-6.7B (primary, Three-shot) | 0.77 | - | Imported | 2026-05-27
14 | OpenCodeInterpreter-DS-33B (primary, Three-shot) | 0.763 | - | Imported | 2026-05-27
15 | deepseek-coder-33B-instruct (plus, Three-shot) | 0.763 | - | Imported | 2026-05-27
16 | deepseek-coder-33B-instruct (plus, Zero-shot) | 0.757 | - | Imported | 2026-05-27
17 | gemini-ultra (primary, Zero-shot) | 0.75 | - | Imported | 2026-05-27
18 | glm-4 (primary, CoT) | 0.75 | - | Imported | 2026-05-27
19 | OpenCodeInterpreter-DS-6.7B (plus, Three-shot) | 0.75 | - | Imported | 2026-05-27
20 | deepseek-coder-33B-instruct (primary, Zero-shot) | 0.737 | - | Imported | 2026-05-27
21 | gemini-pro (primary, Zero-shot) | 0.737 | - | Imported | 2026-05-27
22 | deepseek-coder-33B-instruct (primary, Three-shot) | 0.737 | - | Imported | 2026-05-27
23 | gpt-3.5-turbo-1106 (primary, Zero-shot) | 0.724 | GPT-3.5 Turbo (openai-gpt-3.5-turbo) | Imported | 2026-05-27
24 | gemini-pro (plus, Zero-shot) | 0.711 | - | Imported | 2026-05-27
25 | WizardCoder-33B-V1.1 (plus, Three-shot) | 0.711 | - | Imported | 2026-05-27
26 | WizardCoder-33B-V1.1 (plus, Zero-shot) | 0.704 | - | Imported | 2026-05-27
27 | gpt-3.5-turbo-1106 (plus, CoT) | 0.7 | GPT-3.5 Turbo (openai-gpt-3.5-turbo) | Imported | 2026-05-27
28 | Magicoder-S-DS-6.7B (plus, Zero-shot) | 0.697 | - | Imported | 2026-05-27
29 | OpenCodeInterpreter-DS-6.7B (plus, Zero-shot) | 0.697 | - | Imported | 2026-05-27
30 | gpt-3.5-turbo-1106 (primary, Three-shot) | 0.684 | GPT-3.5 Turbo (openai-gpt-3.5-turbo) | Imported | 2026-05-27
31 | OpenCodeInterpreter-DS-6.7B (primary, Zero-shot) | 0.671 | - | Imported | 2026-05-27
32 | gemini-pro (primary, Three-shot) | 0.671 | - | Imported | 2026-05-27
33 | WizardCoder-33B-V1.1 (primary, Three-shot) | 0.645 | - | Imported | 2026-05-27
34 | gemini-pro (plus, Three-shot) | 0.645 | - | Imported | 2026-05-27
35 | WizardCoder-33B-V1.1 (primary, Zero-shot) | 0.632 | - | Imported | 2026-05-27
36 | Magicoder-S-DS-6.7B (plus, Three-shot) | 0.632 | - | Imported | 2026-05-27
37 | gemini-ultra (plus, Three-shot) | 0.632 | - | Imported | 2026-05-27
38 | Magicoder-S-DS-6.7B (primary, Three-shot) | 0.605 | - | Imported | 2026-05-27
39 | glm-4 (plus, CoT) | 0.6 | - | Imported | 2026-05-27
40 | glm-4 (plus, Zero-shot) | 0.592 | - | Imported | 2026-05-27
41 | glm-4 (plus, Three-shot) | 0.592 | - | Imported | 2026-05-27
42 | gemini-ultra (plus, Zero-shot) | 0.579 | - | Imported | 2026-05-27
43 | glm-4 (primary, Three-shot) | 0.572 | - | Imported | 2026-05-27
44 | Phind-CodeLlama-34B-v2 (plus, Zero-shot) | 0.539 | - | Imported | 2026-05-27
45 | glm-4 (primary, Zero-shot) | 0.526 | - | Imported | 2026-05-27
46 | Magicoder-S-DS-6.7B (primary, Zero-shot) | 0.513 | - | Imported | 2026-05-27
47 | Phind-CodeLlama-34B-v2 (primary, Zero-shot) | 0.5 | - | Imported | 2026-05-27
48 | gpt-3.5-turbo-1106 (primary, CoT) | 0.5 | GPT-3.5 Turbo (openai-gpt-3.5-turbo) | Imported | 2026-05-27
49 | gemini-ultra (primary, CoT) | 0.5 | - | Imported | 2026-05-27
50 | gemini-ultra (plus, CoT) | 0.5 | - | Imported | 2026-05-27
51 | CodeLlama-34B-hf (primary, Three-shot) | 0.474 | - | Imported | 2026-05-27
52 | Phind-CodeLlama-34B-v2 (plus, Three-shot) | 0.461 | - | Imported | 2026-05-27
53 | CodeLlama-34B-hf (plus, Three-shot) | 0.447 | - | Imported | 2026-05-27
54 | octocoder (primary, Zero-shot) | 0.434 | - | Imported | 2026-05-27
55 | CodeLlama-13B-Instruct-hf (primary, Zero-shot) | 0.421 | - | Imported | 2026-05-27
56 | Phind-CodeLlama-34B-v2 (primary, Three-shot) | 0.421 | - | Imported | 2026-05-27
57 | CodeLlama-13B-Instruct-hf (primary, Three-shot) | 0.414 | - | Imported | 2026-05-27
58 | WizardCoder-15B-V1.0 (plus, Zero-shot) | 0.408 | - | Imported | 2026-05-27
59 | gemini-pro (primary, CoT) | 0.4 | - | Imported | 2026-05-27
60 | gemini-pro (plus, CoT) | 0.4 | - | Imported | 2026-05-27
61 | CodeLlama-34B-hf (primary, Zero-shot) | 0.382 | - | Imported | 2026-05-27
62 | Magicoder-S-CL-7B (plus, Three-shot) | 0.382 | - | Imported | 2026-05-27
63 | CodeLlama-13B-Instruct-hf (plus, Zero-shot) | 0.368 | - | Imported | 2026-05-27
64 | Magicoder-S-CL-7B (plus, Zero-shot) | 0.342 | - | Imported | 2026-05-27
65 | Magicoder-S-CL-7B (primary, Zero-shot) | 0.329 | - | Imported | 2026-05-27
66 | WizardCoder-15B-V1.0 (primary, Zero-shot) | 0.329 | - | Imported | 2026-05-27
67 | Magicoder-S-CL-7B (primary, Three-shot) | 0.329 | - | Imported | 2026-05-27
68 | CodeLlama-34B-hf (plus, Zero-shot) | 0.329 | - | Imported | 2026-05-27
69 | WizardCoder-15B-V1.0 (plus, Three-shot) | 0.329 | - | Imported | 2026-05-27
70 | WizardCoder-15B-V1.0 (primary, Three-shot) | 0.322 | - | Imported | 2026-05-27
71 | CodeLlama-13B-Instruct-hf (plus, Three-shot) | 0.322 | - | Imported | 2026-05-27
72 | CodeLlama-7B-Instruct-hf (primary, Zero-shot) | 0.289 | - | Imported | 2026-05-27
73 | CodeFuse-CodeLlama-34B (primary, Three-shot) | 0.289 | - | Imported | 2026-05-27
74 | octocoder (plus, Zero-shot) | 0.289 | - | Imported | 2026-05-27
75 | CodeLlama-7B-Instruct-hf (plus, Zero-shot) | 0.25 | - | Imported | 2026-05-27
76 | CodeFuse-CodeLlama-34B (plus, Three-shot) | 0.25 | - | Imported | 2026-05-27
77 | CodeLlama-34B-Instruct-hf (primary, Zero-shot) | 0.211 | - | Imported | 2026-05-27
78 | octocoder (primary, Three-shot) | 0.211 | - | Imported | 2026-05-27
79 | CodeLlama-7B-Instruct-hf (primary, Three-shot) | 0.211 | - | Imported | 2026-05-27
80 | CodeLlama-34B-Instruct-hf (primary, Three-shot) | 0.211 | - | Imported | 2026-05-27
81 | CodeLlama-34B-Instruct-hf (plus, Three-shot) | 0.211 | - | Imported | 2026-05-27
82 | CodeLlama-7B-Instruct-hf (plus, Three-shot) | 0.204 | - | Imported | 2026-05-27
83 | CodeFuse-CodeLlama-34B (primary, Zero-shot) | 0.184 | - | Imported | 2026-05-27
84 | octocoder (plus, Three-shot) | 0.184 | - | Imported | 2026-05-27
85 | CodeLlama-34B-Instruct-hf (plus, Zero-shot) | 0.171 | - | Imported | 2026-05-27
86 | CodeFuse-CodeLlama-34B (plus, Zero-shot) | 0.105 | - | Imported | 2026-05-27
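
Each Subject label in the table above packs three fields into one string: the model name, the split (primary or plus), and the prompting setup. The following is a minimal Python sketch of how such labels could be split apart and reduced to each model's best-scoring configuration. The names `Entry`, `parse_subject`, `best_per_model`, and `SAMPLE` are hypothetical and are not part of CodeEditorBench or of the import pipeline; the sample pairs are copied from the table.

```python
import re
from typing import Dict, List, NamedTuple


class Entry(NamedTuple):
    model: str
    split: str       # "primary" or "plus"
    prompting: str   # e.g. "Zero-shot", "Three-shot", "CoT"
    win_rate: float


# Assumed label shape: "<model> (<split>, <prompting>)".
SUBJECT_RE = re.compile(
    r"^(?P<model>.+?)\s*\((?P<split>primary|plus),\s*(?P<prompting>[^)]+)\)$"
)


def parse_subject(subject: str, win_rate: float) -> Entry:
    """Split a Subject label into model, split, and prompting setup."""
    m = SUBJECT_RE.match(subject.strip())
    if m is None:
        raise ValueError(f"unrecognized subject label: {subject!r}")
    return Entry(m["model"], m["split"], m["prompting"].strip(), win_rate)


def best_per_model(entries: List[Entry]) -> Dict[str, Entry]:
    """Keep the highest-win-rate entry for each model (first one wins ties)."""
    best: Dict[str, Entry] = {}
    for e in entries:
        if e.model not in best or e.win_rate > best[e.model].win_rate:
            best[e.model] = e
    return best


# A few (subject, win rate) pairs copied from the table.
SAMPLE = [
    ("gpt-4-0613 (plus, Three-shot)", 0.882),
    ("gpt-4-0613 (primary, Zero-shot)", 0.855),
    ("gemini-ultra (primary, Three-shot)", 0.855),
    ("gemini-ultra (plus, Zero-shot)", 0.579),
    ("glm-4 (primary, CoT)", 0.75),
]

if __name__ == "__main__":
    entries = [parse_subject(s, w) for s, w in SAMPLE]
    for model, e in best_per_model(entries).items():
        print(f"{model}: {e.win_rate} ({e.split}, {e.prompting})")
```

Because every row is a (model, split, prompting) combination rather than a single model, a per-model reduction like this is one way to compare models without the same model occupying several ranks.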