CodeEditorBench
Code editing benchmark covering debugging, code translation, requirement switching, and code polishing across primary and plus splits.
86 rows
win_rate (primary metric)
2026-05-27 (sampled)
Metadata
Metrics
Win Rate, Debug Pass Rate, Translation Pass Rate, Requirement Switch Pass Rate, Polishment Score
| Rank | Subject | Win Rate | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | gpt-4-0613 (plus, Three-shot) | 0.882 | GPT-4 openai-gpt-4 | Imported | 2026-05-27 |
| 2 | gpt-4-0613 (plus, Zero-shot) | 0.868 | GPT-4 openai-gpt-4 | Imported | 2026-05-27 |
| 3 | gpt-4-0613 (primary, Zero-shot) | 0.855 | GPT-4 openai-gpt-4 | Imported | 2026-05-27 |
| 4 | gemini-ultra (primary, Three-shot) | 0.855 | — | Imported | 2026-05-27 |
| 5 | gpt-4-0613 (primary, CoT) | 0.85 | GPT-4 openai-gpt-4 | Imported | 2026-05-27 |
| 6 | gpt-4-0613 (primary, Three-shot) | 0.816 | GPT-4 openai-gpt-4 | Imported | 2026-05-27 |
| 7 | OpenCodeInterpreter-DS-33B (plus, Zero-shot) | 0.816 | — | Imported | 2026-05-27 |
| 8 | gpt-3.5-turbo-1106 (plus, Three-shot) | 0.803 | GPT-3.5 Turbo openai-gpt-3.5-turbo | Imported | 2026-05-27 |
| 9 | OpenCodeInterpreter-DS-33B (plus, Three-shot) | 0.803 | — | Imported | 2026-05-27 |
| 10 | gpt-4-0613 (plus, CoT) | 0.8 | GPT-4 openai-gpt-4 | Imported | 2026-05-27 |
| 11 | OpenCodeInterpreter-DS-33B (primary, Zero-shot) | 0.776 | — | Imported | 2026-05-27 |
| 12 | gpt-3.5-turbo-1106 (plus, Zero-shot) | 0.776 | GPT-3.5 Turbo openai-gpt-3.5-turbo | Imported | 2026-05-27 |
| 13 | OpenCodeInterpreter-DS-6.7B (primary, Three-shot) | 0.77 | — | Imported | 2026-05-27 |
| 14 | OpenCodeInterpreter-DS-33B (primary, Three-shot) | 0.763 | — | Imported | 2026-05-27 |
| 15 | deepseek-coder-33B-instruct (plus, Three-shot) | 0.763 | — | Imported | 2026-05-27 |
| 16 | deepseek-coder-33B-instruct (plus, Zero-shot) | 0.757 | — | Imported | 2026-05-27 |
| 17 | gemini-ultra (primary, Zero-shot) | 0.75 | — | Imported | 2026-05-27 |
| 18 | glm-4 (primary, CoT) | 0.75 | — | Imported | 2026-05-27 |
| 19 | OpenCodeInterpreter-DS-6.7B (plus, Three-shot) | 0.75 | — | Imported | 2026-05-27 |
| 20 | deepseek-coder-33B-instruct (primary, Zero-shot) | 0.737 | — | Imported | 2026-05-27 |
| 21 | gemini-pro (primary, Zero-shot) | 0.737 | — | Imported | 2026-05-27 |
| 22 | deepseek-coder-33B-instruct (primary, Three-shot) | 0.737 | — | Imported | 2026-05-27 |
| 23 | gpt-3.5-turbo-1106 (primary, Zero-shot) | 0.724 | GPT-3.5 Turbo openai-gpt-3.5-turbo | Imported | 2026-05-27 |
| 24 | gemini-pro (plus, Zero-shot) | 0.711 | — | Imported | 2026-05-27 |
| 25 | WizardCoder-33B-V1.1 (plus, Three-shot) | 0.711 | — | Imported | 2026-05-27 |
| 26 | WizardCoder-33B-V1.1 (plus, Zero-shot) | 0.704 | — | Imported | 2026-05-27 |
| 27 | gpt-3.5-turbo-1106 (plus, CoT) | 0.7 | GPT-3.5 Turbo openai-gpt-3.5-turbo | Imported | 2026-05-27 |
| 28 | Magicoder-S-DS-6.7B (plus, Zero-shot) | 0.697 | — | Imported | 2026-05-27 |
| 29 | OpenCodeInterpreter-DS-6.7B (plus, Zero-shot) | 0.697 | — | Imported | 2026-05-27 |
| 30 | gpt-3.5-turbo-1106 (primary, Three-shot) | 0.684 | GPT-3.5 Turbo openai-gpt-3.5-turbo | Imported | 2026-05-27 |
| 31 | OpenCodeInterpreter-DS-6.7B (primary, Zero-shot) | 0.671 | — | Imported | 2026-05-27 |
| 32 | gemini-pro (primary, Three-shot) | 0.671 | — | Imported | 2026-05-27 |
| 33 | WizardCoder-33B-V1.1 (primary, Three-shot) | 0.645 | — | Imported | 2026-05-27 |
| 34 | gemini-pro (plus, Three-shot) | 0.645 | — | Imported | 2026-05-27 |
| 35 | WizardCoder-33B-V1.1 (primary, Zero-shot) | 0.632 | — | Imported | 2026-05-27 |
| 36 | Magicoder-S-DS-6.7B (plus, Three-shot) | 0.632 | — | Imported | 2026-05-27 |
| 37 | gemini-ultra (plus, Three-shot) | 0.632 | — | Imported | 2026-05-27 |
| 38 | Magicoder-S-DS-6.7B (primary, Three-shot) | 0.605 | — | Imported | 2026-05-27 |
| 39 | glm-4 (plus, CoT) | 0.6 | — | Imported | 2026-05-27 |
| 40 | glm-4 (plus, Zero-shot) | 0.592 | — | Imported | 2026-05-27 |
| 41 | glm-4 (plus, Three-shot) | 0.592 | — | Imported | 2026-05-27 |
| 42 | gemini-ultra (plus, Zero-shot) | 0.579 | — | Imported | 2026-05-27 |
| 43 | glm-4 (primary, Three-shot) | 0.572 | — | Imported | 2026-05-27 |
| 44 | Phind-CodeLlama-34B-v2 (plus, Zero-shot) | 0.539 | — | Imported | 2026-05-27 |
| 45 | glm-4 (primary, Zero-shot) | 0.526 | — | Imported | 2026-05-27 |
| 46 | Magicoder-S-DS-6.7B (primary, Zero-shot) | 0.513 | — | Imported | 2026-05-27 |
| 47 | Phind-CodeLlama-34B-v2 (primary, Zero-shot) | 0.5 | — | Imported | 2026-05-27 |
| 48 | gpt-3.5-turbo-1106 (primary, CoT) | 0.5 | GPT-3.5 Turbo openai-gpt-3.5-turbo | Imported | 2026-05-27 |
| 49 | gemini-ultra (primary, CoT) | 0.5 | — | Imported | 2026-05-27 |
| 50 | gemini-ultra (plus, CoT) | 0.5 | — | Imported | 2026-05-27 |
| 51 | CodeLlama-34B-hf (primary, Three-shot) | 0.474 | — | Imported | 2026-05-27 |
| 52 | Phind-CodeLlama-34B-v2 (plus, Three-shot) | 0.461 | — | Imported | 2026-05-27 |
| 53 | CodeLlama-34B-hf (plus, Three-shot) | 0.447 | — | Imported | 2026-05-27 |
| 54 | octocoder (primary, Zero-shot) | 0.434 | — | Imported | 2026-05-27 |
| 55 | CodeLlama-13B-Instruct-hf (primary, Zero-shot) | 0.421 | — | Imported | 2026-05-27 |
| 56 | Phind-CodeLlama-34B-v2 (primary, Three-shot) | 0.421 | — | Imported | 2026-05-27 |
| 57 | CodeLlama-13B-Instruct-hf (primary, Three-shot) | 0.414 | — | Imported | 2026-05-27 |
| 58 | WizardCoder-15B-V1.0 (plus, Zero-shot) | 0.408 | — | Imported | 2026-05-27 |
| 59 | gemini-pro (primary, CoT) | 0.4 | — | Imported | 2026-05-27 |
| 60 | gemini-pro (plus, CoT) | 0.4 | — | Imported | 2026-05-27 |
| 61 | CodeLlama-34B-hf (primary, Zero-shot) | 0.382 | — | Imported | 2026-05-27 |
| 62 | Magicoder-S-CL-7B (plus, Three-shot) | 0.382 | — | Imported | 2026-05-27 |
| 63 | CodeLlama-13B-Instruct-hf (plus, Zero-shot) | 0.368 | — | Imported | 2026-05-27 |
| 64 | Magicoder-S-CL-7B (plus, Zero-shot) | 0.342 | — | Imported | 2026-05-27 |
| 65 | Magicoder-S-CL-7B (primary, Zero-shot) | 0.329 | — | Imported | 2026-05-27 |
| 66 | WizardCoder-15B-V1.0 (primary, Zero-shot) | 0.329 | — | Imported | 2026-05-27 |
| 67 | Magicoder-S-CL-7B (primary, Three-shot) | 0.329 | — | Imported | 2026-05-27 |
| 68 | CodeLlama-34B-hf (plus, Zero-shot) | 0.329 | — | Imported | 2026-05-27 |
| 69 | WizardCoder-15B-V1.0 (plus, Three-shot) | 0.329 | — | Imported | 2026-05-27 |
| 70 | WizardCoder-15B-V1.0 (primary, Three-shot) | 0.322 | — | Imported | 2026-05-27 |
| 71 | CodeLlama-13B-Instruct-hf (plus, Three-shot) | 0.322 | — | Imported | 2026-05-27 |
| 72 | CodeLlama-7B-Instruct-hf (primary, Zero-shot) | 0.289 | — | Imported | 2026-05-27 |
| 73 | CodeFuse-CodeLlama-34B (primary, Three-shot) | 0.289 | — | Imported | 2026-05-27 |
| 74 | octocoder (plus, Zero-shot) | 0.289 | — | Imported | 2026-05-27 |
| 75 | CodeLlama-7B-Instruct-hf (plus, Zero-shot) | 0.25 | — | Imported | 2026-05-27 |
| 76 | CodeFuse-CodeLlama-34B (plus, Three-shot) | 0.25 | — | Imported | 2026-05-27 |
| 77 | CodeLlama-34B-Instruct-hf (primary, Zero-shot) | 0.211 | — | Imported | 2026-05-27 |
| 78 | octocoder (primary, Three-shot) | 0.211 | — | Imported | 2026-05-27 |
| 79 | CodeLlama-7B-Instruct-hf (primary, Three-shot) | 0.211 | — | Imported | 2026-05-27 |
| 80 | CodeLlama-34B-Instruct-hf (primary, Three-shot) | 0.211 | — | Imported | 2026-05-27 |
| 81 | CodeLlama-34B-Instruct-hf (plus, Three-shot) | 0.211 | — | Imported | 2026-05-27 |
| 82 | CodeLlama-7B-Instruct-hf (plus, Three-shot) | 0.204 | — | Imported | 2026-05-27 |
| 83 | CodeFuse-CodeLlama-34B (primary, Zero-shot) | 0.184 | — | Imported | 2026-05-27 |
| 84 | octocoder (plus, Three-shot) | 0.184 | — | Imported | 2026-05-27 |
| 85 | CodeLlama-34B-Instruct-hf (plus, Zero-shot) | 0.171 | — | Imported | 2026-05-27 |
| 86 | CodeFuse-CodeLlama-34B (plus, Zero-shot) | 0.105 | — | Imported | 2026-05-27 |
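
Each Subject label in the table packs three fields into one string: the model name, the split (primary or plus), and the prompting setup (Zero-shot, Three-shot, or CoT). The following is a minimal sketch of how those labels could be split apart and used to find the best prompting setup per model and split. The names `parse_subject`, `SUBJECT_RE`, and `best_by_model_split` are hypothetical and not part of any CodeEditorBench tooling, and the label pattern is assumed from the rows shown here.

```python
import re

# Assumed label shape, inferred from the table: "<model> (<split>, <prompt>)".
SUBJECT_RE = re.compile(
    r"^(?P<model>.+?) \((?P<split>primary|plus), (?P<prompt>[^)]+)\)$"
)


def parse_subject(subject):
    """Split a Subject label into (model, split, prompt)."""
    match = SUBJECT_RE.match(subject.strip())
    if match is None:
        raise ValueError(f"unrecognized subject label: {subject!r}")
    return match.group("model"), match.group("split"), match.group("prompt")


# Three (subject, win rate) pairs copied from the first rows of the table.
rows = [
    ("gpt-4-0613 (plus, Three-shot)", 0.882),
    ("gpt-4-0613 (plus, Zero-shot)", 0.868),
    ("gpt-4-0613 (primary, Zero-shot)", 0.855),
]

# Keep the highest win rate seen for each (model, split) pair,
# together with the prompting setup that produced it.
best_by_model_split = {}
for subject, win_rate in rows:
    model, split, prompt = parse_subject(subject)
    key = (model, split)
    if key not in best_by_model_split or win_rate > best_by_model_split[key][1]:
        best_by_model_split[key] = (prompt, win_rate)

for (model, split), (prompt, win_rate) in sorted(best_by_model_split.items()):
    print(f"{model} [{split}]: {prompt} at {win_rate}")
```

Grouping by model and split before comparing keeps primary and plus results separate, since the table ranks the two splits together in a single list.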