McEval

McEval measures model capability on programming, code generation, code repair, or repository-level software tasks.

Rows: 22
Primary metric: AVG
Sampled: 2026-05-27

Metadata

Metrics

AVG, AWK, C, C++, C#, Clisp, Coffee, Dart, Elisp, Elixir, Erlang, Fortran, F#, Go, Groovy, Haskell, Html, Java, JS, Json, Julia, Kotlin, Lua, MD, Pascal, Perl, PHP, Power, Python, R, Racket, Ruby, Rust, Scala, Scheme, Shell, Swift, Tcl, TS, VB, VimL

Latest Results

Rows are parsed from the McEval public leaderboard static JavaScript bundle; this snapshot uses the default completion tab.

Rank | Subject | AVG | Model Match | Provenance | Sampled
1 | GPT-4o-240513 | 65.2% | GPT-4o (openai-gpt-4o) | Imported | 2026-05-27
2 | GPT-4-Turbo-231106 | 63.4% | GPT-4 Turbo (openai-gpt-4-turbo) | Imported | 2026-05-27
3 | DeepSeek-Coder-33b-instruct | 54.3% | - | Imported | 2026-05-27
4 | GPT-3.5-Turbo-240125 | 52.6% | GPT-3.5 Turbo (openai-gpt-3.5-turbo) | Imported | 2026-05-27
5 | Codestral-22B-v0.1 | 50.5% | - | Imported | 2026-05-27
6 | Magicoder-S-DS-6.7B | 48.6% | - | Imported | 2026-05-27
7 | Yi-Large-Turbo | 46.6% | - | Imported | 2026-05-27
8 | DeepSeek-Coder-1.5-7b-instruct | 46% | - | Imported | 2026-05-27
9 | OpenCodeInterpreter-DS-6.7B | 46% | - | Imported | 2026-05-27
10 | CodeQwen-1.5-7b | 45.5% | - | Imported | 2026-05-27
11 | Nxcode-CQ-7B-orpo | 44.7% | - | Imported | 2026-05-27
12 | WizardCoder-Python-34B | 36.5% | - | Imported | 2026-05-27
13 | Llama-3-8B-Instruct | 36% | Llama 3 8B Instruct (meta-llama-llama-3-8b-instruct) | Imported | 2026-05-27
14 | Qwen1.5-72B-Chat | 35.8% | - | Imported | 2026-05-27
15 | Phi-3-medium-4k-instruct | 35.2% | - | Imported | 2026-05-27
16 | Codegemma-7b-it | 30.7% | - | Imported | 2026-05-27
17 | CodeLlama-34b-Instruct | 29.1% | - | Imported | 2026-05-27
18 | WizardCoder-15B-V1.0 | 28% | - | Imported | 2026-05-27
19 | CodeLlama-13b-Instruct | 27.7% | - | Imported | 2026-05-27
20 | CodeLlama-7b-Instruct | 24.6% | - | Imported | 2026-05-27
21 | OCTOCODER | 23.3% | - | Imported | 2026-05-27
22 | Codeshell-7b-chat | 23% | - | Imported | 2026-05-27
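
The Rank column follows AVG in descending order. As a minimal sketch, the snippet below recomputes ranks for a handful of rows copied from the results above. The names `rows` and `rank_by_avg` are illustrative and not part of McEval or its leaderboard code. Tie handling is an assumption: the two subjects listed at 46% keep their input order, which matches the order shown here but may not reflect how the leaderboard actually breaks ties.

```python
from typing import List, Tuple

# (subject, AVG in percent) for a subset of rows copied from the results above.
rows: List[Tuple[str, float]] = [
    ("DeepSeek-Coder-1.5-7b-instruct", 46.0),
    ("GPT-4o-240513", 65.2),
    ("Codeshell-7b-chat", 23.0),
    ("OpenCodeInterpreter-DS-6.7B", 46.0),
    ("GPT-4-Turbo-231106", 63.4),
    ("DeepSeek-Coder-33b-instruct", 54.3),
]


def rank_by_avg(rows: List[Tuple[str, float]]) -> List[Tuple[int, str, float]]:
    """Return (rank, subject, avg) sorted by AVG, highest first.

    Python's sort is stable, so subjects with equal AVG keep their
    input order (an assumed tie-break, not a documented leaderboard rule).
    """
    ordered = sorted(rows, key=lambda r: r[1], reverse=True)
    return [(i, subject, avg) for i, (subject, avg) in enumerate(ordered, start=1)]


if __name__ == "__main__":
    for rank, subject, avg in rank_by_avg(rows):
        print(f"{rank:>2}  {subject:<32} {avg:.1f}%")
```

Because only a subset of rows is used, the ranks printed here run from 1 to 6 rather than matching the full 22-row numbering.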