MultiPL-E

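MultiPL-E measures model capability on code generation across multiple programming languages. It belongs to the benchmark category covering programming, code generation, code repair, and repository-level software tasks.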

13 rows
Primary metric: score
Sampled: 2026-05-27

Metadata

Metrics

Score, Normalized Score

Latest Results

Rows are imported from the public ZeroEval/LLM-Stats MultiPL-E benchmark details JSON endpoint. Source verification and self-report metadata are preserved.
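The import step can be sketched as follows. This is only an illustration: the payload shape, the field names (`rows`, `subject`, `score`, `model_match`, `self_reported`) and the "Verified" label are assumptions made for the example, not the actual schema of the ZeroEval/LLM-Stats endpoint. The sample is parsed from an inline string, so no network access is needed.

```python
import json
from dataclasses import dataclass
from typing import Optional

# Hypothetical payload: every field name here is an assumption,
# not the real endpoint schema.
SAMPLE = """
{"benchmark": "MultiPL-E",
 "rows": [
  {"subject": "Qwen2.5 72B Instruct", "score": 0.751,
   "model_match": "qwen-qwen-2.5-72b-instruct", "self_reported": true},
  {"subject": "Qwen2.5 32B Instruct", "score": 0.754,
   "model_match": null, "self_reported": true}
 ]}
"""


@dataclass
class Row:
    subject: str
    score: float
    model_match: Optional[str]  # None when no catalogue entry was matched
    provenance: str
    sampled: str


def import_rows(payload: str, sampled: str) -> list[Row]:
    """Parse a JSON payload into rows, keeping the self-report flag as provenance."""
    data = json.loads(payload)
    rows = [
        Row(
            subject=r["subject"],
            score=float(r["score"]),
            model_match=r.get("model_match"),
            # "Verified" is a placeholder label for non-self-reported rows.
            provenance="Self-reported" if r.get("self_reported") else "Verified",
            sampled=sampled,
        )
        for r in data["rows"]
    ]
    # Highest score first, matching the order of the results table.
    return sorted(rows, key=lambda r: r.score, reverse=True)


rows = import_rows(SAMPLE, sampled="2026-05-27")
for row in rows:
    print(row.subject, row.score, row.model_match, row.provenance, row.sampled)
```

Keeping `model_match` optional mirrors the results, where some subjects have no matched model entry.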

Rank | Subject | Score | Model Match | Provenance | Sampled
1 | Qwen3-235B-A22B-Instruct-2507 | 0.879 | Qwen3 235B A22B Instruct 2507 (qwen-qwen3-235b-a22b-2507) | Self-reported | 2026-05-27
2 | Qwen3-Next-80B-A3B-Instruct | 0.878 | Qwen3 Next 80B A3B Instruct (qwen-qwen3-next-80b-a3b-instruct) | Self-reported | 2026-05-27
3 | Qwen3 VL 235B A22B Instruct | 0.861 | Qwen3 VL 235B A22B Instruct (qwen-qwen3-vl-235b-a22b-instruct) | Self-reported | 2026-05-27
4 | Kimi K2 Instruct | 0.857 | KIMI MoonshotAI: Kimi K2 0711 (moonshotai-kimi-k2) | Self-reported | 2026-05-27
4 | Kimi K2-Instruct-0905 | 0.857 | KIMI MoonshotAI: Kimi K2 0905 (moonshotai-kimi-k2-0905) | Self-reported | 2026-05-27
6 | Qwen2.5 32B Instruct | 0.754 | - | Self-reported | 2026-05-27
7 | Qwen2.5 72B Instruct | 0.751 | Qwen2.5 72B Instruct (qwen-qwen-2.5-72b-instruct) | Self-reported | 2026-05-27
8 | Qwen2.5 14B Instruct | 0.728 | - | Self-reported | 2026-05-27
9 | Qwen2.5 7B Instruct | 0.704 | Qwen2.5 7B Instruct (qwen-qwen-2.5-7b-instruct) | Self-reported | 2026-05-27
10 | Qwen2 72B Instruct | 0.692 | - | Self-reported | 2026-05-27
11 | Qwen3 235B A22B | 0.6594 | Qwen3 235B A22B (qwen-qwen3-235b-a22b) | Self-reported | 2026-05-27
12 | Qwen2.5-Omni-7B | 0.658 | - | Self-reported | 2026-05-27
13 | Qwen2 7B Instruct | 0.591 | - | Self-reported | 2026-05-27
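
The ranks in these results give tied scores the same rank and skip the next one: two entries share rank 4 at 0.857, and the following entry is rank 6. A minimal sketch of that scheme (standard competition ranking) is below. The function name `competition_rank` is made up for this example, and the page does not state how the site actually computes ranks.

```python
def competition_rank(scores):
    """Return 1-based ranks; equal scores share a rank and the next rank is skipped."""
    ordered = sorted(scores, reverse=True)
    first_position = {}
    for position, score in enumerate(ordered, start=1):
        # Remember only the first (best) position at which each score appears.
        first_position.setdefault(score, position)
    return [first_position[s] for s in scores]


# Scores copied from the results above, in listed order.
scores = [0.879, 0.878, 0.861, 0.857, 0.857, 0.754, 0.751,
          0.728, 0.704, 0.692, 0.6594, 0.658, 0.591]
ranks = competition_rank(scores)
print(ranks)  # [1, 2, 3, 4, 4, 6, 7, 8, 9, 10, 11, 12, 13]
```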