ClassEval

Class-level code generation benchmark evaluating class and function success under multiple prompting strategies.

Rows: 33
Primary metric: pass_1_class_success
Sampled: 2026-05-27

Metadata

Metrics

Pass@1 Class Success, Pass@1 Class Partial Success, Pass@1 Function Success, Pass@5 Class Success, Pass@5 Function Success
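
As a rough illustration of how Pass@k figures like these are commonly derived, the sketch below assumes the unbiased estimator popularized by HumanEval: with n generated samples per task, of which c pass all tests, pass@k = 1 - C(n-c, k) / C(n, k). This page does not state how the ClassEval result JSON computes its values, so the estimator choice and the helper names `pass_at_k` and `benchmark_pass_at_k` are assumptions made for illustration only.

```python
from math import comb
from typing import Sequence


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one task from n samples, c of which are correct.

    Assumed formula (HumanEval-style unbiased estimator):
    1 - C(n - c, k) / C(n, k).
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # math.comb returns 0 when k > n - c, so the estimate becomes 1.0
    # whenever every size-k subset must contain a correct sample.
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(correct_counts: Sequence[int], n: int, k: int) -> float:
    """Average the per-task estimates and express the result as a percentage."""
    if not correct_counts:
        raise ValueError("correct_counts must not be empty")
    total = sum(pass_at_k(n, c, k) for c in correct_counts)
    return 100.0 * total / len(correct_counts)
```

Under this assumption the same function would serve both class-level and function-level metrics; only the definition of a "correct" sample changes (for example, a class counting as correct only when all of its tests pass).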

Latest Results

Rows are parsed from the ClassEval public result JSON. Model keys preserve the source's prompting-strategy suffixes: H (holistic), I (incremental), and C (compositional).
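
Because each Subject label carries its strategy suffix in parentheses, comparing strategies per model means splitting the label first. The following is a minimal sketch of that step, assuming labels always look like `Name (X)` with X one of C, H, or I. The names `split_subject` and `best_per_model` are hypothetical helpers, not part of any ClassEval tooling, and the sample rows are copied from the results listed on this page.

```python
import re
from typing import Dict, Iterable, NamedTuple, Tuple

# Assumed label shape: "<model name> (<C|H|I>)", e.g. "GPT-4 (H)".
_LABEL = re.compile(r"^(?P<model>.+?)\s*\((?P<suffix>[CHI])\)$")


class Subject(NamedTuple):
    model: str
    strategy: str


def split_subject(label: str) -> Subject:
    """Split a leaderboard label into its model name and strategy suffix."""
    match = _LABEL.match(label.strip())
    if match is None:
        raise ValueError(f"unrecognized subject label: {label!r}")
    return Subject(match["model"], match["suffix"])


def best_per_model(rows: Iterable[Tuple[str, float]]) -> Dict[str, Tuple[str, float]]:
    """Map each model to its highest-scoring (strategy, score) pair."""
    best: Dict[str, Tuple[str, float]] = {}
    for label, score in rows:
        subject = split_subject(label)
        if subject.model not in best or score > best[subject.model][1]:
            best[subject.model] = (subject.strategy, score)
    return best


# A few (Subject, Pass@1 Class Success) rows taken from this page.
rows = [
    ("GPT-4 (H)", 37.6),
    ("GPT-4 (C)", 29.6),
    ("GPT-4 (I)", 26.2),
    ("WizardCoder (C)", 12.2),
    ("WizardCoder (H)", 9.2),
    ("WizardCoder (I)", 5.4),
]
best = best_per_model(rows)
```

Raising on an unrecognized label, rather than guessing, keeps a malformed key from being silently merged into another model's results.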

| Rank | Subject | Pass@1 Class Success | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | GPT-4 (H) | 37.6 | GPT-4 / openai-gpt-4 | Imported | 2026-05-27 |
| 2 | GPT-3.5 (H) | 29.6 | | Imported | 2026-05-27 |
| 3 | GPT-4 (C) | 29.6 | GPT-4 / openai-gpt-4 | Imported | 2026-05-27 |
| 4 | GPT-4 (I) | 26.2 | GPT-4 / openai-gpt-4 | Imported | 2026-05-27 |
| 5 | GPT-3.5 (I) | 25.6 | | Imported | 2026-05-27 |
| 6 | GPT-3.5 (C) | 18.2 | | Imported | 2026-05-27 |
| 7 | WizardCoder (C) | 12.2 | | Imported | 2026-05-27 |
| 8 | Instruct-StarCoder (H) | 10.2 | | Imported | 2026-05-27 |
| 9 | WizardCoder (H) | 9.2 | | Imported | 2026-05-27 |
| 10 | Instruct-StarCoder (C) | 9 | | Imported | 2026-05-27 |
| 11 | SantaCoder (I) | 8.6 | | Imported | 2026-05-27 |
| 12 | Instruct-StarCoder (I) | 8.4 | | Imported | 2026-05-27 |
| 13 | Instruct-CodeGen (I) | 8.2 | | Imported | 2026-05-27 |
| 14 | Instruct-CodeGen (H) | 7.4 | | Imported | 2026-05-27 |
| 15 | CodeGeeX (I) | 7.2 | | Imported | 2026-05-27 |
| 16 | Incoder (I) | 6.2 | | Imported | 2026-05-27 |
| 17 | Instruct-CodeGen (C) | 5.8 | | Imported | 2026-05-27 |
| 18 | WizardCoder (I) | 5.4 | | Imported | 2026-05-27 |
| 19 | CodeGeeX (C) | 3.8 | | Imported | 2026-05-27 |
| 20 | Incoder (C) | 3.4 | | Imported | 2026-05-27 |
| 21 | SantaCoder (C) | 3.2 | | Imported | 2026-05-27 |
| 22 | Vicuna (I) | 3 | | Imported | 2026-05-27 |
| 23 | Incoder (H) | 2.6 | | Imported | 2026-05-27 |
| 24 | PolyCoder (C) | 2.6 | | Imported | 2026-05-27 |
| 25 | Vicuna (C) | 2.6 | | Imported | 2026-05-27 |
| 26 | ChatGLM (C) | 1.4 | | Imported | 2026-05-27 |
| 27 | PolyCoder (I) | 1.4 | | Imported | 2026-05-27 |
| 28 | Vicuna (H) | 1.4 | | Imported | 2026-05-27 |
| 29 | ChatGLM (I) | 1.2 | | Imported | 2026-05-27 |
| 30 | ChatGLM (H) | 1 | | Imported | 2026-05-27 |
| 31 | CodeGeeX (H) | 1 | | Imported | 2026-05-27 |
| 32 | SantaCoder (H) | 1 | | Imported | 2026-05-27 |
| 33 | PolyCoder (H) | 0 | | Imported | 2026-05-27 |