Natural2Code

NaturalCodeBench (NCB) is a code benchmark designed to mirror the complexity and variety of real-world coding tasks. It comprises 402 high-quality problems in Python and Java, drawn from natural user queries to online coding services and covering 6 domains.

Rows: 8
Primary metric: Score
Sampled: 2026-05-06

Metadata

Metrics

Score, Normalized Score

Latest Results

Rank  Subject              Score  Model Match                                 Provenance     Sampled
1     Gemini 2.0 Flash     0.93   Gemini 2.0 Flash (google-gemini-2.0-flash)  Self-reported  2026-05-06
2     Gemini 1.5 Pro       0.85   -                                           Self-reported  2026-05-06
3     Gemma 3 27B          0.84   Gemma 3 27B (google-gemma-3-27b-it)         Self-reported  2026-05-06
4     Gemma 3 12B          0.81   Gemma 3 12B (google-gemma-3-12b-it)         Self-reported  2026-05-06
5     Gemini 1.5 Flash     0.80   -                                           Self-reported  2026-05-06
6     Gemini 1.5 Flash 8B  0.76   -                                           Self-reported  2026-05-06
7     Gemma 3 4B           0.70   Gemma 3 4B (google-gemma-3-4b-it)           Self-reported  2026-05-06
8     Gemma 3 1B           0.56   -                                           Self-reported  2026-05-06