OJBench

OJBench is a competition-level code benchmark that assesses how well large language models reason about code at the level of programming contests. It comprises 232 problems drawn from NOI (China's National Olympiad in Informatics) and ICPC (the International Collegiate Programming Contest), grouped into Easy, Medium, and Hard difficulty levels. The benchmark evaluates how well models solve these problems in Python and C++.
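The page does not define how Score is computed. As a rough illustration only, the sketch below assumes Score is the fraction of problems solved, where a problem counts as solved if the submitted program passes all of its hidden tests, and it also breaks the result down by the Easy/Medium/Hard levels. The function name `pass_rate` and the `(difficulty, passed)` record shape are hypothetical, not part of OJBench.

```python
from collections import defaultdict


def pass_rate(results):
    """Return (overall, per_level) solve rates.

    `results` is a hypothetical list of (difficulty, passed) pairs, one per
    problem. Treating Score as a plain solve rate is an assumption; the
    benchmark's actual scoring rule is not stated on this page.
    """
    if not results:
        raise ValueError("results must not be empty")
    totals = defaultdict(int)  # problems seen per difficulty level
    solved = defaultdict(int)  # problems solved per difficulty level
    for difficulty, passed in results:
        totals[difficulty] += 1
        solved[difficulty] += int(passed)
    overall = sum(solved.values()) / len(results)
    per_level = {d: solved[d] / totals[d] for d in totals}
    return overall, per_level
```

For example, `pass_rate([("Easy", True), ("Easy", False), ("Medium", True), ("Hard", False)])` returns an overall rate of 0.5 with per-level rates of 0.5, 1.0, and 0.0.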

Rows: 9
Primary metric: Score
Sampled: 2026-05-06

Metadata

Metrics

Score, Normalized Score

Latest Results

| Rank | Subject | Score | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | Kimi K2.6 | 0.61 | MoonshotAI: Kimi K2.6 (moonshotai-kimi-k2.6) | Self-reported | 2026-05-06 |
| 2 | Kimi K2-Thinking-0905 | 0.49 | MoonshotAI: Kimi K2 Thinking (moonshotai-kimi-k2-thinking) | Self-reported | 2026-05-06 |
| 3 | Qwen3.5-27B | 0.40 | Qwen3.5-27B (qwen-qwen3.5-27b) | Self-reported | 2026-05-06 |
| 4 | Qwen3.5-122B-A10B | 0.40 | Qwen3.5-122B-A10B (qwen-qwen3.5-122b-a10b) | Self-reported | 2026-05-06 |
| 5 | Qwen3.5-35B-A3B | 0.36 | Qwen3.5-35B-A3B (qwen-qwen3.5-35b-a3b) | Self-reported | 2026-05-06 |
| 6 | Qwen3-235B-A22B-Thinking-2507 | 0.33 | Qwen3 235B A22B Thinking 2507 (qwen-qwen3-235b-a22b-thinking-2507) | Self-reported | 2026-05-06 |
| 7 | Qwen3-Next-80B-A3B-Thinking | 0.30 | Qwen3 Next 80B A3B Thinking (qwen-qwen3-next-80b-a3b-thinking) | Self-reported | 2026-05-06 |
| 8 | Kimi K2-Instruct-0905 | 0.27 | MoonshotAI: Kimi K2 0905 (moonshotai-kimi-k2-0905) | Self-reported | 2026-05-06 |
| 8 | Kimi K2 Instruct | 0.27 | MoonshotAI: Kimi K2 0711 (moonshotai-kimi-k2) | Self-reported | 2026-05-06 |
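The two 0.27 entries share rank 8, which is consistent with standard competition ranking ("1224" style): tied scores share a rank and the next distinct score skips the tied positions. The page does not document its tie rule, so the sketch below is an assumption, and `competition_ranks` is a hypothetical helper. Note that the rank-3 and rank-4 entries both display 0.40 yet receive distinct ranks, which suggests the page ranks on more precise scores than it shows; this sketch only ties inputs that are exactly equal.

```python
def competition_ranks(scores):
    """Map each name to its standard competition rank (higher score is better).

    Exactly equal scores share a rank; the next distinct score gets a rank
    equal to its 1-based position in the sorted order.
    """
    ordered = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    ranks = {}
    prev_score = None
    prev_rank = 0
    for position, (name, score) in enumerate(ordered, start=1):
        if score != prev_score:
            # New distinct score: its rank is its position in the list.
            prev_rank = position
            prev_score = score
        ranks[name] = prev_rank
    return ranks
```

For scores of 0.61, 0.30, 0.27, 0.27, and 0.10, this assigns ranks 1, 2, 3, 3, and 5.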