OJBench
OJBench is a benchmark designed to assess the competition-level code reasoning abilities of large language models. It comprises 232 programming competition problems drawn from NOI and ICPC, categorized into Easy, Medium, and Hard difficulty levels. Models are evaluated on their ability to solve these problems in Python and C++.
Rows: 9
Primary metric: Score
Sampled: 2026-05-06
Metadata
Metrics: Score, Normalized Score
| Rank | Subject | Score | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | Kimi K2.6 | 0.61 | MoonshotAI: Kimi K2.6 (`moonshotai-kimi-k2.6`) | Self-reported | 2026-05-06 |
| 2 | Kimi K2-Thinking-0905 | 0.49 | MoonshotAI: Kimi K2 Thinking (`moonshotai-kimi-k2-thinking`) | Self-reported | 2026-05-06 |
| 3 | Qwen3.5-27B | 0.40 | Qwen3.5-27B (`qwen-qwen3.5-27b`) | Self-reported | 2026-05-06 |
| 4 | Qwen3.5-122B-A10B | 0.40 | Qwen3.5-122B-A10B (`qwen-qwen3.5-122b-a10b`) | Self-reported | 2026-05-06 |
| 5 | Qwen3.5-35B-A3B | 0.36 | Qwen3.5-35B-A3B (`qwen-qwen3.5-35b-a3b`) | Self-reported | 2026-05-06 |
| 6 | Qwen3-235B-A22B-Thinking-2507 | 0.33 | Qwen3 235B A22B Thinking 2507 (`qwen-qwen3-235b-a22b-thinking-2507`) | Self-reported | 2026-05-06 |
| 7 | Qwen3-Next-80B-A3B-Thinking | 0.30 | Qwen3 Next 80B A3B Thinking (`qwen-qwen3-next-80b-a3b-thinking`) | Self-reported | 2026-05-06 |
| 8 | Kimi K2-Instruct-0905 | 0.27 | MoonshotAI: Kimi K2 0905 (`moonshotai-kimi-k2-0905`) | Self-reported | 2026-05-06 |
| 8 | Kimi K2 Instruct | 0.27 | MoonshotAI: Kimi K2 0711 (`moonshotai-kimi-k2`) | Self-reported | 2026-05-06 |
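
The scores in the table lie between 0 and 1, and the benchmark groups its problems into Easy, Medium, and Hard tiers. The page does not define how Score is computed, so the following is only a minimal sketch under the assumption that a score is the fraction of problems a model solves. The function name `pass_rates` and the toy data are hypothetical and are not taken from OJBench.

```python
from collections import defaultdict


def pass_rates(results):
    """Compute overall and per-tier pass rates.

    `results` is an iterable of (difficulty, solved) pairs, one per problem.
    ASSUMPTION: a score is simply solved / total; the source does not
    confirm that OJBench defines Score this way.
    """
    totals = defaultdict(int)         # problems seen per difficulty tier
    solved_counts = defaultdict(int)  # problems solved per difficulty tier
    for difficulty, solved in results:
        totals[difficulty] += 1
        if solved:
            solved_counts[difficulty] += 1

    n = sum(totals.values())
    overall = sum(solved_counts.values()) / n if n else 0.0
    per_tier = {d: solved_counts[d] / totals[d] for d in totals}
    return overall, per_tier


# Hypothetical toy data, not real benchmark results.
toy_results = [
    ("Easy", True),
    ("Easy", True),
    ("Medium", True),
    ("Medium", False),
    ("Hard", False),
]

overall, per_tier = pass_rates(toy_results)
print(f"overall={overall:.2f}", per_tier)
```

Reporting the per-tier rates next to the overall figure shows whether a model's score comes mostly from Easy problems or extends to the Hard tier.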