KernelBench Hard

KernelBench Hard evaluates autonomous coding agents on GPU kernel engineering tasks, measuring correctness and speed relative to hardware baselines.

Rows: 12
Primary metric: pass_rate
Sampled: 2026-05-06

Metadata

Metrics

Pass Rate, Correct Count, Solution Count, Total Problems, Average Peak Fraction, Average Elapsed Seconds (lower is better)
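The page does not define how Pass Rate relates to the other metrics. A minimal sketch, assuming it is Correct Count divided by Total Problems, expressed as a percentage and rounded to two decimals (the function name `pass_rate` and the rounding rule are assumptions, not taken from the benchmark's code):

```python
def pass_rate(correct_count: int, total_problems: int) -> float:
    """Percentage of problems solved correctly, rounded to two decimals.

    Assumed definition: 100 * correct_count / total_problems.
    """
    if total_problems <= 0:
        raise ValueError("total_problems must be positive")
    if not 0 <= correct_count <= total_problems:
        raise ValueError("correct_count must be between 0 and total_problems")
    return round(100.0 * correct_count / total_problems, 2)


# Hypothetical example: 6 correct solutions out of 7 problems.
print(pass_rate(6, 7))
```

Under this assumed definition, the pass rates reported in the results are consistent with a seven-problem set (for example, 6 of 7 gives 85.71), although the page itself does not state the problem count.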

Latest Results

Rows are ranked by pass_rate, then average_peak_fraction. Source labels are preserved and per-problem run IDs/results are retained in metadata.

Rank  Subject                                              Pass Rate (%)  Model Match               Provenance  Sampled
1     codex/gpt-5.5 [xhigh]                                100            GPT-5.5 (openai-gpt-5.5)  Imported    2026-05-06
2     claude/claude-opus-4-7 [max]                         85.71          -                         Imported    2026-05-06
3     kimi/kimi-k2.6                                       85.71          -                         Imported    2026-05-06
4     opencode/openrouter-pinned/xiaomi/mimo-v2.5-pro      71.43          -                         Imported    2026-05-06
5     opencode/openrouter-pinned/qwen/qwen3.6-max-preview  71.43          -                         Imported    2026-05-06
6     opencode/deepseek/deepseek-v4-flash                  71.43          -                         Imported    2026-05-06
7     opencode/deepseek/deepseek-v4-pro                    71.43          -                         Imported    2026-05-06
8     opencode/openrouter-pinned/qwen/qwen3.6-plus         57.14          -                         Imported    2026-05-06
9     opencode/zai/glm-5.1                                 57.14          -                         Imported    2026-05-06
10    opencode/openrouter-pinned/minimax/minimax-m2.7      42.86          -                         Imported    2026-05-06
11    opencode/openrouter-pinned/qwen/qwen3.6-27b          14.29          -                         Imported    2026-05-06
12    opencode/openrouter-pinned/qwen/qwen3.6-35b-a3b      0              -                         Imported    2026-05-06
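
The ranking rule stated above the table (pass_rate first, then average_peak_fraction) can be sketched as a two-key sort. This is an illustration only: the `Row` type, the `rank_rows` helper, and the example values are hypothetical, and it assumes that a higher value is better for both keys, which the page implies but does not state outright.

```python
from dataclasses import dataclass


@dataclass
class Row:
    subject: str
    pass_rate: float              # percentage, higher is assumed better
    average_peak_fraction: float  # tie-breaker, higher is assumed better


def rank_rows(rows):
    """Return (rank, row) pairs sorted by pass_rate, then average_peak_fraction."""
    ordered = sorted(
        rows,
        key=lambda r: (-r.pass_rate, -r.average_peak_fraction),
    )
    return [(position + 1, row) for position, row in enumerate(ordered)]


# Hypothetical rows; the peak fractions are invented for illustration.
example = [
    Row("agent-b", 85.71, 0.21),
    Row("agent-a", 100.0, 0.35),
    Row("agent-c", 85.71, 0.30),
]

for rank, row in rank_rows(example):
    print(rank, row.subject, row.pass_rate, row.average_peak_fraction)
```

In this sketch, the two rows tied at 85.71 are separated by the second key, which mirrors how rows with identical pass rates in the table would receive distinct ranks.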