KernelBench Hard
KernelBench Hard evaluates autonomous coding agents on GPU kernel engineering tasks, measuring correctness and speed relative to hardware baselines.
Rows: 12
Primary metric: pass_rate
Sampled: 2026-05-06
Metadata
Metrics
Pass Rate, Correct Count, Solution Count, Total Problems, Average Peak Fraction, Average Elapsed Seconds (lower is better)
| Rank | Subject | Pass Rate (%) | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | codex/gpt-5.5 [xhigh] | 100 | GPT-5.5 (openai-gpt-5.5) | Imported | 2026-05-06 |
| 2 | claude/claude-opus-4-7 [max] | 85.71 | — | Imported | 2026-05-06 |
| 3 | kimi/kimi-k2.6 | 85.71 | — | Imported | 2026-05-06 |
| 4 | opencode/openrouter-pinned/xiaomi/mimo-v2.5-pro | 71.43 | — | Imported | 2026-05-06 |
| 5 | opencode/openrouter-pinned/qwen/qwen3.6-max-preview | 71.43 | — | Imported | 2026-05-06 |
| 6 | opencode/deepseek/deepseek-v4-flash | 71.43 | — | Imported | 2026-05-06 |
| 7 | opencode/deepseek/deepseek-v4-pro | 71.43 | — | Imported | 2026-05-06 |
| 8 | opencode/openrouter-pinned/qwen/qwen3.6-plus | 57.14 | — | Imported | 2026-05-06 |
| 9 | opencode/zai/glm-5.1 | 57.14 | — | Imported | 2026-05-06 |
| 10 | opencode/openrouter-pinned/minimax/minimax-m2.7 | 42.86 | — | Imported | 2026-05-06 |
| 11 | opencode/openrouter-pinned/qwen/qwen3.6-27b | 14.29 | — | Imported | 2026-05-06 |
| 12 | opencode/openrouter-pinned/qwen/qwen3.6-35b-a3b | 0 | — | Imported | 2026-05-06 |
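The metrics list names Pass Rate, Correct Count, and Total Problems but does not say how they relate. A minimal sketch, assuming Pass Rate is Correct Count divided by Total Problems, expressed as a percentage and rounded to two decimals. The function name `pass_rate` and the 7-problem suite size are illustrative assumptions, not stated by the page. Every pass rate in the table above happens to match a whole number of sevenths, which is why 7 is used here.

```python
def pass_rate(correct_count: int, total_problems: int) -> float:
    """Percentage of problems solved, rounded to two decimals.

    Assumption: pass rate = correct / total. The page does not define it.
    """
    if total_problems <= 0:
        raise ValueError("total_problems must be positive")
    if not 0 <= correct_count <= total_problems:
        raise ValueError("correct_count must lie in [0, total_problems]")
    return round(100.0 * correct_count / total_problems, 2)


# Hypothetical suite size; the page lists Total Problems as a metric
# but does not show its value.
ASSUMED_TOTAL = 7

if __name__ == "__main__":
    for correct in range(ASSUMED_TOTAL + 1):
        print(f"{correct}/{ASSUMED_TOTAL} -> {pass_rate(correct, ASSUMED_TOTAL)}")
```

Under that assumption, 6 of 7 correct gives 85.71 and 5 of 7 gives 71.43, matching the values shown for ranks 2 and 4. A different suite size could produce the same rounded figures, so this is a consistency check rather than confirmation.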