NL2Repo
NL2Repo evaluates long-horizon coding capabilities including repository-level understanding, where models must generate or modify code across entire repositories from natural language specifications.
11 rows
Primary metric: score
Sampled: 2026-05-28
Metadata
Metrics: Score, Normalized Score
Showing 2 latest source slices.
Source slice sampled 2026-05-28:

| Rank | Subject | Score | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | Claude Opus 4.6 Max | 47.6% | Claude Opus 4.6 (`anthropic-claude-opus-4.6`) | Self-reported | 2026-05-28 |
| 2 | Qwen3.7 Max | 47.2% | Qwen3.7 Max (`qwen-qwen3.7-max`) | Self-reported | 2026-05-28 |
| 3 | Kimi K2.6 Thinking | 42.8% | MoonshotAI: Kimi K2.6 (`moonshotai-kimi-k2.6`) | Self-reported | 2026-05-28 |
| 4 | GLM-5.1 Thinking | 41% | GLM 5.1 (`z-ai-glm-5.1`) | Self-reported | 2026-05-28 |
| 5 | DeepSeek V4 Pro Max | 35.5% | DeepSeek V4 Pro (`deepseek-deepseek-v4-pro`) | Self-reported | 2026-05-28 |
| 6 | Qwen3.6 Plus | 34.4% | Qwen3.6 Plus (`qwen-qwen3.6-plus`) | Self-reported | 2026-05-28 |

Source slice sampled 2026-05-06:

| Rank | Subject | Score | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | GLM-5.1 | 0.43 | GLM 5.1 (`z-ai-glm-5.1`) | Self-reported | 2026-05-06 |
| 2 | MiniMax M2.7 | 0.40 | MiniMax M2.7 (`minimax-minimax-m2.7`) | Self-reported | 2026-05-06 |
| 3 | Qwen3.6 Plus | 0.38 | Qwen3.6 Plus (`qwen-qwen3.6-plus`) | Self-reported | 2026-05-06 |
| 4 | Qwen3.6-27B | 0.36 | Qwen3.6 27B (`qwen-qwen3.6-27b`) | Self-reported | 2026-05-06 |
| 5 | Qwen3.6-35B-A3B | 0.29 | Qwen3.6 35B A3B (`qwen-qwen3.6-35b-a3b`) | Self-reported | 2026-05-06 |