FrontierSWE
Software-engineering agent benchmark targeting frontier-level implementation, performance optimization, and research tasks.
Rows: 13
Primary metric: avg_rank
Sampled: 2026-05-28
Metadata
Metrics
Average Rank (lower is better), Dominance, Implementation Average Rank (lower is better), Performance Average Rank (lower is better), Research Average Rank (lower is better)
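Showing the 2 latest source slices. Ranks restart within each slice: the self-reported rows come first, followed by the imported rows.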
| Rank | Subject | Average Rank (lower is better) | Dominance | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|---|
| 1 | Claude Opus 4.8 | 2.7 | — | Claude Opus 4.8 (`anthropic-claude-opus-4.8`) | Self-reported | 2026-05-28 |
| 2 | Claude Opus 4.7 | 4.2 | — | Claude Opus 4.7 (`anthropic-claude-opus-4.7`) | Self-reported | 2026-05-28 |
| 3 | Claude Opus 4.6 | 4.9 | — | Claude Opus 4.6 (`anthropic-claude-opus-4.6`) | Self-reported | 2026-05-28 |
| 1 | GPT-5.5 (Codex) | 2.53 | 83% | — | Imported | 2026-05-28 |
| 2 | Claude Opus 4.7 (Claude Code) | 3.56 | 72% | — | Imported | 2026-05-28 |
| 3 | Claude Opus 4.6 (Claude Code) | 4.18 | 65% | — | Imported | 2026-05-28 |
| 4 | GPT-5.4 (Codex) | 4.29 | 63% | — | Imported | 2026-05-28 |
| 5 | Composer 2.5 (Cursor CLI) | 5.71 | 48% | — | Imported | 2026-05-28 |
| 6 | Gemini 3.1 Pro (Gemini CLI) | 5.79 | 47% | — | Imported | 2026-05-28 |
| 7 | DeepSeek V4 Pro (Claude Code) | 6.76 | 36% | — | Imported | 2026-05-28 |
| 8 | Kimi K2.6 (Kimi CLI) | 7.12 | 32% | — | Imported | 2026-05-28 |
| 9 | Kimi K2.5 (Kimi CLI) | 7.41 | 29% | — | Imported | 2026-05-28 |
| 10 | Qwen3.6-Plus (Qwen Code) | 7.65 | 26% | — | Imported | 2026-05-28 |
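The page does not define Dominance, but the imported rows in the table above are numerically consistent with one simple reading: the mean share of the other agents that a subject outranks, which for N agents works out to (N - avg_rank) / (N - 1). The sketch below checks that relation against the ten imported rows. The formula and the names `implied_dominance` and `check_slice` are assumptions made for illustration, not the benchmark's documented method.

```python
# Imported slice from the table above: (subject, average rank, dominance in percent).
IMPORTED = [
    ("GPT-5.5 (Codex)", 2.53, 83),
    ("Claude Opus 4.7 (Claude Code)", 3.56, 72),
    ("Claude Opus 4.6 (Claude Code)", 4.18, 65),
    ("GPT-5.4 (Codex)", 4.29, 63),
    ("Composer 2.5 (Cursor CLI)", 5.71, 48),
    ("Gemini 3.1 Pro (Gemini CLI)", 5.79, 47),
    ("DeepSeek V4 Pro (Claude Code)", 6.76, 36),
    ("Kimi K2.6 (Kimi CLI)", 7.12, 32),
    ("Kimi K2.5 (Kimi CLI)", 7.41, 29),
    ("Qwen3.6-Plus (Qwen Code)", 7.65, 26),
]


def implied_dominance(avg_rank: float, n_agents: int) -> float:
    """Assumed relation: mean fraction of the other agents that are outranked."""
    return (n_agents - avg_rank) / (n_agents - 1)


def check_slice(rows):
    """Return (subject, implied percent, reported percent) for each row."""
    n_agents = len(rows)  # assumption: the slice lists every ranked agent
    return [
        (name, round(100 * implied_dominance(avg_rank, n_agents)), reported)
        for name, avg_rank, reported in rows
    ]


if __name__ == "__main__":
    for name, implied, reported in check_slice(IMPORTED):
        print(f"{name:32s} implied={implied:3d}%  reported={reported:3d}%")
```

If the relation holds, Dominance carries the same information as Average Rank within a slice, rescaled so that higher is better. The self-reported rows list no Dominance value, so the relation cannot be checked for them.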