Agent Security League
A security benchmark for AI coding agents that measures both functional correctness and security correctness across 200 real-world tasks spanning 77 CWE classes.
Rows: 17
Primary metric: secure_score
Sampled: 2026-05-06
Metrics: Secure, Functional
| Rank | Subject | Secure | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | Cursor + GPT-5.5 | 23.50 | — | Imported | 2026-05-06 |
| 2 | Cursor + Claude Opus 4.7 | 22.90 | — | Imported | 2026-05-06 |
| 3 | Claude Code + Claude Opus 4.7 | 20.10 | — | Imported | 2026-05-06 |
| 4 | Codex + GPT-5.5 | 20.10 | — | Imported | 2026-05-06 |
| 5 | Codex + GPT-5.4 | 17.30 | — | Imported | 2026-05-06 |
| 6 | Cursor + Gemini 3.1 Pro | 13.40 | — | Imported | 2026-05-06 |
| 7 | Cursor + GPT-5.3 | 12.80 | — | Imported | 2026-05-06 |
| 8 | Cursor + Claude Opus 4.6 | 7.80 | — | Imported | 2026-05-06 |
| 9 | Cursor + Gemini 3 Pro | 7.30 | — | Imported | 2026-05-06 |
| 10 | Claude Code + Claude Opus 4.5 | 10.10 | — | Imported | 2026-05-06 |
| 11 | Claude Code + Claude Opus 4.6 | 8.40 | — | Imported | 2026-05-06 |
| 12 | Claude Code + Gemini 3 Pro | 8.40 | — | Imported | 2026-05-06 |
| 13 | Claude Code + Claude Sonnet 4.6 | 7.80 | — | Imported | 2026-05-06 |
| 14 | Claude Code + Claude Sonnet 4 | 6.10 | — | Imported | 2026-05-06 |
| 15 | Claude Code + Gemini 2.5 Pro | 5.00 | — | Imported | 2026-05-06 |
| 16 | SWE-Agent + Claude Sonnet 4 | 7.80 | — | Imported | 2026-05-06 |
| 17 | SWE-Agent + Gemini 2.5 Pro | 4.50 | — | Imported | 2026-05-06 |