Cybersecurity CTFs
Cybersecurity Capture the Flag (CTF) benchmark for evaluating LLMs in offensive security challenges. Contains diverse cybersecurity tasks including cryptography, web exploitation, binary analysis, and forensics to assess AI capabilities in cybersecurity problem-solving.
3rows
scoreprimary metric
2026-05-06sampled
Metadata
Metrics
Score, Normalized Score
| Rank | Subject | Score | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | GPT-5.3 Codex | 0.78 | GPT-5.3-Codex openai-gpt-5.3-codex | Self-reported | 2026-05-06 |
| 2 | Claude Haiku 4.5 | 0.47 | Claude Haiku 4.5 anthropic-claude-haiku-4.5 | Self-reported | 2026-05-06 |
| 3 | o1-mini | 0.29 | — | Self-reported | 2026-05-06 |
No matching rows.