Cybersecurity CTFs

A cybersecurity Capture the Flag (CTF) benchmark for evaluating LLMs on offensive security challenges. It contains tasks spanning cryptography, web exploitation, binary analysis, and forensics, and is used to assess AI capabilities in cybersecurity problem-solving.

Rows: 3
Primary metric: score
Sampled: 2026-05-06

Metadata

Metrics

Score, Normalized Score

Latest Results

Rank  Subject           Score  Model Match                                    Provenance     Sampled
1     GPT-5.3 Codex     0.78   GPT-5.3-Codex (openai-gpt-5.3-codex)           Self-reported  2026-05-06
2     Claude Haiku 4.5  0.47   Claude Haiku 4.5 (anthropic-claude-haiku-4.5)  Self-reported  2026-05-06
3     o1-mini           0.29   -                                              Self-reported  2026-05-06