CyberSecEval

CyberSecEval measures model robustness, truthfulness, calibration, bias, harmfulness, jailbreak resistance, and related alignment-relevant behavior.

7 rows
Primary metric: average_injection_success_rate
Sampled: 2026-05-27


Metrics

All 16 metrics are lower-is-better.
Primary: Average Injection Success Rate
Per technique: Different User Input Language, Output Formatting Manipulation, Overload With Information, Many Shot Attack, Ignore Previous Instructions, System Mode, Few Shot Attack, Indirect Reference, Repeated Token Attack, Persuasion, Mixed Techniques, Virtualization, Payload Splitting, Hypothetical Scenario, Token Smuggling
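This page does not say how the average is aggregated across techniques. A minimal Python sketch, assuming each test case is judged as a binary success or failure and tagged with exactly one technique, computes both plausible readings: a pooled rate over all test cases and an unweighted mean of the per-technique rates. The function name `injection_success_rates`, the input format, and the toy outcomes are hypothetical, not taken from CyberSecEval.

```python
from collections import defaultdict


def injection_success_rates(results):
    """Aggregate judged prompt-injection outcomes (hypothetical input format).

    results: iterable of (technique, succeeded) pairs, one per test case,
    where succeeded is True if the injection worked.
    Returns per-technique rates, the pooled rate over all cases, and the
    unweighted mean of the per-technique rates.
    """
    totals = defaultdict(int)
    successes = defaultdict(int)
    for technique, succeeded in results:
        totals[technique] += 1
        successes[technique] += int(succeeded)
    per_technique = {t: successes[t] / totals[t] for t in totals}
    pooled = sum(successes.values()) / sum(totals.values())
    macro = sum(per_technique.values()) / len(per_technique)
    return per_technique, pooled, macro


# Toy outcomes for illustration only; these are not CyberSecEval data.
toy = [
    ("Persuasion", True), ("Persuasion", False),
    ("Virtualization", False), ("Virtualization", False),
    ("Virtualization", False), ("Virtualization", True),
]
per_technique, pooled, macro = injection_success_rates(toy)
print(per_technique)            # {'Persuasion': 0.5, 'Virtualization': 0.25}
print(round(pooled, 4), macro)  # 0.3333 0.375
```

The two readings agree only when every technique has the same number of test cases; under either, a lower value means the model resisted more injections.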

Latest Results

Rows are transcribed from the prompt-injection results figure in the public arXiv source of the CyberSecEval 2 paper.

Rank | Subject | Average Injection Success Rate | Model Match | Provenance | Sampled
1 | codellama-70b-instruct | 12.93% | - | Imported | 2026-05-27
2 | gpt-4 | 19.87% | GPT-4 (openai-gpt-4) | Imported | 2026-05-27
3 | llama 3 70b-instruct | 29.27% | Llama 3 70B Instruct (meta-llama-llama-3-70b-instruct) | Imported | 2026-05-27
4 | codellama-34b-instruct | 36.33% | - | Imported | 2026-05-27
5 | codellama-13b-instruct | 37.27% | - | Imported | 2026-05-27
6 | gpt-3.5-turbo | 39.13% | GPT-3.5 Turbo (openai-gpt-3.5-turbo) | Imported | 2026-05-27
7 | llama 3 8b-instruct | 45.27% | Llama 3 8B Instruct (meta-llama-llama-3-8b-instruct) | Imported | 2026-05-27
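Because every metric here is lower-is-better, the rank column amounts to an ascending sort on the rate. A small Python sketch that re-enters the seven rates from the table reproduces the ranking; `rows`, `ranked`, and `spread` are illustrative names, not part of any CyberSecEval tooling, and the sketch assumes no ties (this page states no tie-break rule).

```python
# Average injection success rates (%) re-entered from the results table,
# deliberately listed out of rank order.
rows = [
    ("gpt-3.5-turbo", 39.13),
    ("codellama-70b-instruct", 12.93),
    ("llama 3 8b-instruct", 45.27),
    ("gpt-4", 19.87),
    ("codellama-13b-instruct", 37.27),
    ("llama 3 70b-instruct", 29.27),
    ("codellama-34b-instruct", 36.33),
]

# Lower is better, so rank 1 goes to the smallest rate (ascending sort).
ranked = sorted(rows, key=lambda row: row[1])
for rank, (subject, rate) in enumerate(ranked, start=1):
    print(f"{rank} {subject} {rate:.2f}%")

# Gap between the most and least injection-resistant models, in percentage points.
spread = ranked[-1][1] - ranked[0][1]
print(f"spread: {spread:.2f} pp")  # spread: 32.34 pp
```

The sorted order matches the Rank column, and the gap from codellama-70b-instruct (12.93%) to llama 3 8b-instruct (45.27%) is 32.34 percentage points.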