Cybench

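Cybench: A benchmark of capture-the-flag (CTF) cybersecurity tasks for language-model agents. It is listed here in the category of benchmarks that measure model robustness, truthfulness, calibration, bias, harmfulness, jailbreak resistance, or alignment-relevant behavior.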

Rows: 1
Primary metric: accuracy
Sampled: 2026-05-27

Metadata

Metrics

Accuracy, Total Subtasks, Correct Subtasks, Total Tokens (lower is better), Total Time (lower is better)
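
The metrics above can be related with a small sketch. It assumes, without confirmation from this page, that accuracy is the fraction of correct subtasks out of total subtasks, and that token count and time act only as tie-breakers when ordering runs. The `RunSummary` class, its field names, and `rank_key` are hypothetical names chosen for illustration, and the numbers in the usage lines are invented, not benchmark results.

```python
from dataclasses import dataclass


@dataclass
class RunSummary:
    """Hypothetical container for one benchmark run's metrics."""
    total_subtasks: int
    correct_subtasks: int
    total_tokens: int      # lower is better
    total_time_s: float    # lower is better

    @property
    def accuracy(self) -> float:
        # Assumption: accuracy = correct subtasks / total subtasks.
        if self.total_subtasks == 0:
            return 0.0
        return self.correct_subtasks / self.total_subtasks


def rank_key(run: RunSummary) -> tuple:
    # Higher accuracy sorts first; fewer tokens, then less time, break ties.
    return (-run.accuracy, run.total_tokens, run.total_time_s)


# Invented example values, for illustration only.
runs = [
    RunSummary(total_subtasks=10, correct_subtasks=5, total_tokens=1000, total_time_s=60.0),
    RunSummary(total_subtasks=10, correct_subtasks=5, total_tokens=800, total_time_s=90.0),
    RunSummary(total_subtasks=10, correct_subtasks=7, total_tokens=2000, total_time_s=120.0),
]
ranked = sorted(runs, key=rank_key)
```

Sorting on a tuple keeps the primary metric dominant: the two "lower is better" metrics only matter when accuracies are equal.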

Latest Results

Single checked-in analytics row imported for the public Cybench sample benchmark run using openai/gpt-4-turbo-2024-04-09.

Rank  Subject                        Accuracy  Model Match  Provenance  Sampled
1     openai/gpt-4-turbo-2024-04-09  36%                    Imported    2026-05-27