Cybench
Cybench: Measures model robustness, truthfulness, calibration, bias, harmfulness, jailbreak resistance, or alignment-relevant behavior.
Rows: 1
Primary metric: accuracy
Sampled: 2026-05-27
Metadata
Metrics
Accuracy, Total Subtasks, Correct Subtasks, Total Tokens (lower is better), Total Time (lower is better)
| Rank | Subject | Accuracy | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | openai/gpt-4-turbo-2024-04-09 | 36% | — | Imported | 2026-05-27 |