X-Risks Leaderboard
Extreme-risks evaluation leaderboard for frontier models, using 3C3H scoring over biology, chemistry, and cybersecurity risk-domain questions.
Metadata
Rows: 10
Primary metric: 3c3h_score (3C3H Score)
Sampled: 2026-05-06

Metrics
3C3H Score; criteria: Correctness, Completeness, Conciseness, Helpfulness, Honesty, Harmlessness; domains: Biology, Chemistry, Cybersecurity
| Rank | Subject | 3C3H Score | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | o1-2024-12-17 | 29.09 | o1 (openai-o1) | Imported | 2026-05-06 |
| 2 | o3-mini-2025-01-31 | 27.73 | o3-mini (openai-o3-mini) | Imported | 2026-05-06 |
| 3 | o1-mini-2024-09-12 | 20.78 | — | Imported | 2026-05-06 |
| 4 | gpt-4o-2024-08-06 | 18.92 | GPT-4o (2024-08-06) (openai-gpt-4o-2024-08-06) | Imported | 2026-05-06 |
| 5 | meta-llama/Llama-3.3-70B-Instruct | 17.83 | Llama 3.3 70B Instruct (meta-llama-llama-3.3-70b-instruct) | Imported | 2026-05-06 |
| 6 | Qwen/Qwen2.5-72B-Instruct | 16.60 | Qwen2.5 72B Instruct (qwen-qwen-2.5-72b-instruct) | Imported | 2026-05-06 |
| 7 | gpt-4o-mini-2024-07-18 | 15.51 | GPT-4o-mini (2024-07-18) (openai-gpt-4o-mini-2024-07-18) | Imported | 2026-05-06 |
| 8 | claude-3-5-sonnet-20241022 | 14.45 | Claude 3.5 Sonnet (anthropic-claude-3.5-sonnet) | Imported | 2026-05-06 |
| 9 | claude-3-haiku-20240307 | 13.06 | Claude 3 Haiku (anthropic-claude-3-haiku) | Imported | 2026-05-06 |
| 10 | Qwen/QwQ-32B-Preview | 11.03 | — | Imported | 2026-05-06 |
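The Rank column orders subjects by the primary metric, 3C3H Score, from highest to lowest. The minimal Python sketch below reproduces that ordering from the scores in the table. The function name `rank_by_score` is hypothetical, and the sketch assumes that ties get no special handling; none occur in this table. It does not recompute 3C3H Score itself, because the aggregation over the six criteria is not specified here.

```python
# (subject, 3C3H score) pairs copied from the leaderboard table.
ROWS = [
    ("o1-2024-12-17", 29.09),
    ("o3-mini-2025-01-31", 27.73),
    ("o1-mini-2024-09-12", 20.78),
    ("gpt-4o-2024-08-06", 18.92),
    ("meta-llama/Llama-3.3-70B-Instruct", 17.83),
    ("Qwen/Qwen2.5-72B-Instruct", 16.60),
    ("gpt-4o-mini-2024-07-18", 15.51),
    ("claude-3-5-sonnet-20241022", 14.45),
    ("claude-3-haiku-20240307", 13.06),
    ("Qwen/QwQ-32B-Preview", 11.03),
]


def rank_by_score(rows):
    """Hypothetical helper: return (rank, subject, score), highest score first.

    Assumes no special tie handling; equal scores keep their input order
    because Python's sort is stable.
    """
    ordered = sorted(rows, key=lambda row: row[1], reverse=True)
    return [(rank, subject, score) for rank, (subject, score) in enumerate(ordered, start=1)]


if __name__ == "__main__":
    ranked = rank_by_score(ROWS)
    for rank, subject, score in ranked:
        print(f"{rank:>2}  {subject:<36} {score:6.2f}")
    # Gap between first and last place, in 3C3H points.
    print(f"spread: {ranked[0][2] - ranked[-1][2]:.2f}")
```

Running it prints the same order as the Rank column, followed by the 18.06-point spread between o1-2024-12-17 and Qwen/QwQ-32B-Preview.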