X-Risks Leaderboard

Extreme-risks evaluation leaderboard for frontier models, using 3C3H scoring over biology, chemistry, and cybersecurity risk-domain questions.

Rows: 10
Primary metric: 3c3h_score
Sampled: 2026-05-06

Metadata

Metrics

3C3H Score, Correctness, Completeness, Conciseness, Helpfulness, Honesty, Harmlessness, Biology, Chemistry, Cybersecurity

Latest Results

Rows are parsed from the public X-Risks result JSON. Scores are converted to percentages, and each row keeps the 3C3H dimensions along with the Biology, Chemistry, and Cybersecurity task scores.
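The import step described above could be sketched roughly as follows. This is a minimal illustration, not the leaderboard's actual importer: the X-Risks JSON schema is not shown on this page, so the field names (`model`, `3c3h_score`, `biology`), the 0-to-1 raw score scale, the function name `to_leaderboard_rows`, and the sample values are all assumptions made up for the example.

```python
import json

# Hypothetical payload. The real X-Risks result JSON may use different
# field names and a different score scale; these values are invented.
RAW_JSON = """
[
  {"model": "example-model-a", "3c3h_score": 0.125, "biology": 0.25},
  {"model": "example-model-b", "3c3h_score": 0.5, "biology": 0.75}
]
"""


def to_leaderboard_rows(raw_json, primary_metric="3c3h_score"):
    """Parse result records, convert scores to percentages, and rank them."""
    records = json.loads(raw_json)
    rows = []
    for record in records:
        row = {"subject": record["model"]}
        for key, value in record.items():
            if key == "model":
                continue
            # Assumption: raw scores are fractions in [0, 1].
            row[key] = round(value * 100, 2)
        rows.append(row)

    # Highest primary-metric score first, then assign 1-based ranks.
    rows.sort(key=lambda r: r[primary_metric], reverse=True)
    for rank, row in enumerate(rows, start=1):
        row["rank"] = rank
    return rows


if __name__ == "__main__":
    for row in to_leaderboard_rows(RAW_JSON):
        print(row["rank"], row["subject"], row["3c3h_score"])
```

Every non-identifier field is converted in the same pass, so per-dimension and per-domain scores stay on the same percentage scale as the primary metric used for ranking.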

| Rank | Subject | 3C3H Score (%) | Model Match | Provenance | Sampled |
|------|---------|----------------|-------------|------------|---------|
| 1 | o1-2024-12-17 | 29.09 | o1 (openai-o1) | Imported | 2026-05-06 |
| 2 | o3-mini-2025-01-31 | 27.73 | o3-mini (openai-o3-mini) | Imported | 2026-05-06 |
| 3 | o1-mini-2024-09-12 | 20.78 | | Imported | 2026-05-06 |
| 4 | gpt-4o-2024-08-06 | 18.92 | GPT-4o (2024-08-06) (openai-gpt-4o-2024-08-06) | Imported | 2026-05-06 |
| 5 | meta-llama/Llama-3.3-70B-Instruct | 17.83 | Llama 3.3 70B Instruct (meta-llama-llama-3.3-70b-instruct) | Imported | 2026-05-06 |
| 6 | Qwen/Qwen2.5-72B-Instruct | 16.60 | Qwen2.5 72B Instruct (qwen-qwen-2.5-72b-instruct) | Imported | 2026-05-06 |
| 7 | gpt-4o-mini-2024-07-18 | 15.51 | GPT-4o-mini (2024-07-18) (openai-gpt-4o-mini-2024-07-18) | Imported | 2026-05-06 |
| 8 | claude-3-5-sonnet-20241022 | 14.45 | Claude 3.5 Sonnet (anthropic-claude-3.5-sonnet) | Imported | 2026-05-06 |
| 9 | claude-3-haiku-20240307 | 13.06 | Claude 3 Haiku (anthropic-claude-3-haiku) | Imported | 2026-05-06 |
| 10 | Qwen/QwQ-32B-Preview | 11.03 | | Imported | 2026-05-06 |