BioMysteryBench Human-Solvable

Accuracy

Showing 2 latest source slices.

Rank	Subject	Accuracy	Model Match	Provenance	Sampled
1	Claude Mythos Preview	82.6%	Claude Mythos Preview anthropic-claude-mythos-preview	Self-reported	2026-05-28
2	Claude Opus 4.8	80.4%	Claude Opus 4.8 anthropic-claude-opus-4.8	Self-reported	2026-05-28
3	Claude Opus 4.7	78.9%	Claude Opus 4.7 anthropic-claude-opus-4.7	Self-reported	2026-05-28
4	Claude Sonnet 4.6	71.8%	Claude Sonnet 4.6 anthropic-claude-sonnet-4.6	Self-reported	2026-05-28

Rank	Subject	Accuracy	Model Match	Provenance	Sampled
1	Claude Mythos Preview	82.6%	Claude Mythos Preview anthropic-claude-mythos-preview	Self-reported	2026-04-29
2	Claude Opus 4.7	78.9%	Claude Opus 4.7 anthropic-claude-opus-4.7	Self-reported	2026-04-29
3	Claude Opus 4.6	77.4%	Claude Opus 4.6 anthropic-claude-opus-4.6	Self-reported	2026-04-29
4	Claude Sonnet 4.6	71.8%	Claude Sonnet 4.6 anthropic-claude-sonnet-4.6	Self-reported	2026-04-29
5	Claude Haiku 4.5	36.8%	Claude Haiku 4.5 anthropic-claude-haiku-4.5	Self-reported	2026-04-29