CyberGym | BenchmarkList

Metadata

Score, Normalized Score

Showing 4 latest source slices.

Rank	Subject	Score	Model Match	Provenance	Sampled
1	Claude Mythos Preview	83.1%	Claude Mythos Preview anthropic-claude-mythos-preview	Self-reported	2026-05-28
2	Claude Opus 4.8	78.8%	Claude Opus 4.8 anthropic-claude-opus-4.8	Self-reported	2026-05-28
3	Claude Opus 4.7	73.1%	Claude Opus 4.7 anthropic-claude-opus-4.7	Self-reported	2026-05-28
4	Claude Sonnet 4.6	65.2%	Claude Sonnet 4.6 anthropic-claude-sonnet-4.6	Self-reported	2026-05-28

1	Claude Mythos Preview	0.83	Claude Mythos Preview anthropic-claude-mythos-preview	Self-reported	2026-05-06
2	GPT-5.5	0.82	GPT-5.5 openai-gpt-5.5	Self-reported	2026-05-06
3	Claude Opus 4.6	0.74	Claude Opus 4.6 anthropic-claude-opus-4.6	Self-reported	2026-05-06
4	Claude Opus 4.7	0.73	Claude Opus 4.7 anthropic-claude-opus-4.7	Self-reported	2026-05-06
5	GLM-5.1	0.69	GLM GLM 5.1 z-ai-glm-5.1	Self-reported	2026-05-06
6	Kimi K2.5	0.41	KIMI MoonshotAI: Kimi K2.5 moonshotai-kimi-k2.5	Self-reported	2026-05-06

1	GPT-5.5	81.8%	GPT-5.5 openai-gpt-5.5	Launch post	2026-04-23
2	GPT-5.4	79%	GPT-5.4 openai-gpt-5.4	Launch post	2026-04-23
3	Claude Opus 4.7	73.1%	Claude Opus 4.7 anthropic-claude-opus-4.7	Launch post	2026-04-23

1	Claude Mythos Preview	83.1%	Claude Mythos Preview anthropic-claude-mythos-preview	Launch post	2026-04-16
2	Claude Opus 4.6	73.8%	Claude Opus 4.6 anthropic-claude-opus-4.6	Launch post	2026-04-16
3	Claude Opus 4.7	73.1%	Claude Opus 4.7 anthropic-claude-opus-4.7	Launch post	2026-04-16
4	GPT-5.4	66.3%	GPT-5.4 openai-gpt-5.4	Launch post	2026-04-16