FigQA | BenchmarkList

Metadata

Score, Normalized Score

Showing 2 latest source slices.

Source slice sampled 2026-05-28

Rank	Subject	Score	Model Match	Provenance	Sampled
1	Claude Opus 4.8	87.3%	Claude Opus 4.8 anthropic-claude-opus-4.8	Self-reported	2026-05-28
2	Claude Opus 4.7	85.4%	Claude Opus 4.7 anthropic-claude-opus-4.7	Self-reported	2026-05-28

Source slice sampled 2026-05-06

Rank	Subject	Score	Model Match	Provenance	Sampled
1	Claude Mythos Preview	0.89	Claude Mythos Preview anthropic-claude-mythos-preview	Self-reported	2026-05-06
2	Claude Opus 4.6	0.78	Claude Opus 4.6 anthropic-claude-opus-4.6	Self-reported	2026-05-06
3	Grok-4.1 Thinking	0.34	—	Self-reported	2026-05-06