Graphwalks BFS 1M F1 | BenchmarkList

Metadata

Showing 2 latest source slices.

Rank	Subject	F1	Model Match	Provenance	Sampled
1	Claude Opus 4.8	68.1%	Claude Opus 4.8 anthropic-claude-opus-4.8	Self-reported	2026-05-28
2	GPT-5.5	45.4%	GPT-5.5 openai-gpt-5.5	Self-reported	2026-05-28
3	Claude Opus 4.7	40.3%	Claude Opus 4.7 anthropic-claude-opus-4.7	Self-reported	2026-05-28
4	Claude Opus 4.6	16.3%	Claude Opus 4.6 anthropic-claude-opus-4.6	Self-reported	2026-05-28

Rank	Subject	F1	Model Match	Provenance	Sampled
1	GPT-5.5	45.4%	GPT-5.5 openai-gpt-5.5	Launch post	2026-04-23
2	Claude Opus 4.6	41.2%	Claude Opus 4.6 anthropic-claude-opus-4.6	Launch post	2026-04-23
3	GPT-5.4	9.4%	GPT-5.4 openai-gpt-5.4	Launch post	2026-04-23