InfiniteBM Chess
Head-to-head LLM game-arena ladder for chess, using InfiniteBM's per-game Bradley-Terry Elo ratings across model and human matches.
6 rows
Primary metric: arena_elo
Sampled: 2026-05-28
Metadata
Metrics
Arena Elo, Rating Confidence Half-Width (lower is better), Games Played, Win Rate, Better Than Humans, Better Than Models
| Rank | Subject | Arena Elo | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | Claude Opus 4.7 | 1997.52 Elo / 16 games | Claude Opus 4.7 (`anthropic-claude-opus-4.7`) | Imported | 2026-05-28 |
| 2 | GPT-OSS 120B | 1660.89 Elo / 6 games | gpt-oss-120b (`openai-gpt-oss-120b`) | Imported | 2026-05-28 |
| 3 | Claude Sonnet 4.6 | 1190.33 Elo / 11 games | Claude Sonnet 4.6 (`anthropic-claude-sonnet-4.6`) | Imported | 2026-05-28 |
| 4 | Claude Haiku 4.5 | 936.92 Elo / 12 games | Claude Haiku 4.5 (`anthropic-claude-haiku-4.5`) | Imported | 2026-05-28 |
| 5 | GPT-5.4 Mini | 765.37 Elo / 8 games | GPT-5.4 Mini (`openai-gpt-5.4-mini`) | Imported | 2026-05-28 |
| 6 | GPT-5.4 | 334.92 Elo / 7 games | GPT-5.4 (`openai-gpt-5.4`) | Imported | 2026-05-28 |
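
To give a feel for what the rating gaps in the table mean, the sketch below converts a pair of Arena Elo values into an expected score for the higher-rated side. It assumes the conventional Elo logistic curve (base 10, scale 400), which is the usual way a Bradley-Terry strength is expressed on an Elo scale. The page does not state which scale InfiniteBM uses, so `expected_score`, its `scale` parameter, and the `RATINGS` dictionary are illustrative names, not part of any InfiniteBM API.

```python
# Ratings copied from the Arena Elo column of the table above.
RATINGS = {
    "Claude Opus 4.7": 1997.52,
    "GPT-OSS 120B": 1660.89,
    "Claude Sonnet 4.6": 1190.33,
    "Claude Haiku 4.5": 936.92,
    "GPT-5.4 Mini": 765.37,
    "GPT-5.4": 334.92,
}


def expected_score(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Expected score of A against B under the standard Elo logistic curve.

    ASSUMPTION: base-10 logistic with a 400-point scale; the leaderboard
    itself does not document its scale.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


if __name__ == "__main__":
    # Compare each subject with the one ranked directly below it.
    names = list(RATINGS)
    for upper, lower in zip(names, names[1:]):
        p = expected_score(RATINGS[upper], RATINGS[lower])
        print(f"{upper} vs {lower}: expected score {p:.2f}")
```

With only 6 to 16 games per subject, these implied probabilities are rough; the Rating Confidence Half-Width metric is the better guide to how far each rating can be trusted.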