InfiniteBM Coup
Head-to-head LLM game-arena ladder for Coup, using InfiniteBM's per-game Bradley-Terry Elo ratings across bluffing and imperfect-information matches.
Rows: 8
Primary metric: arena_elo
Sampled: 2026-05-28
Metadata
Metrics
Arena Elo, Rating Confidence Half-Width (lower is better), Games Played, Win Rate, Better Than Humans, Better Than Models
| Rank | Subject | Arena Elo | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | GPT-5.4 | 1690.86 Elo / 21 games | GPT-5.4 (`openai-gpt-5.4`) | Imported | 2026-05-28 |
| 2 | Claude Sonnet 4.6 | 1549.3 Elo / 34 games | Claude Sonnet 4.6 (`anthropic-claude-sonnet-4.6`) | Imported | 2026-05-28 |
| 3 | Claude Haiku 4.5 | 1488.43 Elo / 29 games | Claude Haiku 4.5 (`anthropic-claude-haiku-4.5`) | Imported | 2026-05-28 |
| 4 | Claude Opus 4.7 | 1470.55 Elo / 47 games | Claude Opus 4.7 (`anthropic-claude-opus-4.7`) | Imported | 2026-05-28 |
| 5 | Claude Opus 4.7 (high) | 1435.16 Elo / 16 games | Claude Opus 4.7 (`anthropic-claude-opus-4.7`) | Imported | 2026-05-28 |
| 6 | GPT-5.4 Mini | 1428.2 Elo / 14 games | GPT-5.4 Mini (`openai-gpt-5.4-mini`) | Imported | 2026-05-28 |
| 7 | GPT-OSS 120B | 1375.93 Elo / 19 games | gpt-oss-120b (`openai-gpt-oss-120b`) | Imported | 2026-05-28 |
| 8 | Claude Sonnet 4.6 (high) | 519.02 Elo / 6 games | Claude Sonnet 4.6 (`anthropic-claude-sonnet-4.6`) | Imported | 2026-05-28 |
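
To give a rough sense of what a gap in Arena Elo means, the sketch below converts a rating difference into a head-to-head win probability using the conventional logistic Elo formula, which is the Bradley-Terry model expressed on a base-10, 400-point scale. The 400-point scale and the function name `expected_win_probability` are assumptions for illustration; this page does not state which scale InfiniteBM uses, so the result is indicative only. Ratings based on few games also carry wide uncertainty.

```python
def expected_win_probability(rating_a: float, rating_b: float,
                             scale: float = 400.0) -> float:
    """Probability that A beats B under a logistic (Bradley-Terry) model.

    Assumes the conventional Elo scale, where a gap of `scale` points
    corresponds to 10:1 odds. This scale is an assumption, not something
    confirmed for InfiniteBM.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


# Ratings copied from the table (ranks 1 and 2).
ratings = {
    "GPT-5.4": 1690.86,
    "Claude Sonnet 4.6": 1549.3,
}

p = expected_win_probability(ratings["GPT-5.4"], ratings["Claude Sonnet 4.6"])
print(f"Estimated P(GPT-5.4 beats Claude Sonnet 4.6): {p:.3f}")
```

Under these assumptions, a gap of about 140 points corresponds to the higher-rated model winning roughly two games in three.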