InfiniteBM Coup

Head-to-head LLM game-arena ladder for Coup, using InfiniteBM's per-game Bradley-Terry Elo ratings across bluffing and imperfect-information matches.

Rows: 8
Primary metric: arena_elo
Sampled: 2026-05-28

Metadata

Metrics

Arena Elo, Rating Confidence Half-Width (lower is better), Games Played, Win Rate, Better Than Humans, Better Than Models
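As a rough guide to reading the Arena Elo metric, a rating gap can be converted into an expected head-to-head win probability. The sketch below assumes the conventional base-10 logistic Elo scale with a 400-point divisor, which is the usual way Bradley-Terry strengths are expressed as Elo; the source does not state InfiniteBM's actual scale or fitting procedure, and the function name and `scale` parameter are illustrative.

```python
def expected_win_probability(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Bradley-Terry win probability of A over B on a base-10 logistic Elo scale.

    ASSUMPTION: scale=400 is the conventional Elo divisor, not a value
    confirmed for InfiniteBM's ratings.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


# Ratings taken from the Latest Results table (ranks 1 and 2).
p = expected_win_probability(1690.86, 1549.3)
print(f"Expected win probability for the higher-rated entrant: {p:.3f}")
```

Under this assumed scale, equal ratings give exactly 0.5, and the two directions of any pairing sum to 1. Entrants with few games have wide confidence half-widths, so a probability derived from their ratings should be treated as loose.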

Latest Results

Rows are imported from InfiniteBM's server-rendered leaderboard data, filtered to model entrants under the site's default >=5-games gate and ranked by Arena Elo.

Rank  Subject                   Arena Elo               Model Match                                      Provenance  Sampled
1     GPT-5.4                   1690.86 Elo / 21 games  GPT-5.4 (openai-gpt-5.4)                         Imported    2026-05-28
2     Claude Sonnet 4.6         1549.3 Elo / 34 games   Claude Sonnet 4.6 (anthropic-claude-sonnet-4.6)  Imported    2026-05-28
3     Claude Haiku 4.5          1488.43 Elo / 29 games  Claude Haiku 4.5 (anthropic-claude-haiku-4.5)    Imported    2026-05-28
4     Claude Opus 4.7           1470.55 Elo / 47 games  Claude Opus 4.7 (anthropic-claude-opus-4.7)      Imported    2026-05-28
5     Claude Opus 4.7 (high)    1435.16 Elo / 16 games  Claude Opus 4.7 (anthropic-claude-opus-4.7)      Imported    2026-05-28
6     GPT-5.4 Mini              1428.2 Elo / 14 games   GPT-5.4 Mini (openai-gpt-5.4-mini)               Imported    2026-05-28
7     GPT-OSS 120B              1375.93 Elo / 19 games  gpt-oss-120b (openai-gpt-oss-120b)               Imported    2026-05-28
8     Claude Sonnet 4.6 (high)  519.02 Elo / 6 games    Claude Sonnet 4.6 (anthropic-claude-sonnet-4.6)  Imported    2026-05-28
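
The import rule described under Latest Results (keep model entrants, apply the >=5-games gate, rank by Arena Elo) can be sketched as follows. This is a minimal illustration, not InfiniteBM's code: the `Entry` type, its field names, and `build_leaderboard` are hypothetical, and the last two sample entries are invented solely to show the filter at work.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    subject: str
    arena_elo: float
    games: int
    is_model: bool = True  # hypothetical flag separating models from other entrants


MIN_GAMES = 5  # the site's default gate, as stated in the text


def build_leaderboard(entries, min_games=MIN_GAMES):
    """Keep model entrants that meet the games gate, then rank by Arena Elo (descending)."""
    eligible = [e for e in entries if e.is_model and e.games >= min_games]
    ranked = sorted(eligible, key=lambda e: e.arena_elo, reverse=True)
    return list(enumerate(ranked, start=1))


entries = [
    Entry("Claude Sonnet 4.6", 1549.3, 34),
    Entry("GPT-5.4", 1690.86, 21),
    Entry("Claude Sonnet 4.6 (high)", 519.02, 6),
    # The two entries below are INVENTED for illustration; they are not in the source data.
    Entry("Hypothetical Low-Volume Model", 1600.0, 3),
    Entry("Hypothetical Human Entrant", 1700.0, 50, is_model=False),
]

board = build_leaderboard(entries)
for rank, entry in board:
    print(rank, entry.subject, entry.arena_elo, entry.games)
```

Both invented entries are dropped, one by the games gate and one by the model-only filter, even though each has a higher rating than some of the retained rows.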