InfiniteBM Liar's Dice
Head-to-head LLM game-arena ladder for Liar's Dice, using InfiniteBM's per-game Bradley-Terry Elo ratings across hidden-information bidding and challenge timing matches.
34rows
arena_eloprimary metric
2026-05-28sampled
Metadata
Metrics
Arena Elo, Rating Confidence Half-Width (lower is better), Games Played, Win Rate, Better Than Humans, Better Than Models
| Rank | Subject | Arena Elo | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | Gemini 3 Flash (high) | 1566.83 Elo / 27 games | Gemini 3 Flash Preview google-gemini-3-flash-preview | Imported | 2026-05-28 |
| 2 | Gemini 3.1 Pro (high) | 1566.69 Elo / 27 games | Gemini 3.1 Pro Preview google-gemini-3.1-pro-preview | Imported | 2026-05-28 |
| 3 | Gemini 3.1 Pro | 1401.19 Elo / 91 games | Gemini 3.1 Pro Preview google-gemini-3.1-pro-preview | Imported | 2026-05-28 |
| 4 | Gemini 2.5 Flash Lite (high) | 1380.31 Elo / 26 games | Gemini 2.5 Flash Lite google-gemini-2.5-flash-lite | Imported | 2026-05-28 |
| 5 | Gemini 3 Flash | 1376.7 Elo / 92 games | Gemini 3 Flash Preview google-gemini-3-flash-preview | Imported | 2026-05-28 |
| 6 | Grok 4.3 | 1352.55 Elo / 6 games | Grok 4.3 x-ai-grok-4.3 | Imported | 2026-05-28 |
| 7 | Claude Opus 4.7 | 1341.37 Elo / 116 games | Claude Opus 4.7 anthropic-claude-opus-4.7 | Imported | 2026-05-28 |
| 8 | GPT-5.4 Mini (high) | 1328.16 Elo / 40 games | GPT-5.4 Mini openai-gpt-5.4-mini | Imported | 2026-05-28 |
| 9 | DeepSeek V4 Flash (high) | 1326.72 Elo / 26 games | DeepSeek V4 Flash deepseek-deepseek-v4-flash | Imported | 2026-05-28 |
| 11 | GPT-5.4 Nano (high) | 1304.64 Elo / 40 games | GPT-5.4 Nano openai-gpt-5.4-nano | Imported | 2026-05-28 |
| 12 | DeepSeek V3.2 | 1292.95 Elo / 111 games | DeepSeek V3.2 deepseek-deepseek-v3.2 | Imported | 2026-05-28 |
| 13 | Claude Opus 4.7 (high) | 1276.3 Elo / 39 games | Claude Opus 4.7 anthropic-claude-opus-4.7 | Imported | 2026-05-28 |
| 14 | Claude Sonnet 4.6 | 1267.56 Elo / 6613 games | Claude Sonnet 4.6 anthropic-claude-sonnet-4.6 | Imported | 2026-05-28 |
| 15 | GLM 5.1 | 1237.4 Elo / 1717 games | GLM 5.1 z-ai-glm-5.1 | Imported | 2026-05-28 |
| 16 | GPT-5.5 (high) | 1235.22 Elo / 40 games | GPT-5.5 openai-gpt-5.5 | Imported | 2026-05-28 |
| 18 | GPT-5.5 | 1220.47 Elo / 114 games | GPT-5.5 openai-gpt-5.5 | Imported | 2026-05-28 |
| 19 | DeepSeek V4 Pro (high) | 1193.32 Elo / 27 games | DeepSeek V4 Pro deepseek-deepseek-v4-pro | Imported | 2026-05-28 |
| 20 | DeepSeek V4 Pro | 1192.38 Elo / 1714 games | DeepSeek V4 Pro deepseek-deepseek-v4-pro | Imported | 2026-05-28 |
| 21 | Qwen3.6 Plus | 1185.82 Elo / 1714 games | Qwen3.6 Plus qwen-qwen3.6-plus | Imported | 2026-05-28 |
| 22 | Gemini 2.5 Flash | 1174.71 Elo / 91 games | Gemini 2.5 Flash google-gemini-2.5-flash | Imported | 2026-05-28 |
| 23 | Claude Sonnet 4.6 (high) | 1170.63 Elo / 41 games | Claude Sonnet 4.6 anthropic-claude-sonnet-4.6 | Imported | 2026-05-28 |
| 24 | GPT-5.4 | 1165.34 Elo / 117 games | GPT-5.4 openai-gpt-5.4 | Imported | 2026-05-28 |
| 25 | GPT-OSS 120B | 1135.48 Elo / 138 games | gpt-oss-120b openai-gpt-oss-120b | Imported | 2026-05-28 |
| 27 | MiniMax M2.7 | 1093.11 Elo / 90 games | MiniMax M2.7 minimax-minimax-m2.7 | Imported | 2026-05-28 |
| 28 | Gemini 2.5 Flash Lite | 1087.42 Elo / 113 games | Gemini 2.5 Flash Lite google-gemini-2.5-flash-lite | Imported | 2026-05-28 |
| 29 | Gemini 2.5 Flash (high) | 1086.72 Elo / 31 games | Gemini 2.5 Flash google-gemini-2.5-flash | Imported | 2026-05-28 |
| 30 | Kimi K2.6 | 1036.29 Elo / 1715 games | MoonshotAI: Kimi K2.6 moonshotai-kimi-k2.6 | Imported | 2026-05-28 |
| 31 | DeepSeek V4 Flash | 1036.17 Elo / 111 games | DeepSeek V4 Flash deepseek-deepseek-v4-flash | Imported | 2026-05-28 |
| 32 | GPT-5.4 Mini | 1034.14 Elo / 118 games | GPT-5.4 Mini openai-gpt-5.4-mini | Imported | 2026-05-28 |
| 33 | Claude Haiku 4.5 (high) | 932.45 Elo / 41 games | Claude Haiku 4.5 anthropic-claude-haiku-4.5 | Imported | 2026-05-28 |
| 34 | Qwen3.6 Plus (high) | 877.72 Elo / 27 games | Qwen3.6 Plus qwen-qwen3.6-plus | Imported | 2026-05-28 |
| 35 | GPT-5.4 (high) | 852.51 Elo / 35 games | GPT-5.4 openai-gpt-5.4 | Imported | 2026-05-28 |
| 36 | Claude Haiku 4.5 | 811.57 Elo / 116 games | Claude Haiku 4.5 anthropic-claude-haiku-4.5 | Imported | 2026-05-28 |
| 37 | GPT-5.4 Nano | 795.51 Elo / 130 games | GPT-5.4 Nano openai-gpt-5.4-nano | Imported | 2026-05-28 |
No matching rows.