Poker Agent

Which model can make the most money playing poker?

17 rows
Primary metric: score
Sampled: 2025-12-23

Metadata

Metrics

Score, Std. error (lower is better), Latency (lower is better), Cost per test (lower is better)

Latest Results

The full leaderboard rows below were decoded from the Vals.ai benchmark detail page. The primary score is the overall accuracy percentage.

| Rank | Subject | Score | Model Match | Provenance | Sampled |
|------|---------|-------|-------------|------------|---------|
| 1 | GPT 5.2 2025-12-11 | 1131.833% | GPT-5.2 (openai-gpt-5.2) | Imported | 2025-12-23 |
| 2 | GPT 5 2025-08-07 | 1103.175% | GPT-5 (openai-gpt-5) | Imported | 2025-12-23 |
| 3 | Gemini 3 Flash Preview | 1100.213% | Gemini 3 Flash Preview (google-gemini-3-flash-preview) | Imported | 2025-12-23 |
| 4 | DeepSeek V3P2 Thinking | 1090.304% | - | Imported | 2025-12-23 |
| 5 | Grok 4.1 Fast Reasoning | 1079.215% | Grok 4.1 Fast (x-ai-grok-4.1-fast) | Imported | 2025-12-23 |
| 6 | Gemini 3 Pro Preview | 1078.905% | Gemini 3 (google-gemini-3) | Imported | 2025-12-23 |
| 7 | DeepSeek V3P1 | 1058.233% | - | Imported | 2025-12-23 |
| 8 | Claude Sonnet 4.5 20250929 | 1055.504% | Claude Sonnet 4.5 (anthropic-claude-sonnet-4.5) | Imported | 2025-12-23 |
| 9 | GPT 5.1 2025-11-13 | 1038.593% | GPT-5.1 (openai-gpt-5.1) | Imported | 2025-12-23 |
| 10 | Grok 4 Fast Reasoning | 1034.3% | Grok 4 Fast (x-ai-grok-4-fast) | Imported | 2025-12-23 |
| 11 | Claude Opus 4.5 20251101 | 1033.379% | Claude Opus 4.5 (anthropic-claude-opus-4.5) | Imported | 2025-12-23 |
| 12 | Gemini 2.5 Pro | 1032.596% | Gemini 2.5 Pro (google-gemini-2.5-pro) | Imported | 2025-12-23 |
| 13 | GPT Oss 120B | 1015.331% | gpt-oss-120b (openai-gpt-oss-120b) | Imported | 2025-12-23 |
| 14 | Kimi K2 Thinking | 1011.634% | MoonshotAI: Kimi K2 Thinking (moonshotai-kimi-k2-thinking) | Imported | 2025-12-23 |
| 15 | Qwen 3 Max Preview | 994.512% | - | Imported | 2025-12-23 |
| 16 | GLM 4.6 | 945.756% | GLM 4.6 (z-ai-glm-4.6) | Imported | 2025-12-23 |
| 17 | Llama4 Maverick Instruct Basic | 890.504% | - | Imported | 2025-12-23 |
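
The ranking above appears to follow the primary score in descending order. A minimal Python sketch for checking that is shown here, assuming the scores are read as plain floats with the percent sign dropped. The names `ROWS`, `rank_rows`, and `score_gap` are hypothetical helpers for illustration and are not part of any Vals.ai tooling.

```python
# Subject and score pairs copied from the leaderboard rows.
# Assumption: the trailing "%" is dropped and each score is a plain float.
ROWS = [
    ("GPT 5.2 2025-12-11", 1131.833),
    ("GPT 5 2025-08-07", 1103.175),
    ("Gemini 3 Flash Preview", 1100.213),
    ("DeepSeek V3P2 Thinking", 1090.304),
    ("Grok 4.1 Fast Reasoning", 1079.215),
    ("Gemini 3 Pro Preview", 1078.905),
    ("DeepSeek V3P1", 1058.233),
    ("Claude Sonnet 4.5 20250929", 1055.504),
    ("GPT 5.1 2025-11-13", 1038.593),
    ("Grok 4 Fast Reasoning", 1034.3),
    ("Claude Opus 4.5 20251101", 1033.379),
    ("Gemini 2.5 Pro", 1032.596),
    ("GPT Oss 120B", 1015.331),
    ("Kimi K2 Thinking", 1011.634),
    ("Qwen 3 Max Preview", 994.512),
    ("GLM 4.6", 945.756),
    ("Llama4 Maverick Instruct Basic", 890.504),
]


def rank_rows(rows):
    """Return (rank, subject, score) tuples, highest score first."""
    ordered = sorted(rows, key=lambda row: row[1], reverse=True)
    return [
        (rank, subject, score)
        for rank, (subject, score) in enumerate(ordered, start=1)
    ]


def score_gap(rows):
    """Return the difference between the best and worst score, rounded to 3 places."""
    scores = [score for _, score in rows]
    return round(max(scores) - min(scores), 3)


if __name__ == "__main__":
    for rank, subject, score in rank_rows(ROWS):
        print(f"{rank:>2}  {subject:<32} {score:9.3f}")
    print("Gap between first and last:", score_gap(ROWS))
```

Sorting by score alone ignores the standard error listed under Metrics, so closely spaced entries (ranks 5 and 6, for example) may not be meaningfully different.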