Xent Games

Interactive game-agent leaderboard where LLMs play games in LLM-generated contexts with LLM-enforced rules and LLM-scored outcomes.

Rows: 12
Primary metric: overall_score
Sampled: 2026-05-28

Metadata

Metrics

Overall Score, Condense, Contrast, Two-Ways, Synthesize, Horizon

Latest Results

Rows are imported from the official Xent Labs leaderboard page data. The site describes the setting as LLM agents in LLM-generated contexts playing games with LLM-enforced rules and LLM-scored results.

Rank  Subject                   Overall Score  Model Match                                  Provenance  Sampled
1     gemini-2.5-pro            65.86          Gemini 2.5 Pro (google-gemini-2.5-pro)       Imported    2026-05-28
2     grok-4-0709               63.22          Grok 4 (x-ai-grok-4)                         Imported    2026-05-28
3     gpt-5                     62.77          GPT-5 (openai-gpt-5)                         Imported    2026-05-28
4     deepseek-reasoner         62.67          R1 (deepseek-r1)                             Imported    2026-05-28
5     gemini-2.5-flash          59.08          Gemini 2.5 Flash (google-gemini-2.5-flash)   Imported    2026-05-28
6     claude-opus-4-1-20250805  58.65          Claude Opus 4.1 (anthropic-claude-opus-4.1)  Imported    2026-05-28
7     claude-opus-4-20250514    58.35          Claude Opus 4 (anthropic-claude-opus-4)      Imported    2026-05-28
8     gpt-5-mini                49.22          GPT-5 Mini (openai-gpt-5-mini)               Imported    2026-05-28
9     claude-sonnet-4-20250514  48.45          Claude Sonnet 4 (anthropic-claude-sonnet-4)  Imported    2026-05-28
10    kimi-k2-0905-preview      42.89          -                                            Imported    2026-05-28
11    deepseek-chat             35.48          DeepSeek V3 (deepseek-deepseek-chat)         Imported    2026-05-28
12    gpt-5-nano                23.27          GPT-5 Nano (openai-gpt-5-nano)               Imported    2026-05-28
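
For readers who want to work with these rows programmatically, the following is a minimal sketch, not part of any Xent Labs tooling. It assumes the rows are held as simple records and ranked by the primary metric, overall_score. The names Row, ranked, and gap_to_leader are hypothetical and chosen only for illustration. The scores are copied from the table above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    """One leaderboard row (hypothetical record type, for illustration only)."""
    subject: str
    overall_score: float


# Subjects and overall scores copied from the Latest Results table.
ROWS = [
    Row("gemini-2.5-pro", 65.86),
    Row("grok-4-0709", 63.22),
    Row("gpt-5", 62.77),
    Row("deepseek-reasoner", 62.67),
    Row("gemini-2.5-flash", 59.08),
    Row("claude-opus-4-1-20250805", 58.65),
    Row("claude-opus-4-20250514", 58.35),
    Row("gpt-5-mini", 49.22),
    Row("claude-sonnet-4-20250514", 48.45),
    Row("kimi-k2-0905-preview", 42.89),
    Row("deepseek-chat", 35.48),
    Row("gpt-5-nano", 23.27),
]


def ranked(rows):
    """Return rows sorted by overall_score, highest first."""
    return sorted(rows, key=lambda r: r.overall_score, reverse=True)


def gap_to_leader(rows):
    """Map each subject to its score deficit relative to the top row."""
    ordered = ranked(rows)
    top = ordered[0].overall_score
    return {r.subject: round(top - r.overall_score, 2) for r in ordered}


if __name__ == "__main__":
    for rank, row in enumerate(ranked(ROWS), start=1):
        print(rank, row.subject, row.overall_score)
```

Sorting by the primary metric reproduces the rank column as listed, and gap_to_leader makes it easy to see how tightly ranks 2 through 4 cluster relative to the rest of the table.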