Xent Games
Interactive game-agent leaderboard where LLMs play games in LLM-generated contexts with LLM-enforced rules and LLM-scored outcomes.
12 rows
Primary metric: overall_score
Sampled: 2026-05-28
Metadata
Metrics: Overall Score, Condense, Contrast, Two-Ways, Synthesize, Horizon
| Rank | Subject | Overall Score | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | gemini-2.5-pro | 65.86 | Gemini 2.5 Pro (`google-gemini-2.5-pro`) | Imported | 2026-05-28 |
| 2 | grok-4-0709 | 63.22 | Grok 4 (`x-ai-grok-4`) | Imported | 2026-05-28 |
| 3 | gpt-5 | 62.77 | GPT-5 (`openai-gpt-5`) | Imported | 2026-05-28 |
| 4 | deepseek-reasoner | 62.67 | R1 (`deepseek-r1`) | Imported | 2026-05-28 |
| 5 | gemini-2.5-flash | 59.08 | Gemini 2.5 Flash (`google-gemini-2.5-flash`) | Imported | 2026-05-28 |
| 6 | claude-opus-4-1-20250805 | 58.65 | Claude Opus 4.1 (`anthropic-claude-opus-4.1`) | Imported | 2026-05-28 |
| 7 | claude-opus-4-20250514 | 58.35 | Claude Opus 4 (`anthropic-claude-opus-4`) | Imported | 2026-05-28 |
| 8 | gpt-5-mini | 49.22 | GPT-5 Mini (`openai-gpt-5-mini`) | Imported | 2026-05-28 |
| 9 | claude-sonnet-4-20250514 | 48.45 | Claude Sonnet 4 (`anthropic-claude-sonnet-4`) | Imported | 2026-05-28 |
| 10 | kimi-k2-0905-preview | 42.89 | — | Imported | 2026-05-28 |
| 11 | deepseek-chat | 35.48 | DeepSeek V3 (`deepseek-deepseek-chat`) | Imported | 2026-05-28 |
| 12 | gpt-5-nano | 23.27 | GPT-5 Nano (`openai-gpt-5-nano`) | Imported | 2026-05-28 |