MageBench Season 1
Magic: The Gathering benchmark leaderboard for LLMs, reporting Season 1 model ratings, blunder index, games played, win rate, and average API cost across the combined format leaderboard.
35 rows
Primary metric: rating
Sampled: 2026-05-28
Metadata
Metrics
Rating, Blunder Index (lower is better), Games Played, Win Rate, Average API Cost (lower is better)
| Rank | Subject | Rating / Games | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | Claude Opus 4.6 (medium) | 1747 rating / 16 games | Claude Opus 4.6 anthropic-claude-opus-4.6 | Imported | 2026-05-28 |
| 2 | GPT-5.2 (medium) | 1737 rating / 11 games | GPT-5.2 openai-gpt-5.2 | Imported | 2026-05-28 |
| 3 | Gemini 3 Pro (medium) (retired) | 1722 rating / 11 games | Gemini 3 google-gemini-3 | Imported | 2026-05-28 |
| 4 | GPT-5.3 Codex (medium) | 1717 rating / 10 games | GPT-5.3-Codex openai-gpt-5.3-codex | Imported | 2026-05-28 |
| 5 | DeepSeek V3.2 | 1682 rating / 10 games | DeepSeek V3.2 deepseek-deepseek-v3.2 | Imported | 2026-05-28 |
| 6 | GLM 4.7 (medium) | 1675 rating / 10 games | GLM 4.7 z-ai-glm-4.7 | Imported | 2026-05-28 |
| 7 | GPT-5.4 (medium) | 1658 rating / 8 games | GPT-5.4 openai-gpt-5.4 | Imported | 2026-05-28 |
| 8 | Kimi K2.5 (medium) | 1652 rating / 10 games | MoonshotAI: Kimi K2.5 moonshotai-kimi-k2.5 | Imported | 2026-05-28 |
| 9 | Grok 4.1 Fast (medium) | 1637 rating / 11 games | Grok 4.1 Fast x-ai-grok-4.1-fast | Imported | 2026-05-28 |
| 10 | Claude Haiku 4.5 (low) | 1637 rating / 10 games | Claude Haiku 4.5 anthropic-claude-haiku-4.5 | Imported | 2026-05-28 |
| 11 | Gemini 3 Flash (medium) | 1622 rating / 10 games | Gemini 3 Flash Preview google-gemini-3-flash-preview | Imported | 2026-05-28 |
| 12 | Grok 4 Fast (medium) | 1620 rating / 10 games | Grok 4 Fast x-ai-grok-4-fast | Imported | 2026-05-28 |
| 13 | o3 (medium) | 1609 rating / 13 games | o3 openai-o3 | Imported | 2026-05-28 |
| 14 | MiniMax M2.5 (medium) | 1606 rating / 8 games | MiniMax M2.5 minimax-minimax-m2.5 | Imported | 2026-05-28 |
| 15 | Gemini 3.1 Pro (medium) | 1602 rating / 10 games | Gemini 3.1 Pro Preview google-gemini-3.1-pro-preview | Imported | 2026-05-28 |
| 16 | Qwen3 Coder 480B | 1601 rating / 11 games | — | Imported | 2026-05-28 |
| 17 | Qwen3 Max Thinking (low) | 1594 rating / 12 games | Qwen3 Max Thinking qwen-qwen3-max-thinking | Imported | 2026-05-28 |
| 18 | Qwen3 235B | 1594 rating / 11 games | Qwen3 235B A22B qwen-qwen3-235b-a22b | Imported | 2026-05-28 |
| 19 | Llama 4 Maverick | 1590 rating / 11 games | Llama 4 Maverick meta-llama-4-maverick | Imported | 2026-05-28 |
| 20 | Claude Sonnet 4.5 (medium) | 1589 rating / 10 games | Claude Sonnet 4.5 anthropic-claude-sonnet-4.5 | Imported | 2026-05-28 |
| 21 | MiMo V2 Flash (medium) | 1586 rating / 11 games | MiMo-V2-Flash xiaomi-mimo-v2-flash | Imported | 2026-05-28 |
| 22 | MiniMax M2.1 (medium) | 1584 rating / 9 games | MiniMax M2.1 minimax-minimax-m2.1 | Imported | 2026-05-28 |
| 23 | Gemini 3.1 Flash Lite | 1578 rating / 8 games | — | Imported | 2026-05-28 |
| 24 | Gemini 2.5 Flash (medium) (retired) | 1572 rating / 4 games | Gemini 2.5 Flash google-gemini-2.5-flash | Imported | 2026-05-28 |
| 25 | Mistral Medium 3.1 | 1569 rating / 9 games | Mistral: Mistral Medium 3.1 mistralai-mistral-medium-3.1 | Imported | 2026-05-28 |
| 26 | Kimi K2 0905 (medium) (retired) | 1558 rating / 5 games | MoonshotAI: Kimi K2 0905 moonshotai-kimi-k2-0905 | Imported | 2026-05-28 |
| 27 | GPT-5.2 (retired) | 1547 rating / 13 games | GPT-5.2 openai-gpt-5.2 | Imported | 2026-05-28 |
| 28 | GPT-4o-mini (retired) | 1546 rating / 4 games | GPT-4o-mini openai-gpt-4o-mini | Imported | 2026-05-28 |
| 29 | Gemini 2.5 Pro (medium) (retired) | 1540 rating / 9 games | Gemini 2.5 Pro google-gemini-2.5-pro | Imported | 2026-05-28 |
| 30 | GPT-5 (medium) | 1536 rating / 9 games | GPT-5 openai-gpt-5 | Imported | 2026-05-28 |
| 31 | GPT-OSS 120B (medium) | 1516 rating / 9 games | gpt-oss-120b openai-gpt-oss-120b | Imported | 2026-05-28 |
| 32 | GPT-5 Mini (medium) (retired) | 1516 rating / 8 games | GPT-5 Mini openai-gpt-5-mini | Imported | 2026-05-28 |
| 33 | Mistral Large | 1501 rating / 10 games | Mistral Large mistralai-mistral-large | Imported | 2026-05-28 |
| 34 | GPT-5 Nano (low) (retired) | 1499 rating / 14 games | GPT-5 Nano openai-gpt-5-nano | Imported | 2026-05-28 |
| 35 | Grok 4 (medium) | 1459 rating / 13 games | Grok 4 x-ai-grok-4 | Imported | 2026-05-28 |
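Many entries rest on a small number of games (as few as 4), so it can help to read the rating and game count out of each row before comparing models. The following is a minimal sketch, assuming the cell format shown in the table above ("N rating / M games"). The helper names `parse_row` and `filter_min_games` and the `min_games` threshold are hypothetical and not part of MageBench.

```python
import re

# Two rows copied verbatim from the table above.
ROWS = [
    "| 1 | Claude Opus 4.6 (medium) | 1747 rating / 16 games | Claude Opus 4.6 anthropic-claude-opus-4.6 | Imported | 2026-05-28 |",
    "| 24 | Gemini 2.5 Flash (medium) (retired) | 1572 rating / 4 games | Gemini 2.5 Flash google-gemini-2.5-flash | Imported | 2026-05-28 |",
]

# Assumed cell format, inferred from the table: "<rating> rating / <games> games".
RATING_CELL = re.compile(r"(\d+) rating / (\d+) games")


def parse_row(line):
    """Split one table row into rank, subject, rating, and games played."""
    cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
    match = RATING_CELL.fullmatch(cells[2])
    if match is None:
        raise ValueError(f"unexpected rating cell: {cells[2]!r}")
    return {
        "rank": int(cells[0]),
        "subject": cells[1],
        "rating": int(match.group(1)),
        "games": int(match.group(2)),
    }


def filter_min_games(rows, min_games):
    """Keep only parsed rows with at least min_games games (hypothetical cutoff)."""
    return [row for row in rows if row["games"] >= min_games]
```

For example, `filter_min_games([parse_row(r) for r in ROWS], 10)` keeps only the Claude Opus 4.6 row, since the Gemini 2.5 Flash entry has 4 games. The cutoff is purely illustrative; the leaderboard itself does not state a minimum.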