MageBench Season 1

Magic: The Gathering benchmark leaderboard for LLMs, reporting Season 1 model ratings, blunder index, games played, win rate, and average API cost across the combined format leaderboard.

Rows: 35
Primary metric: rating
Sampled: 2026-05-28

Metadata

Metrics

Rating, Blunder Index (lower is better), Games Played, Win Rate, Average API Cost (lower is better)

Latest Results

Rows are imported from the MageBench Season 1 combined leaderboard and ranked by rating.

| Rank | Subject | Rating | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | Claude Opus 4.6 (medium) | 1747 rating / 16 games | Claude Opus 4.6 (anthropic-claude-opus-4.6) | Imported | 2026-05-28 |
| 2 | GPT-5.2 (medium) | 1737 rating / 11 games | GPT-5.2 (openai-gpt-5.2) | Imported | 2026-05-28 |
| 3 | Gemini 3 Pro (medium) (retired) | 1722 rating / 11 games | Gemini 3 (google-gemini-3) | Imported | 2026-05-28 |
| 4 | GPT-5.3 Codex (medium) | 1717 rating / 10 games | GPT-5.3-Codex (openai-gpt-5.3-codex) | Imported | 2026-05-28 |
| 5 | DeepSeek V3.2 | 1682 rating / 10 games | DeepSeek V3.2 (deepseek-deepseek-v3.2) | Imported | 2026-05-28 |
| 6 | GLM 4.7 (medium) | 1675 rating / 10 games | GLM 4.7 (z-ai-glm-4.7) | Imported | 2026-05-28 |
| 7 | GPT-5.4 (medium) | 1658 rating / 8 games | GPT-5.4 (openai-gpt-5.4) | Imported | 2026-05-28 |
| 8 | Kimi K2.5 (medium) | 1652 rating / 10 games | MoonshotAI: Kimi K2.5 (moonshotai-kimi-k2.5) | Imported | 2026-05-28 |
| 9 | Grok 4.1 Fast (medium) | 1637 rating / 11 games | Grok 4.1 Fast (x-ai-grok-4.1-fast) | Imported | 2026-05-28 |
| 10 | Claude Haiku 4.5 (low) | 1637 rating / 10 games | Claude Haiku 4.5 (anthropic-claude-haiku-4.5) | Imported | 2026-05-28 |
| 11 | Gemini 3 Flash (medium) | 1622 rating / 10 games | Gemini 3 Flash Preview (google-gemini-3-flash-preview) | Imported | 2026-05-28 |
| 12 | Grok 4 Fast (medium) | 1620 rating / 10 games | Grok 4 Fast (x-ai-grok-4-fast) | Imported | 2026-05-28 |
| 13 | o3 (medium) | 1609 rating / 13 games | o3 (openai-o3) | Imported | 2026-05-28 |
| 14 | MiniMax M2.5 (medium) | 1606 rating / 8 games | MiniMax M2.5 (minimax-minimax-m2.5) | Imported | 2026-05-28 |
| 15 | Gemini 3.1 Pro (medium) | 1602 rating / 10 games | Gemini 3.1 Pro Preview (google-gemini-3.1-pro-preview) | Imported | 2026-05-28 |
| 16 | Qwen3 Coder 480B | 1601 rating / 11 games | | Imported | 2026-05-28 |
| 17 | Qwen3 Max Thinking (low) | 1594 rating / 12 games | Qwen3 Max Thinking (qwen-qwen3-max-thinking) | Imported | 2026-05-28 |
| 18 | Qwen3 235B | 1594 rating / 11 games | Qwen3 235B A22B (qwen-qwen3-235b-a22b) | Imported | 2026-05-28 |
| 19 | Llama 4 Maverick | 1590 rating / 11 games | Llama 4 Maverick (meta-llama-4-maverick) | Imported | 2026-05-28 |
| 20 | Claude Sonnet 4.5 (medium) | 1589 rating / 10 games | Claude Sonnet 4.5 (anthropic-claude-sonnet-4.5) | Imported | 2026-05-28 |
| 21 | MiMo V2 Flash (medium) | 1586 rating / 11 games | MiMo-V2-Flash (xiaomi-mimo-v2-flash) | Imported | 2026-05-28 |
| 22 | MiniMax M2.1 (medium) | 1584 rating / 9 games | MiniMax M2.1 (minimax-minimax-m2.1) | Imported | 2026-05-28 |
| 23 | Gemini 3.1 Flash Lite | 1578 rating / 8 games | | Imported | 2026-05-28 |
| 24 | Gemini 2.5 Flash (medium) (retired) | 1572 rating / 4 games | Gemini 2.5 Flash (google-gemini-2.5-flash) | Imported | 2026-05-28 |
| 25 | Mistral Medium 3.1 | 1569 rating / 9 games | Mistral: Mistral Medium 3.1 (mistralai-mistral-medium-3.1) | Imported | 2026-05-28 |
| 26 | Kimi K2 0905 (medium) (retired) | 1558 rating / 5 games | MoonshotAI: Kimi K2 0905 (moonshotai-kimi-k2-0905) | Imported | 2026-05-28 |
| 27 | GPT-5.2 (retired) | 1547 rating / 13 games | GPT-5.2 (openai-gpt-5.2) | Imported | 2026-05-28 |
| 28 | GPT-4o-mini (retired) | 1546 rating / 4 games | GPT-4o-mini (openai-gpt-4o-mini) | Imported | 2026-05-28 |
| 29 | Gemini 2.5 Pro (medium) (retired) | 1540 rating / 9 games | Gemini 2.5 Pro (google-gemini-2.5-pro) | Imported | 2026-05-28 |
| 30 | GPT-5 (medium) | 1536 rating / 9 games | GPT-5 (openai-gpt-5) | Imported | 2026-05-28 |
| 31 | GPT-OSS 120B (medium) | 1516 rating / 9 games | gpt-oss-120b (openai-gpt-oss-120b) | Imported | 2026-05-28 |
| 32 | GPT-5 Mini (medium) (retired) | 1516 rating / 8 games | GPT-5 Mini (openai-gpt-5-mini) | Imported | 2026-05-28 |
| 33 | Mistral Large | 1501 rating / 10 games | Mistral Large (mistralai-mistral-large) | Imported | 2026-05-28 |
| 34 | GPT-5 Nano (low) (retired) | 1499 rating / 14 games | GPT-5 Nano (openai-gpt-5-nano) | Imported | 2026-05-28 |
| 35 | Grok 4 (medium) | 1459 rating / 13 games | Grok 4 (x-ai-grok-4) | Imported | 2026-05-28 |
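
The ranking rule stated above (rows ordered by rating) can be illustrated with a minimal Python sketch. The `Row` class and `rank_by_rating` function are hypothetical names introduced here and are not part of MageBench. The four rows are copied from the table. The page does not state how tied ratings are ordered, so keeping input order for ties is an assumption of this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    """One leaderboard row (hypothetical structure, for illustration only)."""
    subject: str
    rating: int
    games: int


# Four rows copied from the table, deliberately out of rating order.
rows = [
    Row("Grok 4.1 Fast (medium)", 1637, 11),
    Row("GPT-5.2 (medium)", 1737, 11),
    Row("Claude Haiku 4.5 (low)", 1637, 10),
    Row("Claude Opus 4.6 (medium)", 1747, 16),
]


def rank_by_rating(rows):
    """Return (rank, row) pairs ordered by rating, highest first.

    sorted() is stable, so rows with equal ratings keep their input order.
    That tie-breaking behavior is an assumption, not a documented rule.
    """
    ordered = sorted(rows, key=lambda r: r.rating, reverse=True)
    return list(enumerate(ordered, start=1))


for rank, row in rank_by_rating(rows):
    print(rank, row.subject, f"{row.rating} rating / {row.games} games")
```

Because the two rows rated 1637 are tied, their relative order in the output depends only on the order in which they were supplied.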