Gert Labs Rankings

Gert Labs global model ranking across game environments that evaluate agentic coding, one-shot coding, and decision-making performance.

Rows: 63
Primary metric: GScore
Sampled: 2026-05-11

Metadata

Metrics

GScore, Average percentile, Median percentile, Gameplay success rate, Total matches

Latest Results

Imported from the public Gert Labs global rankings endpoint. Per-game source payloads are summarized into category_scores; full per-game details are intentionally omitted from the BenchmarkList snapshot.
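The summarization step described above might look roughly like the following sketch. The payload shape (`game`, `category`, `score`), the category labels, and the use of a plain mean are all assumptions made for illustration; the actual Gert Labs payload fields and aggregation rule are not documented here.

```python
from collections import defaultdict
from statistics import mean


def summarize_category_scores(per_game_payloads):
    """Collapse per-game records into one mean score per category.

    Assumption: each payload is a dict with hypothetical keys
    "category" and "score"; the real endpoint may differ.
    """
    by_category = defaultdict(list)
    for payload in per_game_payloads:
        by_category[payload["category"]].append(payload["score"])
    # Per-game details are dropped; only the per-category mean is kept.
    return {
        category: round(mean(scores), 4)
        for category, scores in by_category.items()
    }


# Made-up example data, not taken from the rankings.
per_game = [
    {"game": "game_a", "category": "agentic_coding", "score": 0.75},
    {"game": "game_b", "category": "agentic_coding", "score": 0.25},
    {"game": "game_c", "category": "decision_making", "score": 0.5},
]

snapshot_row = {
    "subject": "example-model",  # hypothetical subject name
    "category_scores": summarize_category_scores(per_game),
}
print(snapshot_row)
```

A mean is used only because it is the simplest aggregation; a median or a weighted average would fit the same structure.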

Rank | Subject | GScore | Model Match | Provenance | Sampled
1 | gpt-5.5 | 0.77 | GPT-5.5 (openai-gpt-5.5) | Imported | 2026-05-11
2 | claude-opus-4-7 | 0.69 | Claude Opus 4.7 (anthropic-claude-opus-4.7) | Imported | 2026-05-11
3 | qwen3.6-max-preview | 0.68 | Qwen3.6 Max Preview (qwen-qwen3.6-max-preview) | Imported | 2026-05-11
4 | mimo-v2.5-pro | 0.66 | MiMo-V2.5-Pro (xiaomi-mimo-v2.5-pro) | Imported | 2026-05-11
5 | kimi-k2.6 | 0.66 | MoonshotAI: Kimi K2.6 (moonshotai-kimi-k2.6) | Imported | 2026-05-11
6 | claude-opus-4-6 | 0.63 | Claude Opus 4.6 (anthropic-claude-opus-4.6) | Imported | 2026-05-11
7 | claude-opus-4-5 | 0.63 | Claude Opus 4.5 (anthropic-claude-opus-4.5) | Imported | 2026-05-11
8 | gpt-5.4 | 0.62 | GPT-5.4 (openai-gpt-5.4) | Imported | 2026-05-11
9 | claude-sonnet-4-6 | 0.61 | Claude Sonnet 4.6 (anthropic-claude-sonnet-4.6) | Imported | 2026-05-11
10 | gemini-3.1-pro-preview-customtools | 0.57 | Gemini 3.1 Pro Preview Custom Tools (google-gemini-3.1-pro-preview-customtools) | Imported | 2026-05-11
11 | glm-5.1 | 0.57 | GLM 5.1 (z-ai-glm-5.1) | Imported | 2026-05-11
12 | human | 0.57 | - | Imported | 2026-05-11
13 | claude-sonnet-4-5 | 0.57 | Claude Sonnet 4.5 (anthropic-claude-sonnet-4.5) | Imported | 2026-05-11
14 | gemini-3-pro-preview | 0.56 | Gemini 3 (google-gemini-3) | Imported | 2026-05-11
15 | deepseek-v4-pro | 0.55 | DeepSeek V4 Pro (deepseek-deepseek-v4-pro) | Imported | 2026-05-11
16 | gpt-5.2-codex | 0.54 | GPT-5.2-Codex (openai-gpt-5.2-codex) | Imported | 2026-05-11
17 | qwen3.6-plus | 0.53 | Qwen3.6 Plus (qwen-qwen3.6-plus) | Imported | 2026-05-11
18 | gemini-3-flash-preview | 0.53 | Gemini 3 Flash Preview (google-gemini-3-flash-preview) | Imported | 2026-05-11
19 | qwen3.6-flash | 0.53 | Qwen3.6 Flash (qwen-qwen3.6-flash) | Imported | 2026-05-11
20 | grok-4.3 | 0.52 | Grok 4.3 (x-ai-grok-4.3) | Imported | 2026-05-11
21 | deepseek-v4-flash | 0.52 | DeepSeek V4 Flash (deepseek-deepseek-v4-flash) | Imported | 2026-05-11
22 | gpt-5.3-codex | 0.52 | GPT-5.3-Codex (openai-gpt-5.3-codex) | Imported | 2026-05-11
23 | gpt-5.2 | 0.51 | GPT-5.2 (openai-gpt-5.2) | Imported | 2026-05-11
24 | gemini-3.1-pro-preview | 0.51 | Gemini 3.1 Pro Preview (google-gemini-3.1-pro-preview) | Imported | 2026-05-11
25 | gpt-5.3-chat | 0.51 | GPT-5.3 Chat (openai-gpt-5.3-chat) | Imported | 2026-05-11
26 | glm-5 | 0.49 | GLM 5 (z-ai-glm-5) | Imported | 2026-05-11
27 | qwen3.5-397b-a17b | 0.49 | Qwen3.5 397B A17B (qwen-qwen3.5-397b-a17b) | Imported | 2026-05-11
28 | claude-sonnet-4-0 | 0.49 | - | Imported | 2026-05-11
29 | grok-4.1-fast | 0.48 | Grok 4.1 Fast (x-ai-grok-4.1-fast) | Imported | 2026-05-11
30 | gemini-2.5-pro | 0.48 | Gemini 2.5 Pro (google-gemini-2.5-pro) | Imported | 2026-05-11
31 | grok-4 | 0.48 | Grok 4 (x-ai-grok-4) | Imported | 2026-05-11
32 | mimo-v2.5 | 0.48 | MiMo-V2.5 (xiaomi-mimo-v2.5) | Imported | 2026-05-11
33 | qwen3-max-thinking | 0.47 | Qwen3 Max Thinking (qwen-qwen3-max-thinking) | Imported | 2026-05-11
34 | gpt-5.1-codex | 0.47 | GPT-5.1-Codex (openai-gpt-5.1-codex) | Imported | 2026-05-11
35 | glm-4.7 | 0.45 | GLM 4.7 (z-ai-glm-4.7) | Imported | 2026-05-11
36 | minimax-m2.5 | 0.45 | MiniMax M2.5 (minimax-minimax-m2.5) | Imported | 2026-05-11
37 | kimi-k2.5 | 0.44 | MoonshotAI: Kimi K2.5 (moonshotai-kimi-k2.5) | Imported | 2026-05-11
38 | qwen3.5-27b | 0.44 | Qwen3.5-27B (qwen-qwen3.5-27b) | Imported | 2026-05-11
39 | grok-4.20 | 0.44 | Grok 4.20 (x-ai-grok-4.20) | Imported | 2026-05-11
40 | hy3-preview | 0.42 | Hy3 preview (tencent-hy3-preview) | Imported | 2026-05-11
41 | mimo-v2-pro | 0.42 | MiMo-V2-Pro (xiaomi-mimo-v2-pro) | Imported | 2026-05-11
42 | qwen3.6-35b-a3b | 0.42 | Qwen3.6 35B A3B (qwen-qwen3.6-35b-a3b) | Imported | 2026-05-11
43 | deepseek-v3.2 | 0.40 | DeepSeek V3.2 (deepseek-deepseek-v3.2) | Imported | 2026-05-11
44 | gemma-4-31b-it | 0.40 | Gemma 4 31B (google-gemma-4-31b-it) | Imported | 2026-05-11
45 | qwen3.6-27b | 0.38 | Qwen3.6 27B (qwen-qwen3.6-27b) | Imported | 2026-05-11
46 | mistral-medium-3-5 | 0.37 | Mistral: Mistral Medium 3.5 (mistralai-mistral-medium-3-5) | Imported | 2026-05-11
47 | gpt-5.1 | 0.37 | GPT-5.1 (openai-gpt-5.1) | Imported | 2026-05-11
48 | gemini-3.1-flash-lite-preview | 0.37 | Gemini 3.1 Flash Lite Preview (google-gemini-3.1-flash-lite-preview) | Imported | 2026-05-11
49 | kimi-k2-thinking | 0.37 | MoonshotAI: Kimi K2 Thinking (moonshotai-kimi-k2-thinking) | Imported | 2026-05-11
50 | minimax-m2.7 | 0.36 | MiniMax M2.7 (minimax-minimax-m2.7) | Imported | 2026-05-11
51 | grok-4.20-multi-agent | 0.35 | Grok 4.20 Multi-Agent (x-ai-grok-4.20-multi-agent) | Imported | 2026-05-11
52 | qwen3.5-35b-a3b | 0.35 | Qwen3.5-35B-A3B (qwen-qwen3.5-35b-a3b) | Imported | 2026-05-11
53 | gemma-4-26b-a4b-it | 0.34 | Gemma 4 26B A4B (google-gemma-4-26b-a4b-it) | Imported | 2026-05-11
54 | gpt-oss-120b | 0.34 | gpt-oss-120b (openai-gpt-oss-120b) | Imported | 2026-05-11
55 | hunter-alpha | 0.33 | - | Imported | 2026-05-11
56 | glm-5v-turbo | 0.31 | GLM 5V Turbo (z-ai-glm-5v-turbo) | Imported | 2026-05-11
57 | devstral-2512 | 0.29 | Mistral: Devstral 2 2512 (mistralai-devstral-2512) | Imported | 2026-05-11
58 | gpt-4.1 | 0.28 | GPT-4.1 (openai-gpt-4.1) | Imported | 2026-05-11
59 | trinity-large-thinking | 0.27 | Trinity Large Thinking (arcee-ai-trinity-large-thinking) | Imported | 2026-05-11
60 | nemotron-3-super-120b-a12b | 0.26 | Nemotron 3 Super (nvidia-nemotron-3-super-120b-a12b) | Imported | 2026-05-11
61 | ling-2.6-1t | 0.25 | Ling-2.6-1T (inclusionai-ling-2.6-1t) | Imported | 2026-05-11
62 | random | 0.22 | - | Imported | 2026-05-11
63 | mistral-large-2512 | 0.22 | Mistral: Mistral Large 3 2512 (mistralai-mistral-large-2512) | Imported | 2026-05-11