Terminal-Bench Hard

The Artificial Analysis hard subset of Terminal-Bench, covering terminal-based software engineering, system administration, game-playing, and data-processing tasks.

Rows: 398
Primary metric: score
Sampled: 2026-05-11

Metrics

Success Rate

Latest Results

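Rows are parsed from the public Artificial Analysis Next.js RSC defaultData payload and ranked by the configured primary metric (Success Rate), highest first. Tied entries receive distinct consecutive ranks rather than a shared rank.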
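The ranking step can be sketched as follows. This is a minimal illustration, not the page's actual implementation: it assumes rows have already been extracted from the payload as (subject, success-rate string) pairs, and the `Row`, `parse_rate`, and `rank_rows` names are hypothetical. The tie-break on case-insensitive subject name is also an assumption; it appears consistent with the table's ordering (for example, "gpt-oss-120B (high)" sits just above "Kimi K2 0905" at 23.5%), but the source does not document it.

```python
from dataclasses import dataclass


@dataclass
class Row:
    subject: str          # display name, e.g. "GPT-5.5 (xhigh)"
    success_rate: float   # percentage, e.g. 60.6 for "60.6%"


def parse_rate(text: str) -> float:
    """Convert a display string such as '60.6%' or '0%' to a float percentage."""
    return float(text.strip().rstrip("%"))


def rank_rows(rows: list[Row]) -> list[tuple[int, Row]]:
    """Order rows by success rate (highest first), break ties by
    case-insensitive subject name (assumed rule), and number them 1..n.

    Tied rows receive distinct consecutive ranks rather than a shared rank.
    """
    ordered = sorted(rows, key=lambda r: (-r.success_rate, r.subject.casefold()))
    return list(enumerate(ordered, start=1))


# Example input: four rows taken from the table, in shuffled order.
SAMPLE = [
    Row("GPT-5.5 (medium)", parse_rate("57.6%")),
    Row("GPT-5.5 (xhigh)", parse_rate("60.6%")),
    Row("GPT-5.4 (xhigh)", parse_rate("57.6%")),
    Row("GPT-5.5 (high)", parse_rate("59.8%")),
]

if __name__ == "__main__":
    for rank, row in rank_rows(SAMPLE):
        print(f"{rank} | {row.subject} | {row.success_rate}%")
```

Running it prints the four rows in the same order as ranks 1 to 4 of the table, with the two 57.6% entries separated only by the name-based tie-break.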

Rank | Subject | Success Rate | Model Match | Provenance | Sampled
1 | GPT-5.5 (xhigh) | 60.6% | GPT-5.5 / openai-gpt-5.5 | Imported | 2026-05-11
2 | GPT-5.5 (high) | 59.8% | GPT-5.5 / openai-gpt-5.5 | Imported | 2026-05-11
3 | GPT-5.4 (xhigh) | 57.6% | GPT-5.4 / openai-gpt-5.4 | Imported | 2026-05-11
4 | GPT-5.5 (medium) | 57.6% | GPT-5.5 / openai-gpt-5.5 | Imported | 2026-05-11
5 | Claude Opus 4.7 (Non-reasoning, High Effort) | 54.5% | Claude Opus 4.7 / anthropic-claude-opus-4.7 | Imported | 2026-05-11
6 | Gemini 3.1 Pro Preview | 53.8% | Gemini 3.1 Pro Preview / google-gemini-3.1-pro-preview | Imported | 2026-05-11
7 | Claude Sonnet 4.6 (Adaptive Reasoning, Max Effort) | 53% | Claude Sonnet 4.6 / anthropic-claude-sonnet-4.6 | Imported | 2026-05-11
8 | GPT-5.3 Codex (xhigh) | 53% | GPT-5.3-Codex / openai-gpt-5.3-codex | Imported | 2026-05-11
9 | GPT-5.4 mini (xhigh) | 52.3% | GPT-5.4 Mini / openai-gpt-5.4-mini | Imported | 2026-05-11
10 | GPT-5.5 (low) | 52.3% | GPT-5.5 / openai-gpt-5.5 | Imported | 2026-05-11
11 | Claude Opus 4.7 (Adaptive Reasoning, Max Effort) | 51.5% | Claude Opus 4.7 / anthropic-claude-opus-4.7 | Imported | 2026-05-11
12 | GPT-5.5 (Non-reasoning) | 49.2% | GPT-5.5 / openai-gpt-5.5 | Imported | 2026-05-11
13 | KAT Coder Pro V2 | 49.2% | KAT-Coder-Pro V2 / kwaipilot-kat-coder-pro-v2 | Imported | 2026-05-11
14 | Claude Opus 4.6 (Non-reasoning, High Effort) | 48.5% | Claude Opus 4.6 / anthropic-claude-opus-4.6 | Imported | 2026-05-11
15 | Claude Opus 4.5 (Reasoning) | 47% | Claude Opus 4.5 / anthropic-claude-opus-4.5 | Imported | 2026-05-11
16 | GPT-5.2 (xhigh) | 47% | GPT-5.2 / openai-gpt-5.2 | Imported | 2026-05-11
17 | Claude Opus 4.6 (Adaptive Reasoning, Max Effort) | 46.2% | Claude Opus 4.6 / anthropic-claude-opus-4.6 | Imported | 2026-05-11
18 | Claude Sonnet 4.6 (Non-reasoning, High Effort) | 46.2% | Claude Sonnet 4.6 / anthropic-claude-sonnet-4.6 | Imported | 2026-05-11
19 | DeepSeek V4 Pro (Reasoning, Max Effort) | 46.2% | DeepSeek V4 Pro / deepseek-deepseek-v4-pro | Imported | 2026-05-11
20 | GPT-5.1 (high) | 45.5% | GPT-5.1 / openai-gpt-5.1 | Imported | 2026-05-11
21 | Muse Spark | 45.5% | none | Imported | 2026-05-11
22 | Kimi K2.6 | 43.9% | MoonshotAI: Kimi K2.6 / moonshotai-kimi-k2.6 | Imported | 2026-05-11
23 | Qwen3.6 Max Preview | 43.9% | Qwen3.6 Max Preview / qwen-qwen3.6-max-preview | Imported | 2026-05-11
24 | Qwen3.6 Plus | 43.9% | Qwen3.6 Plus / qwen-qwen3.6-plus | Imported | 2026-05-11
25 | GLM-5 (Reasoning) | 43.2% | GLM 5 / z-ai-glm-5 | Imported | 2026-05-11
26 | GLM-5.1 (Reasoning) | 43.2% | GLM 5.1 / z-ai-glm-5.1 | Imported | 2026-05-11
27 | GPT-5.2 (medium) | 43.2% | GPT-5.2 / openai-gpt-5.2 | Imported | 2026-05-11
28 | GPT-5.4 (low) | 43.2% | GPT-5.4 / openai-gpt-5.4 | Imported | 2026-05-11
29 | MiMo-V2.5-Pro | 43.2% | MiMo-V2.5-Pro / xiaomi-mimo-v2.5-pro | Imported | 2026-05-11
30 | Claude Sonnet 4.6 (Non-reasoning, Low Effort) | 42.4% | Claude Sonnet 4.6 / anthropic-claude-sonnet-4.6 | Imported | 2026-05-11
31 | GPT-5.4 nano (xhigh) | 42.4% | GPT-5.4 Nano / openai-gpt-5.4-nano | Imported | 2026-05-11
32 | DeepSeek V4 Pro (Reasoning, High Effort) | 41.7% | DeepSeek V4 Pro / deepseek-deepseek-v4-pro | Imported | 2026-05-11
33 | Gemini 3 Pro Preview (high) | 41.7% | Gemini 3 / google-gemini-3 | Imported | 2026-05-11
34 | MiMo-V2.5 | 41.7% | MiMo-V2.5 / xiaomi-mimo-v2.5 | Imported | 2026-05-11
35 | Claude Opus 4.5 (Non-reasoning) | 40.9% | Claude Opus 4.5 / anthropic-claude-opus-4.5 | Imported | 2026-05-11
36 | Grok 4.20 0309 (Reasoning) | 40.9% | Grok 4.20 / x-ai-grok-4.20 | Imported | 2026-05-11
37 | MiMo-V2-Pro | 40.9% | MiMo-V2-Pro / xiaomi-mimo-v2-pro | Imported | 2026-05-11
38 | Qwen3.5 397B A17B (Reasoning) | 40.9% | Qwen3.5 397B A17B / qwen-qwen3.5-397b-a17b | Imported | 2026-05-11
39 | GLM-5 (Non-reasoning) | 39.4% | GLM 5 / z-ai-glm-5 | Imported | 2026-05-11
40 | MiniMax-M2.7 | 39.4% | MiniMax M2.7 / minimax-minimax-m2.7 | Imported | 2026-05-11
41 | DeepSeek V4 Flash (Reasoning, High Effort) | 38.6% | DeepSeek V4 Flash / deepseek-deepseek-v4-flash | Imported | 2026-05-11
42 | Gemini 3 Flash Preview (Reasoning) | 38.6% | Gemini 3 Flash Preview / google-gemini-3-flash-preview | Imported | 2026-05-11
43 | GPT-5 (medium) | 37.9% | GPT-5 / openai-gpt-5 | Imported | 2026-05-11
44 | GPT-5 Codex (high) | 37.9% | GPT-5 Codex / openai-gpt-5-codex | Imported | 2026-05-11
45 | GPT-5.4 (Non-reasoning) | 37.9% | GPT-5.4 / openai-gpt-5.4 | Imported | 2026-05-11
46 | Grok 4 | 37.9% | Grok 4 / x-ai-grok-4 | Imported | 2026-05-11
47 | Grok 4.20 0309 v2 (Reasoning) | 37.9% | Grok 4.20 / x-ai-grok-4.20 | Imported | 2026-05-11
48 | Grok 4.3 | 37.9% | Grok 4.3 / x-ai-grok-4.3 | Imported | 2026-05-11
49 | Kimi K2.6 (Non-reasoning) | 37.9% | MoonshotAI: Kimi K2.6 / moonshotai-kimi-k2.6 | Imported | 2026-05-11
50 | GPT-5.2 Codex (xhigh) | 37.1% | GPT-5.2-Codex / openai-gpt-5.2-codex | Imported | 2026-05-11
51 | o3 | 37.1% | o3 / openai-o3 | Imported | 2026-05-11
52 | DeepSeek V4 Pro (Non-reasoning) | 36.4% | DeepSeek V4 Pro / deepseek-deepseek-v4-pro | Imported | 2026-05-11
53 | Gemma 4 31B (Reasoning) | 36.4% | Gemma 4 31B / google-gemma-4-31b-it | Imported | 2026-05-11
54 | Claude 4.5 Sonnet (Reasoning) | 35.6% | none | Imported | 2026-05-11
55 | DeepSeek V3.2 (Reasoning) | 35.6% | DeepSeek V3.2 / deepseek-deepseek-v3.2 | Imported | 2026-05-11
56 | DeepSeek V4 Flash (Reasoning, Max Effort) | 35.6% | DeepSeek V4 Flash / deepseek-deepseek-v4-flash | Imported | 2026-05-11
57 | GLM-5.1 (Non-reasoning) | 35.6% | GLM 5.1 / z-ai-glm-5.1 | Imported | 2026-05-11
58 | MiMo-V2-Omni-0327 | 35.6% | none | Imported | 2026-05-11
59 | MiMo-V2.5-Pro (Non-reasoning) | 35.6% | MiMo-V2.5-Pro / xiaomi-mimo-v2.5-pro | Imported | 2026-05-11
60 | Qwen3.5 397B A17B (Non-reasoning) | 35.6% | Qwen3.5 397B A17B / qwen-qwen3.5-397b-a17b | Imported | 2026-05-11
61 | DeepSeek V3.2 Speciale | 34.8% | DeepSeek V3.2 Speciale / deepseek-deepseek-v3.2-speciale | Imported | 2026-05-11
62 | GPT-5.1 Codex (high) | 34.8% | GPT-5.1-Codex / openai-gpt-5.1-codex | Imported | 2026-05-11
63 | Kimi K2.5 (Reasoning) | 34.8% | MoonshotAI: Kimi K2.5 / moonshotai-kimi-k2.5 | Imported | 2026-05-11
64 | MiMo-V2-Omni | 34.8% | MiMo-V2-Omni / xiaomi-mimo-v2-omni | Imported | 2026-05-11
65 | MiniMax-M2.5 | 34.8% | MiniMax M2.5 / minimax-minimax-m2.5 | Imported | 2026-05-11
66 | Qwen3.6 27B (Reasoning) | 34.8% | Qwen3.6 27B / qwen-qwen3.6-27b | Imported | 2026-05-11
67 | Qwen3.6 35B A3B (Reasoning) | 34.8% | Qwen3.6 35B A3B / qwen-qwen3.6-35b-a3b | Imported | 2026-05-11
68 | Claude 4.1 Opus (Reasoning) | 34.3% | none | Imported | 2026-05-11
69 | DeepSeek V4 Flash (Non-reasoning) | 34.1% | DeepSeek V4 Flash / deepseek-deepseek-v4-flash | Imported | 2026-05-11
70 | Gemini 3 Pro Preview (low) | 34.1% | Gemini 3 / google-gemini-3 | Imported | 2026-05-11
71 | GPT-5.4 mini (medium) | 34.1% | GPT-5.4 Mini / openai-gpt-5.4-mini | Imported | 2026-05-11
72 | Hy3-preview (Reasoning) | 34.1% | Hy3 preview / tencent-hy3-preview | Imported | 2026-05-11
73 | GLM-5-Turbo | 33.3% | GLM 5 Turbo / z-ai-glm-5-turbo | Imported | 2026-05-11
74 | GPT-5 mini (high) | 33.3% | GPT-5 Mini / openai-gpt-5-mini | Imported | 2026-05-11
75 | GPT-5.1 Codex mini (high) | 33.3% | GPT-5.1-Codex-Mini / openai-gpt-5.1-codex-mini | Imported | 2026-05-11
76 | GPT-5.4 nano (medium) | 33.3% | GPT-5.4 Nano / openai-gpt-5.4-nano | Imported | 2026-05-11
77 | Mistral Medium 3.5 | 33.3% | Mistral: Mistral Medium 3.5 / mistralai-mistral-medium-3-5 | Imported | 2026-05-11
78 | DeepSeek V3.2 (Non-reasoning) | 32.6% | DeepSeek V3.2 / deepseek-deepseek-v3.2 | Imported | 2026-05-11
79 | GLM 5V Turbo (Reasoning) | 32.6% | GLM 5V Turbo / z-ai-glm-5v-turbo | Imported | 2026-05-11
80 | GPT-5 (high) | 32.6% | GPT-5 / openai-gpt-5 | Imported | 2026-05-11
81 | Qwen3.5 27B (Reasoning) | 32.6% | Qwen3.5-27B / qwen-qwen3.5-27b | Imported | 2026-05-11
82 | Step 3.5 Flash 2603 | 32.6% | Step 3.5 Flash / stepfun-step-3.5-flash | Imported | 2026-05-11
83 | DeepSeek V3.1 Terminus (Non-reasoning) | 31.8% | DeepSeek V3.1 Terminus / deepseek-deepseek-v3.1-terminus | Imported | 2026-05-11
84 | Gemini 3 Flash Preview (Non-reasoning) | 31.8% | Gemini 3 Flash Preview / google-gemini-3-flash-preview | Imported | 2026-05-11
85 | GLM-4.7 (Reasoning) | 31.8% | GLM 4.7 / z-ai-glm-4.7 | Imported | 2026-05-11
86 | GPT-5.2 (Non-reasoning) | 31.8% | GPT-5.2 / openai-gpt-5.2 | Imported | 2026-05-11
87 | Hy3-preview (Non-reasoning) | 31.8% | Hy3 preview / tencent-hy3-preview | Imported | 2026-05-11
88 | Qwen3.5 27B (Non-reasoning) | 31.8% | Qwen3.5-27B / qwen-qwen3.5-27b | Imported | 2026-05-11
89 | Claude 4 Opus (Reasoning) | 31.1% | none | Imported | 2026-05-11
90 | Claude 4 Sonnet (Reasoning) | 31.1% | none | Imported | 2026-05-11
91 | DeepSeek V3.2 Exp (Reasoning) | 31.1% | DeepSeek V3.2 Exp / deepseek-deepseek-v3.2-exp | Imported | 2026-05-11
92 | Kimi K2 Thinking | 31.1% | MoonshotAI: Kimi K2 Thinking / moonshotai-kimi-k2-thinking | Imported | 2026-05-11
93 | Ling-2.6-1T | 31.1% | Ling-2.6-1T / inclusionai-ling-2.6-1t | Imported | 2026-05-11
94 | MiMo-V2-Flash (Feb 2026) | 31.1% | MiMo-V2-Flash / xiaomi-mimo-v2-flash | Imported | 2026-05-11
95 | Qwen3.5 122B A10B (Reasoning) | 31.1% | Qwen3.5-122B-A10B / qwen-qwen3.5-122b-a10b | Imported | 2026-05-11
96 | DeepSeek V3.1 Terminus (Reasoning) | 30.3% | DeepSeek V3.1 Terminus / deepseek-deepseek-v3.1-terminus | Imported | 2026-05-11
97 | Gemma 4 31B (Non-reasoning) | 30.3% | Gemma 4 31B / google-gemma-4-31b-it | Imported | 2026-05-11
98 | GLM-4.7 (Non-reasoning) | 30.3% | GLM 4.7 / z-ai-glm-4.7 | Imported | 2026-05-11
99 | Qwen3.5 122B A10B (Non-reasoning) | 29.5% | Qwen3.5-122B-A10B / qwen-qwen3.5-122b-a10b | Imported | 2026-05-11
100 | Claude 4.5 Sonnet (Non-reasoning) | 28.8% | none | Imported | 2026-05-11
101 | GLM-4.6 (Non-reasoning) | 28.8% | GLM 4.6 / z-ai-glm-4.6 | Imported | 2026-05-11
102 | GPT-5 mini (medium) | 28.8% | GPT-5 Mini / openai-gpt-5-mini | Imported | 2026-05-11
103 | MiniMax-M2.1 | 28.8% | MiniMax M2.1 / minimax-minimax-m2.1 | Imported | 2026-05-11
104 | NVIDIA Nemotron 3 Super 120B A12B (Reasoning) | 28.8% | Nemotron 3 Super / nvidia-nemotron-3-super-120b-a12b | Imported | 2026-05-11
105 | MiMo-V2-Flash (Reasoning) | 28% | MiMo-V2-Flash / xiaomi-mimo-v2-flash | Imported | 2026-05-11
106 | Claude 4 Sonnet (Non-reasoning) | 27.3% | none | Imported | 2026-05-11
107 | Claude 4.5 Haiku (Non-reasoning) | 27.3% | none | Imported | 2026-05-11
108 | Claude 4.5 Haiku (Reasoning) | 27.3% | none | Imported | 2026-05-11
109 | Step 3.5 Flash | 27.3% | Step 3.5 Flash / stepfun-step-3.5-flash | Imported | 2026-05-11
110 | Doubao Seed Code | 26.5% | none | Imported | 2026-05-11
111 | Gemini 2.5 Pro | 26.5% | Gemini 2.5 Pro / google-gemini-2.5-pro | Imported | 2026-05-11
112 | GPT-5 (low) | 26.5% | GPT-5 / openai-gpt-5 | Imported | 2026-05-11
113 | Mercury 2 | 26.5% | Mercury 2 / inception-mercury-2 | Imported | 2026-05-11
114 | Qwen3.5 35B A3B (Reasoning) | 26.5% | Qwen3.5-35B-A3B / qwen-qwen3.5-35b-a3b | Imported | 2026-05-11
115 | MiMo-V2-Flash (Non-reasoning) | 25.8% | MiMo-V2-Flash / xiaomi-mimo-v2-flash | Imported | 2026-05-11
116 | MiniMax-M2 | 25.8% | MiniMax M2 / minimax-minimax-m2 | Imported | 2026-05-11
117 | Qwen3.6 35B A3B (Non-reasoning) | 25.8% | Qwen3.6 35B A3B / qwen-qwen3.6-35b-a3b | Imported | 2026-05-11
118 | DeepSeek V3.1 (Reasoning) | 25% | DeepSeek V3.1 / deepseek-deepseek-chat-v3.1 | Imported | 2026-05-11
119 | DeepSeek V3.2 Exp (Non-reasoning) | 25% | DeepSeek V3.2 / deepseek-deepseek-v3.2 | Imported | 2026-05-11
120 | ERNIE 5.0 Thinking Preview | 25% | none | Imported | 2026-05-11
121 | Gemma 4 26B A4B (Non-reasoning) | 25% | Gemma 4 26B A4B / google-gemma-4-26b-a4b-it | Imported | 2026-05-11
122 | GLM-4.6 (Reasoning) | 25% | GLM 4.6 / z-ai-glm-4.6 | Imported | 2026-05-11
123 | DeepSeek V3.1 (Non-reasoning) | 24.2% | DeepSeek V3.1 / deepseek-deepseek-chat-v3.1 | Imported | 2026-05-11
124 | Gemini 3.1 Flash-Lite Preview | 24.2% | Gemini 3.1 Flash Lite Preview / google-gemini-3.1-flash-lite-preview | Imported | 2026-05-11
125 | GPT-5.4 nano (Non-Reasoning) | 24.2% | GPT-5.4 Nano / openai-gpt-5.4-nano | Imported | 2026-05-11
126 | Grok 4.1 Fast (Reasoning) | 24.2% | Grok 4.1 Fast / x-ai-grok-4.1-fast | Imported | 2026-05-11
127 | Nova 2.0 Pro Preview (medium) | 24.2% | none | Imported | 2026-05-11
128 | Qwen3 Max Thinking | 24.2% | Qwen3 Max Thinking / qwen-qwen3-max-thinking | Imported | 2026-05-11
129 | Qwen3.5 9B (Reasoning) | 24.2% | Qwen3.5-9B / qwen-qwen3.5-9b | Imported | 2026-05-11
130 | gpt-oss-120B (high) | 23.5% | gpt-oss-120b / openai-gpt-oss-120b | Imported | 2026-05-11
131 | Kimi K2 0905 | 23.5% | MoonshotAI: Kimi K2 0905 / moonshotai-kimi-k2-0905 | Imported | 2026-05-11
132 | GPT-5.1 (Non-reasoning) | 22.7% | GPT-5.1 / openai-gpt-5.1 | Imported | 2026-05-11
133 | K-EXAONE (Reasoning) | 22.7% | none | Imported | 2026-05-11
134 | Trinity Large Thinking | 22.7% | Trinity Large Thinking / arcee-ai-trinity-large-thinking | Imported | 2026-05-11
135 | GLM-4.5 (Reasoning) | 22% | GLM 4.5 / z-ai-glm-4.5 | Imported | 2026-05-11
136 | GLM-4.7-Flash (Reasoning) | 22% | GLM 4.7 Flash / z-ai-glm-4.7-flash | Imported | 2026-05-11
137 | Grok 4.20 0309 (Non-reasoning) | 22% | Grok 4.20 / x-ai-grok-4.20 | Imported | 2026-05-11
138 | Claude 3.7 Sonnet (Non-reasoning) | 21.2% | Claude 3.7 Sonnet / anthropic-claude-3.7-sonnet | Imported | 2026-05-11
139 | Claude 3.7 Sonnet (Reasoning) | 21.2% | Claude 3.7 Sonnet (thinking) / anthropic-claude-3.7-sonnet-thinking | Imported | 2026-05-11
140 | Ling 2.6 Flash | 21.2% | Ling-2.6-flash / inclusionai-ling-2.6-flash | Imported | 2026-05-11
141 | Nemotron Cascade 2 30B A3B | 21.2% | none | Imported | 2026-05-11
142 | Qwen3.5 Omni Plus | 21.2% | none | Imported | 2026-05-11
143 | Qwen3.6 27B (Non-reasoning) | 21.2% | Qwen3.6 27B / qwen-qwen3.6-27b | Imported | 2026-05-11
144 | EXAONE 4.5 33B | 20.5% | none | Imported | 2026-05-11
145 | GLM-4.5-Air | 20.5% | GLM 4.5 Air / z-ai-glm-4.5-air | Imported | 2026-05-11
146 | Qwen3 Max | 20.5% | Qwen3 Max / qwen-qwen3-max | Imported | 2026-05-11
147 | Qwen3 Max (Preview) | 19.7% | Qwen3 Max / qwen-qwen3-max | Imported | 2026-05-11
148 | Devstral 2 | 18.9% | none | Imported | 2026-05-11
149 | Grok 4 Fast (Reasoning) | 18.9% | Grok 4 Fast / x-ai-grok-4-fast | Imported | 2026-05-11
150 | Grok 4.3 (Non-reasoning) | 18.9% | Grok 4.3 / x-ai-grok-4.3 | Imported | 2026-05-11
151 | Kimi K2.5 (Non-reasoning) | 18.9% | MoonshotAI: Kimi K2.5 / moonshotai-kimi-k2.5 | Imported | 2026-05-11
152 | Qwen3 Coder 480B A35B Instruct | 18.9% | Qwen3 Coder 480B A35B / qwen-qwen3-coder | Imported | 2026-05-11
153 | GPT-5 (minimal) | 18.2% | GPT-5 / openai-gpt-5 | Imported | 2026-05-11
154 | GPT-5.4 mini (Non-Reasoning) | 18.2% | GPT-5.4 Mini / openai-gpt-5.4-mini | Imported | 2026-05-11
155 | JT-MINI | 18.2% | none | Imported | 2026-05-11
156 | Qwen3 Coder Next | 18.2% | Qwen3 Coder Next / qwen-qwen3-coder-next | Imported | 2026-05-11
157 | Qwen3.5 4B (Reasoning) | 18.2% | none | Imported | 2026-05-11
158 | Qwen3.5 9B (Non-reasoning) | 18.2% | Qwen3.5-9B / qwen-qwen3.5-9b | Imported | 2026-05-11
159 | GPT-5 nano (medium) | 17.4% | GPT-5 Nano / openai-gpt-5-nano | Imported | 2026-05-11
160 | Grok 3 mini Reasoning (high) | 17.4% | none | Imported | 2026-05-11
161 | Grok Code Fast 1 | 17.4% | Grok Code Fast 1 / x-ai-grok-code-fast-1 | Imported | 2026-05-11
162 | Mistral Small 4 (Reasoning) | 17.4% | Mistral: Mistral Small 4 / mistralai-mistral-small-2603 | Imported | 2026-05-11
163 | Nova 2.0 Lite (medium) | 17.4% | none | Imported | 2026-05-11
164 | Nova 2.0 Pro Preview (low) | 17.4% | none | Imported | 2026-05-11
165 | Qwen3 Max Thinking (Preview) | 17.4% | Qwen3 Max Thinking / qwen-qwen3-max-thinking | Imported | 2026-05-11
166 | Cogito v2.1 (Reasoning) | 16.7% | none | Imported | 2026-05-11
167 | Devstral Small 2 | 16.7% | none | Imported | 2026-05-11
168 | Gemini 2.5 Flash Preview (Sep '25) (Reasoning) | 16.7% | none | Imported | 2026-05-11
169 | Grok 4.20 0309 v2 (Non-reasoning) | 16.7% | Grok 4.20 / x-ai-grok-4.20 | Imported | 2026-05-11
170 | Nova 2.0 Lite (high) | 16.7% | none | Imported | 2026-05-11
171 | Nova 2.0 Pro Preview (Non-reasoning) | 16.7% | none | Imported | 2026-05-11
172 | DeepSeek R1 0528 (May '25) | 15.9% | R1 / deepseek-r1 | Imported | 2026-05-11
173 | Kimi K2 | 15.9% | MoonshotAI: Kimi K2 0711 / moonshotai-kimi-k2 | Imported | 2026-05-11
174 | Mistral Large 3 | 15.9% | none | Imported | 2026-05-11
175 | DeepSeek V3 0324 | 15.2% | DeepSeek V3 0324 / deepseek-deepseek-chat-v3-0324 | Imported | 2026-05-11
176 | o4-mini (high) | 15.2% | o4 Mini / openai-o4-mini | Imported | 2026-05-11
177 | Qwen3 235B A22B 2507 Instruct | 15.2% | Qwen3 235B A22B Instruct 2507 / qwen-qwen3-235b-a22b-2507 | Imported | 2026-05-11
178 | Qwen3 Coder 30B A3B Instruct | 15.2% | Qwen3 Coder 30B A3B Instruct / qwen-qwen3-coder-30b-a3b-instruct | Imported | 2026-05-11
179 | Apriel-v1.6-15B-Thinker | 14.4% | none | Imported | 2026-05-11
180 | Gemini 2.5 Flash Preview (Sep '25) (Non-reasoning) | 14.4% | none | Imported | 2026-05-11
181 | GLM-4.6V (Reasoning) | 14.4% | GLM 4.6V / z-ai-glm-4.6v | Imported | 2026-05-11
182 | GPT-5 mini (minimal) | 14.4% | GPT-5 Mini / openai-gpt-5-mini | Imported | 2026-05-11
183 | Grok 4.1 Fast (Non-reasoning) | 14.4% | Grok 4.1 Fast / x-ai-grok-4.1-fast | Imported | 2026-05-11
184 | Gemini 2.5 Flash (Reasoning) | 13.6% | Gemini 2.5 Flash / google-gemini-2.5-flash | Imported | 2026-05-11
185 | Gemma 4 26B A4B (Reasoning) | 13.6% | Gemma 4 26B A4B / google-gemma-4-26b-a4b-it | Imported | 2026-05-11
186 | GPT-4.1 | 13.6% | GPT-4.1 / openai-gpt-4.1 | Imported | 2026-05-11
187 | NVIDIA Nemotron 3 Nano 30B A3B (Reasoning) | 13.6% | Nemotron 3 Nano 30B A3B / nvidia-nemotron-3-nano-30b-a3b | Imported | 2026-05-11
188 | Qwen3 235B A22B 2507 (Reasoning) | 13.6% | Qwen3 235B A22B Instruct 2507 / qwen-qwen3-235b-a22b-2507 | Imported | 2026-05-11
189 | Gemini 2.5 Flash-Lite Preview (Sep '25) (Reasoning) | 12.9% | none | Imported | 2026-05-11
190 | GPT-5 (ChatGPT) | 12.9% | GPT-5 / openai-gpt-5 | Imported | 2026-05-11
191 | Magistral Medium 1.2 | 12.9% | none | Imported | 2026-05-11
192 | o1 | 12.9% | o1 / openai-o1 | Imported | 2026-05-11
193 | Gemini 2.5 Flash (Non-reasoning) | 12.1% | Gemini 2.5 Flash / google-gemini-2.5-flash | Imported | 2026-05-11
194 | GPT-5 nano (high) | 12.1% | GPT-5 Nano / openai-gpt-5-nano | Imported | 2026-05-11
195 | Grok 4 Fast (Non-reasoning) | 12.1% | Grok 4 Fast / x-ai-grok-4-fast | Imported | 2026-05-11
196 | HyperCLOVA X SEED Think (32B) | 12.1% | none | Imported | 2026-05-11
197 | NVIDIA Nemotron 3 Nano 30B A3B (Non-reasoning) | 12.1% | Nemotron 3 Nano 30B A3B / nvidia-nemotron-3-nano-30b-a3b | Imported | 2026-05-11
198 | Grok 3 | 11.4% | Grok 3 / xaigrok-3 | Imported | 2026-05-11
199 | Hermes 4 - Llama-3.1 405B (Reasoning) | 11.4% | none | Imported | 2026-05-11
200 | Kimi Linear 48B A3B Instruct | 11.4% | none | Imported | 2026-05-11
201 | Qwen3 VL 235B A22B (Reasoning) | 11.4% | none | Imported | 2026-05-11
202 | Qwen3.5 4B (Non-reasoning) | 11.4% | none | Imported | 2026-05-11
203 | Apriel-v1.5-15B-Thinker | 10.6% | none | Imported | 2026-05-11
204 | gpt-oss-20B (high) | 10.6% | gpt-oss-20b / openai-gpt-oss-20b | Imported | 2026-05-11
205 | Ling-1T | 10.6% | none | Imported | 2026-05-11
206 | Ling-flash-2.0 | 10.6% | none | Imported | 2026-05-11
207 | LongCat Flash Lite | 10.6% | none | Imported | 2026-05-11
208 | Mistral Medium 3.1 | 10.6% | Mistral: Mistral Medium 3.1 / mistralai-mistral-medium-3.1 | Imported | 2026-05-11
209 | Mistral Small 4 (Non-reasoning) | 10.6% | Mistral: Mistral Small 4 / mistralai-mistral-small-2603 | Imported | 2026-05-11
210 | Qwen3.5 35B A3B (Non-reasoning) | 10.6% | Qwen3.5-35B-A3B / qwen-qwen3.5-35b-a3b | Imported | 2026-05-11
211 | Hermes 4 - Llama-3.1 405B (Non-reasoning) | 9.8% | none | Imported | 2026-05-11
212 | K2-V2 (high) | 9.8% | none | Imported | 2026-05-11
213 | Qwen3 Next 80B A3B (Reasoning) | 9.8% | none | Imported | 2026-05-11
214 | Devstral Medium | 9.1% | Mistral: Devstral Medium / mistralai-devstral-medium | Imported | 2026-05-11
215 | INTELLECT-3 | 9.1% | INTELLECT-3 / prime-intellect-intellect-3 | Imported | 2026-05-11
216 | KAT-Coder-Pro V1 | 9.1% | none | Imported | 2026-05-11
217 | Magistral Medium 1 | 9.1% | none | Imported | 2026-05-11
218 | Gemma 4 E4B (Reasoning) | 8.3% | none | Imported | 2026-05-11
219 | GPT-4o (Aug '24) | 8.3% | GPT-4o (2024-08-06) / openai-gpt-4o-2024-08-06 | Imported | 2026-05-11
220 | GPT-4o (Nov '24) | 8.3% | GPT-4o / openai-gpt-4o | Imported | 2026-05-11
221 | K2-V2 (medium) | 8.3% | none | Imported | 2026-05-11
222 | Nemotron 3 Nano Omni 30B A3B Reasoning | 8.3% | none | Imported | 2026-05-11
223 | Qwen3 VL 32B Instruct | 8.3% | Qwen3 VL 32B Instruct / qwen-qwen3-vl-32b-instruct | Imported | 2026-05-11
224 | Qwen3.5 Omni Flash | 8.3% | none | Imported | 2026-05-11
225 | Gemini 2.5 Flash-Lite Preview (Sep '25) (Non-reasoning) | 7.6% | Gemini 2.5 Flash Lite Preview 09-2025 / google-gemini-2.5-flash-lite-preview-09-2025 | Imported | 2026-05-11
226 | Gemma 4 E4B (Non-reasoning) | 7.6% | none | Imported | 2026-05-11
227 | GPT-4.1 mini | 7.6% | GPT-4.1 Mini / openai-gpt-4.1-mini | Imported | 2026-05-11
228 | Mistral Small 3.1 | 7.6% | none | Imported | 2026-05-11
229 | Qwen3 Next 80B A3B Instruct | 7.6% | Qwen3 Next 80B A3B Instruct / qwen-qwen3-next-80b-a3b-instruct | Imported | 2026-05-11
230 | Qwen3 VL 32B (Reasoning) | 7.6% | none | Imported | 2026-05-11
231 | Ring-flash-2.0 | 7.6% | none | Imported | 2026-05-11
232 | Solar Pro 3 | 7.6% | Solar Pro 3 / upstage-solar-pro-3 | Imported | 2026-05-11
233 | DeepSeek V3 (Dec '24) | 6.8% | DeepSeek V3 / deepseek-deepseek-chat | Imported | 2026-05-11
234 | GLM-4.5V (Non-reasoning) | 6.8% | GLM 4.5V / z-ai-glm-4.5v | Imported | 2026-05-11
235 | GPT-5 nano (minimal) | 6.8% | GPT-5 Nano / openai-gpt-5-nano | Imported | 2026-05-11
236 | K-EXAONE (Non-reasoning) | 6.8% | none | Imported | 2026-05-11
237 | K2 Think V2 | 6.8% | none | Imported | 2026-05-11
238 | Llama 3.1 Instruct 405B | 6.8% | none | Imported | 2026-05-11
239 | Llama 4 Maverick | 6.8% | Llama 4 Maverick / meta-llama-4-maverick | Imported | 2026-05-11
240 | Mistral Small 3.2 | 6.8% | none | Imported | 2026-05-11
241 | Nova 2.0 Lite (Non-reasoning) | 6.8% | none | Imported | 2026-05-11
242 | Nova 2.0 Omni (Non-reasoning) | 6.8% | none | Imported | 2026-05-11
243 | Nova Premier | 6.8% | none | Imported | 2026-05-11
244 | NVIDIA Nemotron 3 Nano 4B | 6.8% | none | Imported | 2026-05-11
245 | o3-mini | 6.8% | o3-mini / openai-o3-mini | Imported | 2026-05-11
246 | Qwen3 30B A3B (Non-reasoning) | 6.8% | Qwen3 30B A3B / qwen-qwen3-30b-a3b | Imported | 2026-05-11
247 | Qwen3 VL 235B A22B Instruct | 6.8% | Qwen3 VL 235B A22B Instruct / qwen-qwen3-vl-235b-a22b-instruct | Imported | 2026-05-11
248 | Ring-1T | 6.8% | none | Imported | 2026-05-11
249 | Seed-OSS-36B-Instruct | 6.8% | none | Imported | 2026-05-11
250 | DeepSeek R1 (Jan '25) | 6.1% | R1 / deepseek-r1 | Imported | 2026-05-11
251 | Devstral Small (Jul '25) | 6.1% | Mistral: Devstral Small 1.1 / mistralai-devstral-small | Imported | 2026-05-11
252 | Devstral Small (May '25) | 6.1% | Mistral: Devstral Small 1.1 / mistralai-devstral-small | Imported | 2026-05-11
253 | ERNIE 4.5 300B A47B | 6.1% | ERNIE 4.5 300B A47B / baidu-ernie-4.5-300b-a47b | Imported | 2026-05-11
254 | Mistral Large 2 (Nov '24) | 6.1% | none | Imported | 2026-05-11
255 | Nova Pro | 6.1% | Nova Pro 1.0 / amazon-nova-pro-v1 | Imported | 2026-05-11
256 | o3-mini (high) | 6.1% | o3 Mini High / openai-o3-mini-high | Imported | 2026-05-11
257 | Qwen3 235B A22B (Non-reasoning) | 6.1% | Qwen3 235B A22B / qwen-qwen3-235b-a22b | Imported | 2026-05-11
258 | Qwen3 235B A22B (Reasoning) | 6.1% | Qwen3 235B A22B / qwen-qwen3-235b-a22b | Imported | 2026-05-11
259 | Qwen3 30B A3B 2507 Instruct | 6.1% | none | Imported | 2026-05-11
260 | Qwen3 VL 30B A3B Instruct | 6.1% | Qwen3 VL 30B A3B Instruct / qwen-qwen3-vl-30b-a3b-instruct | Imported | 2026-05-11
261 | GLM-4.5V (Reasoning) | 5.3% | GLM 4.5V / z-ai-glm-4.5v | Imported | 2026-05-11
262 | gpt-oss-120B (low) | 5.3% | gpt-oss-120b / openai-gpt-oss-120b | Imported | 2026-05-11
263 | Llama Nemotron Super 49B v1.5 (Reasoning) | 5.3% | none | Imported | 2026-05-11
264 | Qwen3 14B (Non-reasoning) | 5.3% | Qwen3 14B / qwen-qwen3-14b | Imported | 2026-05-11
265 | Qwen3 30B A3B 2507 (Reasoning) | 5.3% | none | Imported | 2026-05-11
266 | Qwen3 VL 30B A3B (Reasoning) | 5.3% | none | Imported | 2026-05-11
267 | Step3 VL 10B | 5.3% | none | Imported | 2026-05-11
268 | Gemini 2.5 Flash-Lite (Reasoning) | 4.5% | Gemini 2.5 Flash Lite / google-gemini-2.5-flash-lite | Imported | 2026-05-11
269 | gpt-oss-20B (low) | 4.5% | gpt-oss-20b / openai-gpt-oss-20b | Imported | 2026-05-11
270 | Hermes 4 - Llama-3.1 70B (Reasoning) | 4.5% | none | Imported | 2026-05-11
271 | K2-V2 (low) | 4.5% | none | Imported | 2026-05-11
272 | Llama 3.1 Nemotron Instruct 70B | 4.5% | none | Imported | 2026-05-11
273 | Magistral Small 1 | 4.5% | none | Imported | 2026-05-11
274 | Magistral Small 1.2 | 4.5% | none | Imported | 2026-05-11
275 | Ministral 3 14B | 4.5% | none | Imported | 2026-05-11
276 | Ministral 3 8B | 4.5% | none | Imported | 2026-05-11
277 | Nova 2.0 Omni (medium) | 4.5% | none | Imported | 2026-05-11
278 | NVIDIA Nemotron Nano 12B v2 VL (Reasoning) | 4.5% | Nemotron Nano 12B 2 VL / nvidia-nemotron-nano-12b-v2-vl | Imported | 2026-05-11
279 | Qwen2.5 Instruct 72B | 4.5% | Qwen2.5 72B Instruct / qwen-qwen-2.5-72b-instruct | Imported | 2026-05-11
280 | Qwen3 4B 2507 Instruct | 4.5% | none | Imported | 2026-05-11
281 | Solar Pro 2 (Non-reasoning) | 4.5% | none | Imported | 2026-05-11
282 | EXAONE 4.0 32B (Reasoning) | 3.8% | none | Imported | 2026-05-11
283 | Gemini 2.0 Flash (Feb '25) | 3.8% | Gemini 2.0 Flash / google-gemini-2.0-flash | Imported | 2026-05-11
284 | Gemma 3 27B Instruct | 3.8% | Gemma 3 27B / google-gemma-3-27b-it | Imported | 2026-05-11
285 | GLM-4.7-Flash (Non-reasoning) | 3.8% | GLM 4.7 Flash / z-ai-glm-4.7-flash | Imported | 2026-05-11
286 | GPT-4.1 nano | 3.8% | GPT-4.1 Nano / openai-gpt-4.1-nano | Imported | 2026-05-11
287 | Llama Nemotron Super 49B v1.5 (Non-reasoning) | 3.8% | none | Imported | 2026-05-11
288 | Mistral Medium 3 | 3.8% | Mistral: Mistral Medium 3 / mistralai-mistral-medium-3 | Imported | 2026-05-11
289 | Motif-2-12.7B-Reasoning | 3.8% | none | Imported | 2026-05-11
290 | Nova 2.0 Lite (low) | 3.8% | none | Imported | 2026-05-11
291 | Nova 2.0 Omni (low) | 3.8% | none | Imported | 2026-05-11
292 | Phi-4 | 3.8% | Phi 4 / microsoft-phi-4 | Imported | 2026-05-11
293 | Qwen3 14B (Reasoning) | 3.8% | Qwen3 14B / qwen-qwen3-14b | Imported | 2026-05-11
294 | Qwen3 Omni 30B A3B (Reasoning) | 3.8% | none | Imported | 2026-05-11
295 | Qwen3 VL 8B (Reasoning) | 3.8% | none | Imported | 2026-05-11
296 | Qwen3.5 2B (Non-reasoning) | 3.8% | none | Imported | 2026-05-11
297 | Qwen3.5 2B (Reasoning) | 3.8% | none | Imported | 2026-05-11
298 | Gemma 4 E2B (Reasoning) | 3% | none | Imported | 2026-05-11
299 | GLM-4.6V (Non-reasoning) | 3% | GLM 4.6V / z-ai-glm-4.6v | Imported | 2026-05-11
300 | Llama 3.1 Instruct 70B | 3% | none | Imported | 2026-05-11
301 | Llama 3.3 Instruct 70B | 3% | none | Imported | 2026-05-11
302 | Mi:dm K 2.5 Pro Preview | 3% | none | Imported | 2026-05-11
303 | MiniMax M1 80k | 3% | none | Imported | 2026-05-11
304 | Qwen3 32B (Reasoning) | 3% | Qwen3 32B / qwen-qwen3-32b | Imported | 2026-05-11
305 | Solar Pro 2 (Reasoning) | 3% | none | Imported | 2026-05-11
306 | Claude 3.5 Haiku | 2.3% | Claude 3.5 Haiku / anthropic-claude-3.5-haiku | Imported | 2026-05-11
307 | Falcon-H1R-7B | 2.3% | none | Imported | 2026-05-11
308 | Gemini 2.5 Flash-Lite (Non-reasoning) | 2.3% | Gemini 2.5 Flash Lite / google-gemini-2.5-flash-lite | Imported | 2026-05-11
309 | Gemma 3n E4B Instruct | 2.3% | none | Imported | 2026-05-11
310 | Gemma 4 E2B (Non-reasoning) | 2.3% | none | Imported | 2026-05-11
311 | Granite 4.0 H Small | 2.3% | none | Imported | 2026-05-11
312 | Granite 4.1 30B | 2.3% | none | Imported | 2026-05-11
313 | Granite 4.1 3B | 2.3% | none | Imported | 2026-05-11
314 | Jamba 1.7 Large | 2.3% | none | Imported | 2026-05-11
315 | Llama 3.1 Nemotron Ultra 253B v1 (Reasoning) | 2.3% | none | Imported | 2026-05-11
316 | Mi:dm K 2.5 Pro | 2.3% | none | Imported | 2026-05-11
317 | MiniMax M1 40k | 2.3% | none | Imported | 2026-05-11
318 | Qwen3 30B A3B (Reasoning) | 2.3% | Qwen3 30B A3B / qwen-qwen3-30b-a3b | Imported | 2026-05-11
319 | Qwen3 8B (Non-reasoning) | 2.3% | Qwen3 8B / qwen-qwen3-8b | Imported | 2026-05-11
320 | Qwen3 8B (Reasoning) | 2.3% | Qwen3 8B / qwen-qwen3-8b | Imported | 2026-05-11
321 | Qwen3 VL 8B Instruct | 2.3% | Qwen3 VL 8B Instruct / qwen-qwen3-vl-8b-instruct | Imported | 2026-05-11
322 | Sarvam 30B (high) | 2.3% | none | Imported | 2026-05-11
323 | Sarvam M (Reasoning) | 2.3% | none | Imported | 2026-05-11
324 | Solar Open 100B (Reasoning) | 2.3% | none | Imported | 2026-05-11
325 | Tri-21B-think Preview | 2.3% | none | Imported | 2026-05-11
326 | DeepSeek R1 0528 Qwen3 8B | 1.5% | none | Imported | 2026-05-11
327 | DeepSeek R1 Distill Llama 70B | 1.5% | R1 Distill Llama 70B / deepseek-deepseek-r1-distill-llama-70b | Imported | 2026-05-11
328 | EXAONE 4.0 32B (Non-reasoning) | 1.5% | none | Imported | 2026-05-11
329 | Granite 4.0 Micro | 1.5% | Granite 4.0 Micro / ibm-granite-granite-4.0-h-micro | Imported | 2026-05-11
330 | Llama 4 Scout | 1.5% | Llama 4 Scout / meta-llama-llama-4-scout | Imported | 2026-05-11
331 | Nova Micro | 1.5% | Nova Micro 1.0 / amazon-nova-micro-v1 | Imported | 2026-05-11
332 | NVIDIA Nemotron Nano 9B V2 (Reasoning) | 1.5% | Nemotron Nano 9B V2 / nvidia-nemotron-nano-9b-v2 | Imported | 2026-05-11
333 | Olmo 3 32B Think | 1.5% | Olmo 3 32B Think / allenai-olmo-3-32b-think | Imported | 2026-05-11
334 | Qwen3 4B 2507 (Reasoning) | 1.5% | none | Imported | 2026-05-11
335 | Qwen3 Omni 30B A3B Instruct | 1.5% | none | Imported | 2026-05-11
336 | Qwen3 VL 4B (Reasoning) | 1.5% | none | Imported | 2026-05-11
337 | Sarvam 105B (high) | 1.5% | none | Imported | 2026-05-11
338 | Claude 3 Haiku | 0.8% | Claude 3 Haiku / anthropic-claude-3-haiku | Imported | 2026-05-11
339 | Command A | 0.8% | Command A / cohere-command-a | Imported | 2026-05-11
340 | Gemma 3 12B Instruct | 0.8% | Gemma 3 12B / google-gemma-3-12b-it | Imported | 2026-05-11
341 | Gemma 3 4B Instruct | 0.8% | Gemma 3 4B / google-gemma-3-4b-it | Imported | 2026-05-11
342 | Gemma 3n E2B Instruct | 0.8% | none | Imported | 2026-05-11
343 | Jamba Reasoning 3B | 0.8% | none | Imported | 2026-05-11
344 | LFM2 2.6B | 0.8% | none | Imported | 2026-05-11
345 | Ling-mini-2.0 | 0.8% | none | Imported | 2026-05-11
346 | Llama 3 Instruct 70B | 0.8% | none | Imported | 2026-05-11
347 | Llama 3.1 Instruct 8B | 0.8% | none | Imported | 2026-05-11
348 | Llama 3.2 Instruct 11B (Vision) | 0.8% | none | Imported | 2026-05-11
349 | Nova Lite | 0.8% | Nova Lite 1.0 / amazon-nova-lite-v1 | Imported | 2026-05-11
350 | NVIDIA Nemotron Nano 9B V2 (Non-reasoning) | 0.8% | Nemotron Nano 9B V2 / nvidia-nemotron-nano-9b-v2 | Imported | 2026-05-11
351 | Olmo 3 7B Think | 0.8% | none | Imported | 2026-05-11
352 | Tri-21B-Think | 0.8% | none | Imported | 2026-05-11
353 | Apertus 70B Instruct | 0% | none | Imported | 2026-05-11
354 | Apertus 8B Instruct | 0% | none | Imported | 2026-05-11
355 | Exaone 4.0 1.2B (Non-reasoning) | 0% | none | Imported | 2026-05-11
356 | Exaone 4.0 1.2B (Reasoning) | 0% | none | Imported | 2026-05-11
357 | Gemma 3 1B Instruct | 0% | none | Imported | 2026-05-11
358 | Gemma 3 270M | 0% | none | Imported | 2026-05-11
359 | Granite 3.3 8B (Non-reasoning) | 0% | none | Imported | 2026-05-11
360 | Granite 4.0 1B | 0% | none | Imported | 2026-05-11
361 | Granite 4.0 350M | 0% | none | Imported | 2026-05-11
362 | Granite 4.0 H 1B | 0% | none | Imported | 2026-05-11
363 | Granite 4.0 H 350M | 0% | none | Imported | 2026-05-11
364 | Granite 4.1 8B | 0% | Granite 4.1 8B / ibm-granite-granite-4.1-8b | Imported | 2026-05-11
365 | Hermes 4 - Llama-3.1 70B (Non-reasoning) | 0% | none | Imported | 2026-05-11
366 | Jamba 1.7 Mini | 0% | none | Imported | 2026-05-11
367 | LFM2 1.2B | 0% | none | Imported | 2026-05-11
368 | LFM2 24B A2B | 0% | LFM2-24B-A2B / liquid-lfm-2-24b-a2b | Imported | 2026-05-11
369 | LFM2 8B A1B | 0% | none | Imported | 2026-05-11
370 | LFM2.5-1.2B-Instruct | 0% | LFM2.5-1.2B-Instruct / liquid-lfm-2.5-1.2b-instruct | Imported | 2026-05-11
371 | LFM2.5-1.2B-Thinking | 0% | LFM2.5-1.2B-Thinking / liquid-lfm-2.5-1.2b-thinking | Imported | 2026-05-11
372 | LFM2.5-VL-1.6B | 0% | none | Imported | 2026-05-11
373 | Llama 3 Instruct 8B | 0% | none | Imported | 2026-05-11
374 | Llama 3.2 Instruct 1B | 0% | none | Imported | 2026-05-11
375 | Llama 3.3 Nemotron Super 49B v1 (Non-reasoning) | 0% | none | Imported | 2026-05-11
376 | Llama 3.3 Nemotron Super 49B v1 (Reasoning) | 0% | none | Imported | 2026-05-11
377 | MiniCPM-V 4.6 1.3B | 0% | none | Imported | 2026-05-11
378 | Ministral 3 3B | 0% | none | Imported | 2026-05-11
379 | Molmo 7B-D | 0% | none | Imported | 2026-05-11
380 | Molmo2-8B | 0% | none | Imported | 2026-05-11
381 | Nanbeige4.1-3B | 0% | none | Imported | 2026-05-11
382 | NVIDIA Nemotron Nano 12B v2 VL (Non-reasoning) | 0% | Nemotron Nano 12B 2 VL / nvidia-nemotron-nano-12b-v2-vl | Imported | 2026-05-11
383 | OLMo 2 32B | 0% | none | Imported | 2026-05-11
384 | OLMo 2 7B | 0% | none | Imported | 2026-05-11
385 | Olmo 3 7B Instruct | 0% | none | Imported | 2026-05-11
386 | Olmo 3.1 32B Instruct | 0% | Olmo 3.1 32B Instruct / allenai-olmo-3.1-32b-instruct | Imported | 2026-05-11
387 | Olmo 3.1 32B Think | 0% | none | Imported | 2026-05-11
388 | Phi-3 Mini Instruct 3.8B | 0% | none | Imported | 2026-05-11
389 | Phi-4 Mini Instruct | 0% | none | Imported | 2026-05-11
390 | Qwen3 0.6B (Non-reasoning) | 0% | none | Imported | 2026-05-11
391 | Qwen3 0.6B (Reasoning) | 0% | none | Imported | 2026-05-11
392 | Qwen3 1.7B (Non-reasoning) | 0% | none | Imported | 2026-05-11
393 | Qwen3 1.7B (Reasoning) | 0% | none | Imported | 2026-05-11
394 | Qwen3 VL 4B Instruct | 0% | none | Imported | 2026-05-11
395 | Qwen3.5 0.8B (Non-reasoning) | 0% | none | Imported | 2026-05-11
396 | Qwen3.5 0.8B (Reasoning) | 0% | none | Imported | 2026-05-11
397 | Reka Flash 3 | 0% | Reka Flash 3 / rekaai-reka-flash-3 | Imported | 2026-05-11
398 | Tiny Aya Global | 0% | none | Imported | 2026-05-11