SciCode

Scientist-curated coding benchmark with subproblems drawn from laboratory problems across scientific disciplines.

Rows: 483
Primary metric: score
Sampled: 2026-05-28

Metadata

Metrics

Accuracy

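Showing the 2 latest source slices: self-reported scores sampled 2026-05-28, followed by imported scores sampled 2026-05-11. Ranks restart at 1 in each slice.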

Latest Results

Provider-published Qwen3.7-Max comparison scores. Rows are marked self-reported and should be interpreted as source claims unless independently reproduced.

Rank Subject Accuracy Model Match Provenance Sampled
1 Qwen3.7 Max 53.5% Qwen3.7 Max
qwen-qwen3.7-max
Self-reported 2026-05-28
2 Kimi K2.6 Thinking 52.2% MoonshotAI: Kimi K2.6
moonshotai-kimi-k2.6
Self-reported 2026-05-28
3 Claude Opus 4.6 Max 51.9% Claude Opus 4.6
anthropic-claude-opus-4.6
Self-reported 2026-05-28
4 GLM-5.1 Thinking 45.1% GLM 5.1
z-ai-glm-5.1
Self-reported 2026-05-28
5 Qwen3.6 Plus 41.4% Qwen3.6 Plus
qwen-qwen3.6-plus
Self-reported 2026-05-28
1 Gemini 3.1 Pro Preview 58.9% Gemini 3.1 Pro Preview
google-gemini-3.1-pro-preview
Imported 2026-05-11
2 GPT-5.4 (xhigh) 56.6% GPT-5.4
openai-gpt-5.4
Imported 2026-05-11
3 Gemini 3 Pro Preview (high) 56.1% Gemini 3
google-gemini-3
Imported 2026-05-11
4 GPT-5.5 (xhigh) 56.1% GPT-5.5
openai-gpt-5.5
Imported 2026-05-11
5 GPT-5.5 (high) 55.9% GPT-5.5
openai-gpt-5.5
Imported 2026-05-11
6 GPT-5.2 Codex (xhigh) 54.6% GPT-5.2-Codex
openai-gpt-5.2-codex
Imported 2026-05-11
7 Claude Opus 4.7 (Adaptive Reasoning, Max Effort) 54.5% Claude Opus 4.7
anthropic-claude-opus-4.7
Imported 2026-05-11
8 GPT-5.5 (medium) 53.5% GPT-5.5
openai-gpt-5.5
Imported 2026-05-11
9 Kimi K2.6 53.5% MoonshotAI: Kimi K2.6
moonshotai-kimi-k2.6
Imported 2026-05-11
10 GPT-5.3 Codex (xhigh) 53.2% GPT-5.3-Codex
openai-gpt-5.3-codex
Imported 2026-05-11
11 GPT-5.2 (xhigh) 52.1% GPT-5.2
openai-gpt-5.2
Imported 2026-05-11
12 Claude Opus 4.6 (Adaptive Reasoning, Max Effort) 51.9% Claude Opus 4.6
anthropic-claude-opus-4.6
Imported 2026-05-11
13 GPT-5.5 (low) 51.6% GPT-5.5
openai-gpt-5.5
Imported 2026-05-11
14 Muse Spark 51.5% Imported 2026-05-11
15 Gemini 3 Flash Preview (Reasoning) 50.6% Gemini 3 Flash Preview
google-gemini-3-flash-preview
Imported 2026-05-11
16 GPT-5.4 (low) 50.3% GPT-5.4
openai-gpt-5.4
Imported 2026-05-11
17 MiMo-V2.5-Pro 50.2% MiMo-V2.5-Pro
xiaomi-mimo-v2.5-pro
Imported 2026-05-11
18 Claude Opus 4.7 (Non-reasoning, High Effort) 50.1% Claude Opus 4.7
anthropic-claude-opus-4.7
Imported 2026-05-11
19 DeepSeek V4 Pro (Reasoning, Max Effort) 50% DeepSeek V4 Pro
deepseek-deepseek-v4-pro
Imported 2026-05-11
20 Gemini 3 Flash Preview (Non-reasoning) 49.9% Gemini 3 Flash Preview
google-gemini-3-flash-preview
Imported 2026-05-11
21 Gemini 3 Pro Preview (low) 49.9% Gemini 3
google-gemini-3
Imported 2026-05-11
22 GPT-5.4 mini (xhigh) 49.9% GPT-5.4 Mini
openai-gpt-5.4-mini
Imported 2026-05-11
23 Claude Opus 4.5 (Reasoning) 49.5% Claude Opus 4.5
anthropic-claude-opus-4.5
Imported 2026-05-11
24 Kimi K2.5 (Reasoning) 49% MoonshotAI: Kimi K2.5
moonshotai-kimi-k2.5
Imported 2026-05-11
25 GPT-5.5 (Non-reasoning) 47.3% GPT-5.5
openai-gpt-5.5
Imported 2026-05-11
26 Grok 4.3 47.3% Grok 4.3
x-ai-grok-4.3
Imported 2026-05-11
27 GPT-5.4 (Non-reasoning) 47.1% GPT-5.4
openai-gpt-5.4
Imported 2026-05-11
28 Claude Opus 4.5 (Non-reasoning) 47% Claude Opus 4.5
anthropic-claude-opus-4.5
Imported 2026-05-11
29 MiniMax-M2.7 47% MiniMax M2.7
minimax-minimax-m2.7
Imported 2026-05-11
30 Claude Sonnet 4.6 (Non-reasoning, High Effort) 46.9% Claude Sonnet 4.6
anthropic-claude-sonnet-4.6
Imported 2026-05-11
31 GPT-5.4 nano (xhigh) 46.9% GPT-5.4 Nano
openai-gpt-5.4-nano
Imported 2026-05-11
32 Qwen3.6 Max Preview 46.9% Qwen3.6 Max Preview
qwen-qwen3.6-max-preview
Imported 2026-05-11
33 Claude Sonnet 4.6 (Adaptive Reasoning, Max Effort) 46.8% Claude Sonnet 4.6
anthropic-claude-sonnet-4.6
Imported 2026-05-11
34 o4-mini (high) 46.5% o4 Mini
openai-o4-mini
Imported 2026-05-11
35 DeepSeek V4 Pro (Reasoning, High Effort) 46.4% DeepSeek V4 Pro
deepseek-deepseek-v4-pro
Imported 2026-05-11
36 GLM-5 (Reasoning) 46.2% GLM 5
z-ai-glm-5
Imported 2026-05-11
37 GPT-5.2 (medium) 46.2% GPT-5.2
openai-gpt-5.2
Imported 2026-05-11
38 Claude Opus 4.6 (Non-reasoning, High Effort) 45.7% Claude Opus 4.6
anthropic-claude-opus-4.6
Imported 2026-05-11
39 Grok 4 45.7% Grok 4
x-ai-grok-4
Imported 2026-05-11
40 Grok 4.20 0309 v2 (Reasoning) 45.6% Grok 4.20
x-ai-grok-4.20
Imported 2026-05-11
41 GLM-4.7 (Reasoning) 45.1% GLM 4.7
z-ai-glm-4.7
Imported 2026-05-11
42 DeepSeek V4 Flash (Reasoning, Max Effort) 44.9% DeepSeek V4 Flash
deepseek-deepseek-v4-flash
Imported 2026-05-11
43 Claude 4.5 Sonnet (Reasoning) 44.7% Imported 2026-05-11
44 Grok 4.20 0309 (Reasoning) 44.7% Grok 4.20
x-ai-grok-4.20
Imported 2026-05-11
45 GPT-5.4 mini (medium) 44.2% GPT-5.4 Mini
openai-gpt-5.4-mini
Imported 2026-05-11
46 Grok 4 Fast (Reasoning) 44.2% Grok 4 Fast
x-ai-grok-4-fast
Imported 2026-05-11
47 Grok 4.1 Fast (Reasoning) 44.2% Grok 4.1 Fast
x-ai-grok-4.1-fast
Imported 2026-05-11
48 Claude Sonnet 4.6 (Non-reasoning, Low Effort) 44.1% Claude Sonnet 4.6
anthropic-claude-sonnet-4.6
Imported 2026-05-11
49 DeepSeek V3.2 Speciale 44% DeepSeek V3.2 Speciale
deepseek-deepseek-v3.2-speciale
Imported 2026-05-11
50 GLM-5.1 (Reasoning) 43.8% GLM 5.1
z-ai-glm-5.1
Imported 2026-05-11
51 GLM-5-Turbo 43.6% GLM 5 Turbo
z-ai-glm-5-turbo
Imported 2026-05-11
52 GLM 5V Turbo (Reasoning) 43.5% GLM 5V Turbo
z-ai-glm-5v-turbo
Imported 2026-05-11
53 Gemma 4 31B (Reasoning) 43.4% Gemma 4 31B
google-gemma-4-31b-it
Imported 2026-05-11
54 Claude 4.5 Haiku (Reasoning) 43.3% Imported 2026-05-11
55 GPT-5.1 (high) 43.3% GPT-5.1
openai-gpt-5.1
Imported 2026-05-11
56 MiMo-V2.5 43.1% MiMo-V2.5
xiaomi-mimo-v2.5
Imported 2026-05-11
57 Qwen3 Max Thinking 43.1% Qwen3 Max Thinking
qwen-qwen3-max-thinking
Imported 2026-05-11
58 GPT-5 (high) 42.9% GPT-5
openai-gpt-5
Imported 2026-05-11
59 Claude 4.5 Sonnet (Non-reasoning) 42.8% Imported 2026-05-11
60 Gemini 2.5 Pro 42.8% Gemini 2.5 Pro
google-gemini-2.5-pro
Imported 2026-05-11
61 Nova 2.0 Pro Preview (medium) 42.7% Imported 2026-05-11
62 GPT-5.1 Codex mini (high) 42.6% GPT-5.1-Codex-Mini
openai-gpt-5.1-codex-mini
Imported 2026-05-11
63 MiniMax-M2.5 42.6% MiniMax M2.5
minimax-minimax-m2.5
Imported 2026-05-11
64 MiMo-V2-Pro 42.5% MiMo-V2-Pro
xiaomi-mimo-v2-pro
Imported 2026-05-11
65 DeepSeek V4 Pro (Non-reasoning) 42.4% DeepSeek V4 Pro
deepseek-deepseek-v4-pro
Imported 2026-05-11
66 Kimi K2 Thinking 42.4% MoonshotAI: Kimi K2 Thinking
moonshotai-kimi-k2-thinking
Imported 2026-05-11
67 Qwen3 235B A22B 2507 (Reasoning) 42.4% Qwen3 235B A22B Instruct 2507
qwen-qwen3-235b-a22b-2507
Imported 2026-05-11
68 DeepSeek V4 Flash (Reasoning, High Effort) 42% DeepSeek V4 Flash
deepseek-deepseek-v4-flash
Imported 2026-05-11
69 Qwen3.5 122B A10B (Reasoning) 42% Qwen3.5-122B-A10B
qwen-qwen3.5-122b-a10b
Imported 2026-05-11
70 Qwen3.5 397B A17B (Reasoning) 42% Qwen3.5 397B A17B
qwen-qwen3.5-397b-a17b
Imported 2026-05-11
71 Gemini 3.1 Flash-Lite Preview 41.9% Gemini 3.1 Flash Lite Preview
google-gemini-3.1-flash-lite-preview
Imported 2026-05-11
72 Gemini 2.5 Pro Preview (May' 25) 41.6% Gemini 2.5 Pro Preview 06-05
google-gemini-2.5-pro-preview
Imported 2026-05-11
73 Hy3-preview (Reasoning) 41.2% Hy3 preview
tencent-hy3-preview
Imported 2026-05-11
74 Gemma 4 31B (Non-reasoning) 41.1% Gemma 4 31B
google-gemma-4-31b-it
Imported 2026-05-11
75 GPT-5 (medium) 41.1% GPT-5
openai-gpt-5
Imported 2026-05-11
76 Qwen3.5 397B A17B (Non-reasoning) 41.1% Qwen3.5 397B A17B
qwen-qwen3.5-397b-a17b
Imported 2026-05-11
77 Cogito v2.1 (Reasoning) 41% Imported 2026-05-11
78 GPT-5 mini (medium) 41% GPT-5 Mini
openai-gpt-5-mini
Imported 2026-05-11
79 o3 41% o3
openai-o3
Imported 2026-05-11
80 Claude 4 Opus (Non-reasoning) 40.9% Imported 2026-05-11
81 Claude 4.1 Opus (Reasoning) 40.9% Imported 2026-05-11
82 GPT-5 Codex (high) 40.9% GPT-5 Codex
openai-gpt-5-codex
Imported 2026-05-11
83 Doubao Seed Code 40.7% Imported 2026-05-11
84 MiniMax-M2.1 40.7% MiniMax M2.1
minimax-minimax-m2.1
Imported 2026-05-11
85 Qwen3.6 Plus 40.7% Qwen3.6 Plus
qwen-qwen3.6-plus
Imported 2026-05-11
86 DeepSeek V3.1 Terminus (Reasoning) 40.6% DeepSeek V3.1 Terminus
deepseek-deepseek-v3.1-terminus
Imported 2026-05-11
87 Grok 3 mini Reasoning (high) 40.6% Imported 2026-05-11
88 Gemini 2.5 Flash Preview (Sep '25) (Reasoning) 40.5% Imported 2026-05-11
89 Qwen3.5 Omni Plus 40.5% Imported 2026-05-11
90 GPT-4.1 mini 40.4% GPT-4.1 Mini
openai-gpt-4.1-mini
Imported 2026-05-11
91 GPT-5.2 (Non-reasoning) 40.4% GPT-5.2
openai-gpt-5.2
Imported 2026-05-11
92 Step 3.5 Flash 40.4% Step 3.5 Flash
stepfun-step-3.5-flash
Imported 2026-05-11
93 Claude 3.7 Sonnet (Reasoning) 40.3% Claude 3.7 Sonnet (thinking)
anthropic-claude-3.7-sonnet-thinking
Imported 2026-05-11
94 DeepSeek R1 0528 (May '25) 40.3% R1
deepseek-r1
Imported 2026-05-11
95 GPT-5.1 Codex (high) 40.2% GPT-5.1-Codex
openai-gpt-5.1-codex
Imported 2026-05-11
96 Claude 4 Sonnet (Reasoning) 40% Imported 2026-05-11
97 Gemma 4 26B A4B (Reasoning) 40% Gemma 4 26B A4B
google-gemma-4-26b-a4b-it
Imported 2026-05-11
98 DeepSeek V3.2 Exp (Non-reasoning) 39.9% DeepSeek V3.2
deepseek-deepseek-v3.2
Imported 2026-05-11
99 o3-mini 39.9% o3-mini
openai-o3-mini
Imported 2026-05-11
100 Qwen3 235B A22B (Reasoning) 39.9% Qwen3 235B A22B
qwen-qwen3-235b-a22b
Imported 2026-05-11
101 Qwen3 VL 235B A22B (Reasoning) 39.9% Imported 2026-05-11
102 Claude 4 Opus (Reasoning) 39.8% Imported 2026-05-11
103 o3-mini (high) 39.8% o3 Mini High
openai-o3-mini-high
Imported 2026-05-11
104 Qwen3.6 27B (Reasoning) 39.8% Qwen3.6 27B
qwen-qwen3.6-27b
Imported 2026-05-11
105 GPT-5.4 mini (Non-Reasoning) 39.6% GPT-5.4 Mini
openai-gpt-5.4-mini
Imported 2026-05-11
106 Kimi K2.5 (Non-reasoning) 39.6% MoonshotAI: Kimi K2.5
moonshotai-kimi-k2.5
Imported 2026-05-11
107 Mistral Medium 3.5 39.6% Mistral: Mistral Medium 3.5
mistralai-mistral-medium-3-5
Imported 2026-05-11
108 Gemini 2.5 Pro Preview (Mar' 25) 39.5% Gemini 2.5 Pro Preview 06-05
google-gemini-2.5-pro-preview
Imported 2026-05-11
109 Kimi K2.6 (Non-reasoning) 39.5% MoonshotAI: Kimi K2.6
moonshotai-kimi-k2.6
Imported 2026-05-11
110 MiMo-V2-Omni-0327 39.5% Imported 2026-05-11
111 Qwen3.5 27B (Reasoning) 39.5% Qwen3.5-27B
qwen-qwen3.5-27b
Imported 2026-05-11
112 Gemini 2.5 Flash (Reasoning) 39.4% Gemini 2.5 Flash
google-gemini-2.5-flash
Imported 2026-05-11
113 Hy3-preview (Non-reasoning) 39.4% Hy3 preview
tencent-hy3-preview
Imported 2026-05-11
114 MiMo-V2-Flash (Reasoning) 39.4% MiMo-V2-Flash
xiaomi-mimo-v2-flash
Imported 2026-05-11
115 GPT-5 mini (high) 39.2% GPT-5 Mini
openai-gpt-5-mini
Imported 2026-05-11
116 Magistral Medium 1.2 39.2% Imported 2026-05-11
117 DeepSeek V3.1 (Reasoning) 39.1% DeepSeek V3.1
deepseek-deepseek-chat-v3.1
Imported 2026-05-11
118 GPT-5 (low) 39.1% GPT-5
openai-gpt-5
Imported 2026-05-11
119 INTELLECT-3 39.1% INTELLECT-3
prime-intellect-intellect-3
Imported 2026-05-11
120 MiMo-V2.5-Pro (Non-reasoning) 39.1% MiMo-V2.5-Pro
xiaomi-mimo-v2.5-pro
Imported 2026-05-11
121 DeepSeek V3.2 (Reasoning) 38.9% DeepSeek V3.2
deepseek-deepseek-v3.2
Imported 2026-05-11
122 gpt-oss-120B (high) 38.9% gpt-oss-120b
openai-gpt-oss-120b
Imported 2026-05-11
123 GPT-5 (minimal) 38.8% GPT-5
openai-gpt-5
Imported 2026-05-11
124 Qwen3 Next 80B A3B (Reasoning) 38.8% Imported 2026-05-11
125 DeepSeek V3.2 (Non-reasoning) 38.7% DeepSeek V3.2
deepseek-deepseek-v3.2
Imported 2026-05-11
126 Mercury 2 38.7% Mercury 2
inception-mercury-2
Imported 2026-05-11
127 Nova 2.0 Pro Preview (low) 38.7% Imported 2026-05-11
128 Qwen3 Max Thinking (Preview) 38.7% Qwen3 Max Thinking
qwen-qwen3-max-thinking
Imported 2026-05-11
129 Step 3.5 Flash 2603 38.5% Step 3.5 Flash
stepfun-step-3.5-flash
Imported 2026-05-11
130 GLM-4.6 (Reasoning) 38.4% GLM 4.6
z-ai-glm-4.6
Imported 2026-05-11
131 GPT-5.4 nano (medium) 38.4% GPT-5.4 Nano
openai-gpt-5.4-nano
Imported 2026-05-11
132 GLM-5 (Non-reasoning) 38.3% GLM 5
z-ai-glm-5
Imported 2026-05-11
133 KAT Coder Pro V2 38.3% KAT-Coder-Pro V2
kwaipilot-kat-coder-pro-v2
Imported 2026-05-11
134 MiMo-V2-Flash (Feb 2026) 38.3% MiMo-V2-Flash
xiaomi-mimo-v2-flash
Imported 2026-05-11
135 Qwen3 Max 38.3% Qwen3 Max
qwen-qwen3-max
Imported 2026-05-11
136 GPT-4.1 38.1% GPT-4.1
openai-gpt-4.1
Imported 2026-05-11
137 Mistral Small 4 (Reasoning) 38% Mistral: Mistral Small 4
mistralai-mistral-small-2603
Imported 2026-05-11
138 GPT-5 (ChatGPT) 37.8% GPT-5
openai-gpt-5
Imported 2026-05-11
139 MiniMax M1 40k 37.8% Imported 2026-05-11
140 DeepSeek V3.2 Exp (Reasoning) 37.7% DeepSeek V3.2 Exp
deepseek-deepseek-v3.2-exp
Imported 2026-05-11
141 Qwen3.5 35B A3B (Reasoning) 37.7% Qwen3.5-35B-A3B
qwen-qwen3.5-35b-a3b
Imported 2026-05-11
142 Claude 3.7 Sonnet (Non-reasoning) 37.6% Claude 3.7 Sonnet
anthropic-claude-3.7-sonnet
Imported 2026-05-11
143 DeepSeek R1 Distill Qwen 32B 37.6% R1 Distill Qwen 32B
deepseek-deepseek-r1-distill-qwen-32b
Imported 2026-05-11
144 ERNIE 5.0 Thinking Preview 37.5% Imported 2026-05-11
145 Gemini 2.5 Flash Preview (Sep '25) (Non-reasoning) 37.5% Imported 2026-05-11
146 Grok 4.3 (Non-reasoning) 37.4% Grok 4.3
x-ai-grok-4.3
Imported 2026-05-11
147 MiniMax M1 80k 37.4% Imported 2026-05-11
148 Apriel-v1.6-15B-Thinker 37.3% Imported 2026-05-11
149 Claude 4 Sonnet (Non-reasoning) 37.3% Imported 2026-05-11
150 DeepSeek V4 Flash (Non-reasoning) 37.3% DeepSeek V4 Flash
deepseek-deepseek-v4-flash
Imported 2026-05-11
151 Gemma 4 26B A4B (Non-reasoning) 37.3% Gemma 4 26B A4B
google-gemma-4-26b-a4b-it
Imported 2026-05-11
152 Qwen3.6 27B (Non-reasoning) 37.3% Qwen3.6 27B
qwen-qwen3.6-27b
Imported 2026-05-11
153 Ling-2.6-1T 37% Ling-2.6-1T
inclusionai-ling-2.6-1t
Imported 2026-05-11
154 Qwen3 Max (Preview) 37% Qwen3 Max
qwen-qwen3-max
Imported 2026-05-11
155 GPT-5 mini (minimal) 36.9% GPT-5 Mini
openai-gpt-5-mini
Imported 2026-05-11
156 Nova 2.0 Lite (high) 36.9% Imported 2026-05-11
157 Grok 3 36.8% Grok 3
xaigrok-3
Imported 2026-05-11
158 Nova 2.0 Lite (medium) 36.8% Imported 2026-05-11
159 DeepSeek V3.1 (Non-reasoning) 36.7% DeepSeek V3.1
deepseek-deepseek-chat-v3.1
Imported 2026-05-11
160 MiMo-V2-Omni 36.7% MiMo-V2-Omni
xiaomi-mimo-v2-omni
Imported 2026-05-11
161 Qwen3.5 27B (Non-reasoning) 36.7% Qwen3.5-27B
qwen-qwen3.5-27b
Imported 2026-05-11
162 Ring-1T 36.7% Imported 2026-05-11
163 Claude 3.5 Sonnet (Oct '24) 36.6% Claude 3.5 Sonnet
anthropic-claude-3.5-sonnet
Imported 2026-05-11
164 GPT-4o (March 2025, chatgpt-4o-latest) 36.6% GPT-4o
openai-gpt-4o
Imported 2026-05-11
165 GPT-5 nano (high) 36.6% GPT-5 Nano
openai-gpt-5-nano
Imported 2026-05-11
166 KAT-Coder-Pro V1 36.6% Imported 2026-05-11
167 GPT-5.1 (Non-reasoning) 36.5% GPT-5.1
openai-gpt-5.1
Imported 2026-05-11
168 Seed-OSS-36B-Instruct 36.5% Imported 2026-05-11
169 Grok Code Fast 1 36.2% Grok Code Fast 1
x-ai-grok-code-fast-1
Imported 2026-05-11
170 Mistral Large 3 36.2% Imported 2026-05-11
171 Nova 2.0 Omni (medium) 36.2% Imported 2026-05-11
172 GLM-5.1 (Non-reasoning) 36.1% GLM 5.1
z-ai-glm-5.1
Imported 2026-05-11
173 MiniMax-M2 36.1% MiniMax M2
minimax-minimax-m2
Imported 2026-05-11
174 Trinity Large Thinking 36.1% Trinity Large Thinking
arcee-ai-trinity-large-thinking
Imported 2026-05-11
175 gpt-oss-120B (low) 36% gpt-oss-120b
openai-gpt-oss-120b
Imported 2026-05-11
176 NVIDIA Nemotron 3 Super 120B A12B (Reasoning) 36% Nemotron 3 Super
nvidia-nemotron-3-super-120b-a12b
Imported 2026-05-11
177 Qwen3 235B A22B 2507 Instruct 36% Qwen3 235B A22B Instruct 2507
qwen-qwen3-235b-a22b-2507
Imported 2026-05-11
178 Gemini 2.5 Flash Preview (Reasoning) 35.9% Imported 2026-05-11
179 Qwen3 Coder 480B A35B Instruct 35.9% Qwen3 Coder 480B A35B
qwen-qwen3-coder
Imported 2026-05-11
180 Qwen3 VL 235B A22B Instruct 35.9% Qwen3 VL 235B A22B Instruct
qwen-qwen3-vl-235b-a22b-instruct
Imported 2026-05-11
181 DeepSeek V3 0324 35.8% DeepSeek V3 0324
deepseek-deepseek-chat-v3-0324
Imported 2026-05-11
182 o1 35.8% o1
openai-o1
Imported 2026-05-11
183 Qwen3.6 35B A3B (Reasoning) 35.8% Qwen3.6 35B A3B
qwen-qwen3.6-35b-a3b
Imported 2026-05-11
184 QwQ 32B 35.8% Imported 2026-05-11
185 DeepSeek R1 (Jan '25) 35.7% R1
deepseek-r1
Imported 2026-05-11
186 K-EXAONE (Reasoning) 35.6% Imported 2026-05-11
187 Qwen3.5 122B A10B (Non-reasoning) 35.6% Qwen3.5-122B-A10B
qwen-qwen3.5-122b-a10b
Imported 2026-05-11
188 DeepSeek V3 (Dec '24) 35.4% DeepSeek V3
deepseek-deepseek-chat
Imported 2026-05-11
189 GLM-4.7 (Non-reasoning) 35.4% GLM 4.7
z-ai-glm-4.7
Imported 2026-05-11
190 Qwen3 32B (Reasoning) 35.4% Qwen3 32B
qwen-qwen3-32b
Imported 2026-05-11
191 GPT-5.4 nano (Non-Reasoning) 35.2% GPT-5.4 Nano
openai-gpt-5.4-nano
Imported 2026-05-11
192 Ling-1T 35.2% Imported 2026-05-11
193 Magistral Small 1.2 35.2% Imported 2026-05-11
194 Apriel-v1.5-15B-Thinker 34.8% Imported 2026-05-11
195 GLM-4.5 (Reasoning) 34.8% GLM 4.5
z-ai-glm-4.5
Imported 2026-05-11
196 Llama Nemotron Super 49B v1.5 (Reasoning) 34.8% Imported 2026-05-11
197 Nemotron Cascade 2 30B A3B 34.8% Imported 2026-05-11
198 Llama 3.1 Nemotron Ultra 253B v1 (Reasoning) 34.7% Imported 2026-05-11
199 Hermes 4 - Llama-3.1 405B (Non-reasoning) 34.6% Imported 2026-05-11
200 Kimi K2 34.5% MoonshotAI: Kimi K2 0711
moonshotai-kimi-k2
Imported 2026-05-11
201 Claude 4.5 Haiku (Non-reasoning) 34.4% Imported 2026-05-11
202 EXAONE 4.0 32B (Reasoning) 34.4% Imported 2026-05-11
203 gpt-oss-20B (high) 34.4% gpt-oss-20b
openai-gpt-oss-20b
Imported 2026-05-11
204 Nova 2.0 Omni (low) 34.3% Imported 2026-05-11
205 Hermes 4 - Llama-3.1 70B (Reasoning) 34.1% Imported 2026-05-11
206 Gemini 2.0 Flash (experimental) 34% Gemini 2.0 Flash
google-gemini-2.0-flash
Imported 2026-05-11
207 gpt-oss-20B (low) 34% gpt-oss-20b
openai-gpt-oss-20b
Imported 2026-05-11
208 GPT-5 nano (medium) 33.8% GPT-5 Nano
openai-gpt-5-nano
Imported 2026-05-11
209 Mistral Medium 3.1 33.8% Mistral: Mistral Medium 3.1
mistralai-mistral-medium-3.1
Imported 2026-05-11
210 GLM-4.7-Flash (Reasoning) 33.7% GLM 4.7 Flash
z-ai-glm-4.7-flash
Imported 2026-05-11
211 Qwen2.5 Max 33.7% Imported 2026-05-11
212 GPT-4o (ChatGPT) 33.4% GPT-4o
openai-gpt-4o
Imported 2026-05-11
213 Gemini 2.0 Flash (Feb '25) 33.3% Gemini 2.0 Flash
google-gemini-2.0-flash
Imported 2026-05-11
214 GPT-4o (Nov '24) 33.3% GPT-4o
openai-gpt-4o
Imported 2026-05-11
215 Nova 2.0 Lite (low) 33.3% Imported 2026-05-11
216 Qwen3 30B A3B 2507 (Reasoning) 33.3% Imported 2026-05-11
217 Mi:dm K 2.5 Pro 33.2% Imported 2026-05-11
218 Devstral 2 33.1% Imported 2026-05-11
219 GLM-4.6 (Non-reasoning) 33.1% GLM 4.6
z-ai-glm-4.6
Imported 2026-05-11
220 GPT-4o (Aug '24) 33.1% GPT-4o (2024-08-06)
openai-gpt-4o-2024-08-06
Imported 2026-05-11
221 Llama 4 Maverick 33.1% Llama 4 Maverick
meta-llama-4-maverick
Imported 2026-05-11
222 Mistral Medium 3 33.1% Mistral: Mistral Medium 3
mistralai-mistral-medium-3
Imported 2026-05-11
223 K2 Think V2 33% Imported 2026-05-11
224 Gemini 2.0 Flash Thinking Experimental (Jan '25) 32.9% Imported 2026-05-11
225 Grok 4 Fast (Non-reasoning) 32.9% Grok 4 Fast
x-ai-grok-4-fast
Imported 2026-05-11
226 Grok 4.20 0309 v2 (Non-reasoning) 32.8% Grok 4.20
x-ai-grok-4.20
Imported 2026-05-11
227 o1-mini 32.3% Imported 2026-05-11
228 Qwen3 Coder Next 32.3% Qwen3 Coder Next
qwen-qwen3-coder-next
Imported 2026-05-11
229 Grok 4.20 0309 (Non-reasoning) 32.2% Grok 4.20
x-ai-grok-4.20
Imported 2026-05-11
230 DeepSeek V3.1 Terminus (Non-reasoning) 32.1% DeepSeek V3.1 Terminus
deepseek-deepseek-v3.1-terminus
Imported 2026-05-11
231 GPT-4 Turbo 31.9% GPT-4 Turbo
openai-gpt-4-turbo
Imported 2026-05-11
232 Claude 3.5 Sonnet (June '24) 31.6% Claude 3.5 Sonnet
anthropic-claude-3.5-sonnet
Imported 2026-05-11
233 Qwen3 14B (Reasoning) 31.6% Qwen3 14B
qwen-qwen3-14b
Imported 2026-05-11
234 ERNIE 4.5 300B A47B 31.5% ERNIE 4.5 300B A47B
baidu-ernie-4.5-300b-a47b
Imported 2026-05-11
235 DeepSeek R1 Distill Llama 70B 31.2% R1 Distill Llama 70B
deepseek-deepseek-r1-distill-llama-70b
Imported 2026-05-11
236 Gemini 2.0 Pro Experimental (Feb '25) 31.2% Imported 2026-05-11
237 Step3 VL 10B 31.1% Imported 2026-05-11
238 GPT-4o (May '24) 30.9% GPT-4o (2024-05-13)
openai-gpt-4o-2024-05-13
Imported 2026-05-11
239 Qwen3 VL 30B A3B Instruct 30.8% Qwen3 VL 30B A3B Instruct
qwen-qwen3-vl-30b-a3b-instruct
Imported 2026-05-11
240 Kimi K2 0905 30.7% MoonshotAI: Kimi K2 0905
moonshotai-kimi-k2-0905
Imported 2026-05-11
241 Qwen3 Next 80B A3B Instruct 30.7% Qwen3 Next 80B A3B Instruct
qwen-qwen3-next-80b-a3b-instruct
Imported 2026-05-11
242 GLM-4.5-Air 30.6% GLM 4.5 Air
z-ai-glm-4.5-air
Imported 2026-05-11
243 Qwen3 Omni 30B A3B (Reasoning) 30.6% Imported 2026-05-11
244 GLM-4.6V (Reasoning) 30.4% GLM 4.6V
z-ai-glm-4.6v
Imported 2026-05-11
245 Qwen3 30B A3B 2507 Instruct 30.4% Imported 2026-05-11
246 Llama 3.1 Tulu3 405B 30.2% Imported 2026-05-11
247 Solar Pro 2 (Reasoning) 30.2% Imported 2026-05-11
248 Qwen3 VL 32B Instruct 30.1% Qwen3 VL 32B Instruct
qwen-qwen3-vl-32b-instruct
Imported 2026-05-11
249 Llama 3.1 Instruct 405B 29.9% Imported 2026-05-11
250 Qwen3 235B A22B (Non-reasoning) 29.9% Qwen3 235B A22B
qwen-qwen3-235b-a22b
Imported 2026-05-11
251 Magistral Medium 1 29.7% Imported 2026-05-11
252 Mi:dm K 2.5 Pro Preview 29.7% Imported 2026-05-11
253 Grok 4.1 Fast (Non-reasoning) 29.6% Grok 4.1 Fast
x-ai-grok-4.1-fast
Imported 2026-05-11
254 NVIDIA Nemotron 3 Nano 30B A3B (Reasoning) 29.6% Nemotron 3 Nano 30B A3B
nvidia-nemotron-3-nano-30b-a3b
Imported 2026-05-11
255 Gemini 1.5 Pro (Sep '24) 29.5% Imported 2026-05-11
256 Grok Beta 29.5% Imported 2026-05-11
257 Devstral Medium 29.4% Mistral: Devstral Medium
mistralai-devstral-medium
Imported 2026-05-11
258 Olmo 3.1 32B Think 29.3% Imported 2026-05-11
259 Qwen3.5 35B A3B (Non-reasoning) 29.3% Qwen3.5-35B-A3B
qwen-qwen3.5-35b-a3b
Imported 2026-05-11
260 Mistral Large 2 (Nov '24) 29.2% Imported 2026-05-11
261 Pixtral Large 29.2% Mistral: Pixtral Large 2411
mistralai-pixtral-large-2411
Imported 2026-05-11
262 Gemini 2.5 Flash (Non-reasoning) 29.1% Gemini 2.5 Flash
google-gemini-2.5-flash
Imported 2026-05-11
263 GPT-5 nano (minimal) 29.1% GPT-5 Nano
openai-gpt-5-nano
Imported 2026-05-11
264 Ling-flash-2.0 28.9% Imported 2026-05-11
265 Devstral Small 2 28.8% Imported 2026-05-11
266 Qwen3 VL 30B A3B (Reasoning) 28.8% Imported 2026-05-11
267 Gemini 2.5 Flash-Lite Preview (Sep '25) (Reasoning) 28.7% Imported 2026-05-11
268 K2-V2 (high) 28.6% Imported 2026-05-11
269 Olmo 3 32B Think 28.6% Olmo 3 32B Think
allenai-olmo-3-32b-think
Imported 2026-05-11
270 Gemini 2.5 Flash-Lite Preview (Sep '25) (Non-reasoning) 28.5% Gemini 2.5 Flash Lite Preview 09-2025
google-gemini-2.5-flash-lite-preview-09-2025
Imported 2026-05-11
271 Grok 2 (Dec '24) 28.5% Imported 2026-05-11
272 Qwen3 30B A3B (Reasoning) 28.5% Qwen3 30B A3B
qwen-qwen3-30b-a3b
Imported 2026-05-11
273 Qwen3 VL 32B (Reasoning) 28.5% Imported 2026-05-11
274 HyperCLOVA X SEED Think (32B) 28.4% Imported 2026-05-11
275 LongCat Flash Lite 28.4% Imported 2026-05-11
276 Llama 3.3 Nemotron Super 49B v1 (Reasoning) 28.2% Imported 2026-05-11
277 Motif-2-12.7B-Reasoning 28.2% Imported 2026-05-11
278 Command A 28.1% Command A
cohere-command-a
Imported 2026-05-11
279 Mistral Small 4 (Non-reasoning) 28.1% Mistral: Mistral Small 4
mistralai-mistral-small-2603
Imported 2026-05-11
280 Nova 2.0 Pro Preview (Non-reasoning) 28.1% Imported 2026-05-11
281 EXAONE 4.5 33B 28% Imported 2026-05-11
282 Qwen3 32B (Non-reasoning) 28% Qwen3 32B
qwen-qwen3-32b
Imported 2026-05-11
283 Nova 2.0 Omni (Non-reasoning) 27.9% Imported 2026-05-11
284 Nova Premier 27.9% Imported 2026-05-11
285 Nemotron 3 Nano Omni 30B A3B Reasoning 27.8% Imported 2026-05-11
286 Qwen3 Coder 30B A3B Instruct 27.8% Qwen3 Coder 30B A3B Instruct
qwen-qwen3-coder-30b-a3b-instruct
Imported 2026-05-11
287 Hermes 4 - Llama-3.1 70B (Non-reasoning) 27.7% Imported 2026-05-11
288 Qwen3.5 9B (Non-reasoning) 27.7% Qwen3.5-9B
qwen-qwen3.5-9b
Imported 2026-05-11
289 Qwen3.5 9B (Reasoning) 27.5% Qwen3.5-9B
qwen-qwen3.5-9b
Imported 2026-05-11
290 Claude 3.5 Haiku 27.4% Claude 3.5 Haiku
anthropic-claude-3.5-haiku
Imported 2026-05-11
291 Gemini 1.5 Pro (May '24) 27.4% Imported 2026-05-11
292 GLM-4.6V (Non-reasoning) 27.2% GLM 4.6V
z-ai-glm-4.6v
Imported 2026-05-11
293 JT-MINI 27.2% Imported 2026-05-11
294 Solar Pro 2 (Preview) (Non-reasoning) 27.2% Imported 2026-05-11
295 Ling 2.6 Flash 27.1% Ling-2.6-flash
inclusionai-ling-2.6-flash
Imported 2026-05-11
296 Mistral Large 2 (Jul '24) 27.1% Mistral Large 2407
mistralai-mistral-large-2407
Imported 2026-05-11
297 Qwen2.5 Coder Instruct 32B 27.1% Qwen2.5 Coder 32B Instruct
qwen-qwen-2.5-coder-32b-instruct
Imported 2026-05-11
298 K-EXAONE (Non-reasoning) 27% Imported 2026-05-11
299 Solar Open 100B (Reasoning) 26.9% Imported 2026-05-11
300 Gemini 1.5 Flash (Sep '24) 26.7% Imported 2026-05-11
301 Llama 3.1 Instruct 70B 26.7% Imported 2026-05-11
302 Qwen2.5 Instruct 72B 26.7% Qwen2.5 72B Instruct
qwen-qwen-2.5-72b-instruct
Imported 2026-05-11
303 Reka Flash 3 26.7% Reka Flash 3
rekaai-reka-flash-3
Imported 2026-05-11
304 Nanbeige4.1-3B 26.6% Imported 2026-05-11
305 Mistral Small 3.1 26.5% Imported 2026-05-11
306 Qwen3 14B (Non-reasoning) 26.5% Qwen3 14B
qwen-qwen3-14b
Imported 2026-05-11
307 Mistral Small 3.2 26.4% Imported 2026-05-11
308 Qwen3 30B A3B (Non-reasoning) 26.4% Qwen3 30B A3B
qwen-qwen3-30b-a3b
Imported 2026-05-11
309 Sarvam 105B (high) 26.4% Imported 2026-05-11
310 NVIDIA Nemotron Nano 12B v2 VL (Reasoning) 26.2% Nemotron Nano 12B 2 VL
nvidia-nemotron-nano-12b-v2-vl
Imported 2026-05-11
311 Llama 3.3 Instruct 70B 26% Imported 2026-05-11
312 Phi-4 26% Phi 4
microsoft-phi-4
Imported 2026-05-11
313 GPT-4.1 nano 25.9% GPT-4.1 Nano
openai-gpt-4.1-nano
Imported 2026-05-11
314 MiMo-V2-Flash (Non-reasoning) 25.9% MiMo-V2-Flash
xiaomi-mimo-v2-flash
Imported 2026-05-11
315 Granite 4.1 30B 25.8% Imported 2026-05-11
316 Qwen3 4B 2507 (Reasoning) 25.6% Imported 2026-05-11
317 GLM-4.7-Flash (Non-reasoning) 25.5% GLM 4.7 Flash
z-ai-glm-4.7-flash
Imported 2026-05-11
318 Qwen3.5 Omni Flash 25.5% Imported 2026-05-11
319 EXAONE 4.0 32B (Non-reasoning) 25.2% Imported 2026-05-11
320 Hermes 4 - Llama-3.1 405B (Reasoning) 25.2% Imported 2026-05-11
321 K2-V2 (medium) 25.2% Imported 2026-05-11
322 Gemini 2.0 Flash-Lite (Feb '25) 25% Gemini 2.0 Flash Lite
google-gemini-2.0-flash-lite-001
Imported 2026-05-11
323 Falcon-H1R-7B 24.9% Imported 2026-05-11
324 Solar Pro 2 (Non-reasoning) 24.8% Imported 2026-05-11
325 Gemini 2.0 Flash-Lite (Preview) 24.7% Gemini 2.0 Flash Lite
google-gemini-2.0-flash-lite-001
Imported 2026-05-11
326 Solar Pro 3 24.7% Solar Pro 3
upstage-solar-pro-3
Imported 2026-05-11
327 Devstral Small (May '25) 24.5% Mistral: Devstral Small 1.1
mistralai-devstral-small
Imported 2026-05-11
328 Gemma 4 E4B (Reasoning) 24.4% Imported 2026-05-11
329 Devstral Small (Jul '25) 24.3% Mistral: Devstral Small 1.1
mistralai-devstral-small
Imported 2026-05-11
330 Magistral Small 1 24.1% Imported 2026-05-11
331 Mistral Saba 24.1% Mistral: Saba
mistralai-mistral-saba
Imported 2026-05-11
332 Llama 3.2 Instruct 90B (Vision) 24% Imported 2026-05-11
333 Nova 2.0 Lite (Non-reasoning) 24% Imported 2026-05-11
334 DeepSeek R1 Distill Qwen 14B 23.9% Imported 2026-05-11
335 Llama Nemotron Super 49B v1.5 (Non-reasoning) 23.8% Imported 2026-05-11
336 Ministral 3 14B 23.6% Imported 2026-05-11
337 Mistral Small 3 23.6% Imported 2026-05-11
338 Claude 3 Opus 23.3% Imported 2026-05-11
339 Gemini 2.5 Flash Preview (Non-reasoning) 23.3% Imported 2026-05-11
340 Llama 3.1 Nemotron Instruct 70B 23.3% Imported 2026-05-11
341 Hermes 3 - Llama-3.1 70B 23.1% Hermes 3 70B Instruct
nousresearch-hermes-3-llama-3.1-70b
Imported 2026-05-11
342 NVIDIA Nemotron 3 Nano 30B A3B (Non-reasoning) 23% Nemotron 3 Nano 30B A3B
nvidia-nemotron-3-nano-30b-a3b
Imported 2026-05-11
343 Claude 3 Sonnet 22.9% Imported 2026-05-11
344 Gemini 1.5 Flash-8B 22.9% Imported 2026-05-11
345 GPT-4o mini 22.9% GPT-4o-mini
openai-gpt-4o-mini
Imported 2026-05-11
346 Llama 3.3 Nemotron Super 49B v1 (Non-reasoning) 22.9% Imported 2026-05-11
347 Qwen2 Instruct 72B 22.9% Imported 2026-05-11
348 Qwen2.5 Instruct 32B 22.9% Imported 2026-05-11
349 Sonar 22.9% Sonar
perplexity-sonar
Imported 2026-05-11
350 DeepHermes 3 - Mistral 24B Preview (Non-reasoning) 22.8% Imported 2026-05-11
351 Qwen3 8B (Reasoning) 22.6% Qwen3 8B
qwen-qwen3-8b
Imported 2026-05-11
352 Sonar Pro 22.6% Sonar Pro
perplexity-sonar-pro
Imported 2026-05-11
353 K2-V2 (low) 22.3% Imported 2026-05-11
354 GLM-4.5V (Reasoning) 22.1% GLM 4.5V
z-ai-glm-4.5v
Imported 2026-05-11
355 NVIDIA Nemotron Nano 9B V2 (Reasoning) 22% Nemotron Nano 9B V2
nvidia-nemotron-nano-9b-v2
Imported 2026-05-11
356 Qwen3 VL 8B (Reasoning) 21.9% Imported 2026-05-11
357 Granite 4.1 8B 21.8% Granite 4.1 8B
ibm-granite-granite-4.1-8b
Imported 2026-05-11
358 Gemma 3 27B Instruct 21.2% Gemma 3 27B
google-gemma-3-27b-it
Imported 2026-05-11
359 Olmo 3 7B Think 21.2% Imported 2026-05-11
360 Gemma 4 E2B (Reasoning) 20.9% Imported 2026-05-11
361 Granite 4.0 H Small 20.9% Imported 2026-05-11
362 NVIDIA Nemotron Nano 9B V2 (Non-reasoning) 20.9% Nemotron Nano 9B V2
nvidia-nemotron-nano-9b-v2
Imported 2026-05-11
363 Ministral 3 8B 20.8% Imported 2026-05-11
364 Mistral Large (Feb '24) 20.8% Mistral Large
mistralai-mistral-large
Imported 2026-05-11
365 Nova Pro 20.8% Nova Pro 1.0
amazon-nova-pro-v1
Imported 2026-05-11
366 DeepSeek R1 0528 Qwen3 8B 20.4% Imported 2026-05-11
367 Gemma 4 E2B (Non-reasoning) 20.4% Imported 2026-05-11
368 Kimi Linear 48B A3B Instruct 19.9% Imported 2026-05-11
369 Claude 2.0 19.4% Imported 2026-05-11
370 Gemini 2.5 Flash-Lite (Reasoning) 19.3% Gemini 2.5 Flash Lite
google-gemini-2.5-flash-lite
Imported 2026-05-11
371 Sarvam 30B (high) 19.2% Imported 2026-05-11
372 Llama 3 Instruct 70B 18.9% Imported 2026-05-11
373 GLM-4.5V (Non-reasoning) 18.8% GLM 4.5V
z-ai-glm-4.5v
Imported 2026-05-11
374 Jamba 1.7 Large 18.8% Imported 2026-05-11
375 Mixtral 8x22B Instruct 18.8% Mistral: Mixtral 8x22B Instruct
mistralai-mixtral-8x22b-instruct
Imported 2026-05-11
376 Claude 3 Haiku 18.6% Claude 3 Haiku
anthropic-claude-3-haiku
Imported 2026-05-11
377 Qwen3 Omni 30B A3B Instruct 18.6% Imported 2026-05-11
378 Claude 2.1 18.4% Imported 2026-05-11
379 Jamba 1.6 Large 18.4% Imported 2026-05-11
380 Qwen3.5 4B (Non-reasoning) 18.3% Imported 2026-05-11
381 Gemini 1.5 Flash (May '24) 18.1% Imported 2026-05-11
382 Qwen3 4B 2507 Instruct 18.1% Imported 2026-05-11
383 Sarvam M (Reasoning) 17.8% Imported 2026-05-11
384 Tri-21B-think Preview 17.8% Imported 2026-05-11
385 Gemini 2.5 Flash-Lite (Non-reasoning) 17.7% Gemini 2.5 Flash Lite
google-gemini-2.5-flash-lite
Imported 2026-05-11
386 NVIDIA Nemotron Nano 12B v2 VL (Non-reasoning) 17.6% Nemotron Nano 12B 2 VL
nvidia-nemotron-nano-12b-v2-vl
Imported 2026-05-11
387 Gemma 3 12B Instruct 17.4% Gemma 3 12B
google-gemma-3-12b-it
Imported 2026-05-11
388 Qwen3 VL 8B Instruct 17.4% Qwen3 VL 8B Instruct
qwen-qwen3-vl-8b-instruct
Imported 2026-05-11
389 Tri-21B-Think 17.4% Imported 2026-05-11
390 Qwen3 VL 4B (Reasoning) 17.1% Imported 2026-05-11
391 Llama 4 Scout 17% Llama 4 Scout
meta-llama-llama-4-scout
Imported 2026-05-11
392 Qwen3 8B (Non-reasoning) 16.8% Qwen3 8B
qwen-qwen3-8b
Imported 2026-05-11
393 Ring-flash-2.0 16.8% Imported 2026-05-11
394 Olmo 3.1 32B Instruct 16.7% Olmo 3.1 32B Instruct
allenai-olmo-3.1-32b-instruct
Imported 2026-05-11
395 Qwen3 4B (Non-reasoning) 16.7% Imported 2026-05-11
396 NVIDIA Nemotron 3 Nano 4B 16.4% Imported 2026-05-11
397 Solar Pro 2 (Preview) (Reasoning) 16.4% Imported 2026-05-11
398 Jamba 1.5 Large 16.3% Imported 2026-05-11
399 Qwen3.5 4B (Reasoning) 16.1% Imported 2026-05-11
400 Mistral Small (Sep '24) 15.6% Imported 2026-05-11
401 Qwen2.5 Turbo 15.3% Qwen-Turbo
qwen-qwen-turbo
Imported 2026-05-11
402 Qwen2.5 Coder Instruct 7B 14.8% Imported 2026-05-11
403 Ministral 3 3B 14.4% Imported 2026-05-11
404 DeepSeek Coder V2 Lite Instruct 13.9% Imported 2026-05-11
405 Nova Lite 13.9% Nova Lite 1.0
amazon-nova-lite-v1
Imported 2026-05-11
406 Qwen3 VL 4B Instruct 13.7% Imported 2026-05-11
407 Ling-mini-2.0 13.5% Imported 2026-05-11
408 Mistral Small (Feb '24) 13.4% Imported 2026-05-11
409 Molmo2-8B 13.3% Imported 2026-05-11
410 Llama 3.1 Instruct 8B 13.2% Imported 2026-05-11
411 DeepSeek R1 Distill Llama 8B 11.9% Imported 2026-05-11
412 Granite 4.0 Micro 11.9% Granite 4.0 Micro
ibm-granite-granite-4.0-h-micro
Imported 2026-05-11
413 Granite 4.1 3B 11.9% Imported 2026-05-11
414 Llama 3 Instruct 8B 11.9% Imported 2026-05-11
415 Command-R+ (Apr '24) 11.8% Command R (08-2024)
cohere-command-r-08-2024
Imported 2026-05-11
416 DBRX Instruct 11.8% Imported 2026-05-11
417 Llama 2 Chat 13B 11.8% Imported 2026-05-11
418 Mistral Medium 11.8% Imported 2026-05-11
419 Gemini 1.0 Pro 11.7% Imported 2026-05-11
420 Llama 3.2 Instruct 11B (Vision) 11.2% Imported 2026-05-11
421 Phi-4 Multimodal Instruct 11% Imported 2026-05-11
422 LFM2 24B A2B 10.9% LFM2-24B-A2B
liquid-lfm-2-24b-a2b
Imported 2026-05-11
423 Phi-4 Mini Instruct 10.8% Imported 2026-05-11
424 Olmo 3 7B Instruct 10.3% Imported 2026-05-11
425 Granite 3.3 8B (Non-reasoning) 10.1% Imported 2026-05-11
426 Jamba 1.6 Mini 10.1% Imported 2026-05-11
427 Llama 3.1 Nemotron Nano 4B v1.1 (Reasoning) 10.1% Imported 2026-05-11
428 Nova Micro 9.4% Nova Micro 1.0
amazon-nova-micro-v1
Imported 2026-05-11
429 Exaone 4.0 1.2B (Reasoning) 9.3% Imported 2026-05-11
430 Jamba 1.7 Mini 9.3% Imported 2026-05-11
431 DeepHermes 3 - Llama-3.1 8B Preview (Non-reasoning) 9.1% Imported 2026-05-11
432 Phi-3 Mini Instruct 3.8B 9% Imported 2026-05-11
433 Granite 4.0 1B 8.7% Imported 2026-05-11
434 Gemma 3n E4B Instruct Preview (May '25) 8.6% Imported 2026-05-11
435 Granite 4.0 H 1B 8.2% Imported 2026-05-11
436 Gemma 3n E4B Instruct 8.1% Imported 2026-05-11
437 Jamba 1.5 Mini 8% Imported 2026-05-11
438 OLMo 2 32B 8% Imported 2026-05-11
439 Exaone 4.0 1.2B (Non-reasoning) 7.4% Imported 2026-05-11
440 Gemma 3 4B Instruct 7.3% Gemma 3 4B
google-gemma-3-4b-it
Imported 2026-05-11
441 Qwen3.5 2B (Non-reasoning) 7.2% Imported 2026-05-11
442 LFM 40B 7.1% Imported 2026-05-11
443 Qwen3 1.7B (Non-reasoning) 6.9% Imported 2026-05-11
444 LFM2 8B A1B 6.8% Imported 2026-05-11
445 DeepSeek R1 Distill Qwen 1.5B 6.6% Imported 2026-05-11
446 Command-R (Mar '24) 6.2% Command R (08-2024)
cohere-command-r-08-2024
Imported 2026-05-11
447 Jamba Reasoning 3B 5.9% Imported 2026-05-11
448 Apertus 70B Instruct 5.7% Imported 2026-05-11
449 Gemma 3n E2B Instruct 5.2% Imported 2026-05-11
450 Llama 3.2 Instruct 3B 5.2% Imported 2026-05-11
451 Qwen3 1.7B (Reasoning) 4.3% Imported 2026-05-11
452 LFM2.5-1.2B-Thinking 4.2% LFM2.5-1.2B-Thinking
liquid-lfm-2.5-1.2b-thinking
Imported 2026-05-11
453 Apertus 8B Instruct 4.1% Imported 2026-05-11
454 Qwen3 0.6B (Non-reasoning) 4.1% Imported 2026-05-11
455 Gemma 4 E4B (Non-reasoning) 3.9% Imported 2026-05-11
456 QwQ 32B-Preview 3.8% Imported 2026-05-11
457 OLMo 2 7B 3.7% Imported 2026-05-11
458 Molmo 7B-D 3.6% Imported 2026-05-11
459 Tiny Aya Global 3.6% Imported 2026-05-11
460 Qwen3 4B (Reasoning) 3.5% Imported 2026-05-11
461 LFM2.5-VL-1.6B 3% Imported 2026-05-11
462 Qwen3.5 0.8B (Non-reasoning) 2.9% Imported 2026-05-11
463 Mixtral 8x7B Instruct 2.8% Mistral: Mixtral 8x7B Instruct
mistralai-mixtral-8x7b-instruct
Imported 2026-05-11
464 Qwen3 0.6B (Reasoning) 2.8% Imported 2026-05-11
465 Qwen3.5 2B (Reasoning) 2.8% Imported 2026-05-11
466 LFM2 1.2B 2.5% Imported 2026-05-11
467 LFM2 2.6B 2.5% Imported 2026-05-11
468 Mistral 7B Instruct 2.4% Imported 2026-05-11
469 LFM2.5-1.2B-Instruct 2.3% LFM2.5-1.2B-Instruct
liquid-lfm-2.5-1.2b-instruct
Imported 2026-05-11
470 MiniCPM-V 4.6 1.3B 2.1% Imported 2026-05-11
471 Granite 4.0 H 350M 1.7% Imported 2026-05-11
472 Llama 3.2 Instruct 1B 1.7% Imported 2026-05-11
473 Qwen3.6 35B A3B (Non-reasoning) 1.3% Qwen3.6 35B A3B
qwen-qwen3.6-35b-a3b
Imported 2026-05-11
474 Granite 4.0 350M 0.9% Imported 2026-05-11
475 Gemma 3 1B Instruct 0.7% Imported 2026-05-11
476 Gemma 3 270M 0% Imported 2026-05-11
477 Llama 2 Chat 7B 0% Imported 2026-05-11
478 Qwen3.5 0.8B (Reasoning) 0% Imported 2026-05-11