AIME 2025

All 30 problems from the 2025 American Invitational Mathematics Examination, testing olympiad-level mathematical reasoning.

Rows: 269
Primary metric: score
Sampled: 2026-05-11

Metadata

Metrics

Accuracy

Latest Results

Rows are parsed from the defaultData payload of the public Artificial Analysis Next.js RSC (React Server Components) page and ranked in descending order by the configured primary metric.
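The ranking step can be illustrated with a minimal sketch. It assumes each parsed row has already been reduced to a subject name and an accuracy percentage. The `Row` type and `rank_rows` helper are hypothetical names introduced here, not part of the actual import pipeline. The tie-break (case-insensitive subject name) is an assumption inferred from the order of tied rows in the table below, not a documented rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    """Hypothetical parsed row: display name plus accuracy in percent."""
    subject: str
    accuracy: float


def rank_rows(rows):
    """Return (rank, row) pairs sorted by accuracy, highest first.

    Ties are broken by case-insensitive subject name. This tie-break is
    an assumption based on the listing, not a confirmed rule.
    """
    ordered = sorted(rows, key=lambda r: (-r.accuracy, r.subject.casefold()))
    return list(enumerate(ordered, start=1))


# A few rows copied from the table, deliberately out of order.
rows = [
    Row("Grok 4.1 Fast (Reasoning)", 89.3),
    Row("GPT-5.2 (xhigh)", 99.0),
    Row("gpt-oss-20B (high)", 89.3),
    Row("GPT-5 Codex (high)", 98.7),
]

ranked = rank_rows(rows)
for rank, row in ranked:
    print(f"{rank} {row.subject} {row.accuracy:g}%")
# 1 GPT-5.2 (xhigh) 99%
# 2 GPT-5 Codex (high) 98.7%
# 3 gpt-oss-20B (high) 89.3%
# 4 Grok 4.1 Fast (Reasoning) 89.3%
```

Sorting on a `(-accuracy, name)` tuple keeps the ordering deterministic when several subjects share a score, which happens often in this table.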

Rank Subject Accuracy Model Match Provenance Sampled
1 GPT-5.2 (xhigh) 99% GPT-5.2
openai-gpt-5.2
Imported 2026-05-11
2 GPT-5 Codex (high) 98.7% GPT-5 Codex
openai-gpt-5-codex
Imported 2026-05-11
3 Gemini 3 Flash Preview (Reasoning) 97% Gemini 3 Flash Preview
google-gemini-3-flash-preview
Imported 2026-05-11
4 DeepSeek V3.2 Speciale 96.7% DeepSeek V3.2 Speciale
deepseek-deepseek-v3.2-speciale
Imported 2026-05-11
5 GPT-5.2 (medium) 96.7% GPT-5.2
openai-gpt-5.2
Imported 2026-05-11
6 MiMo-V2-Flash (Reasoning) 96.3% MiMo-V2-Flash
xiaomi-mimo-v2-flash
Imported 2026-05-11
7 Gemini 3 Pro Preview (high) 95.7% Gemini 3
google-gemini-3
Imported 2026-05-11
8 GPT-5.1 Codex (high) 95.7% GPT-5.1-Codex
openai-gpt-5.1-codex
Imported 2026-05-11
9 GLM-4.7 (Reasoning) 95% GLM 4.7
z-ai-glm-4.7
Imported 2026-05-11
10 KAT-Coder-Pro V1 94.7% Imported 2026-05-11
11 Kimi K2 Thinking 94.7% MoonshotAI: Kimi K2 Thinking
moonshotai-kimi-k2-thinking
Imported 2026-05-11
12 GPT-5 (high) 94.3% GPT-5
openai-gpt-5
Imported 2026-05-11
13 Nova 2.0 Lite (high) 94.3% Imported 2026-05-11
14 GPT-5.1 (high) 94% GPT-5.1
openai-gpt-5.1
Imported 2026-05-11
15 gpt-oss-120B (high) 93.4% gpt-oss-120b
openai-gpt-oss-120b
Imported 2026-05-11
16 Grok 4 92.7% Grok 4
x-ai-grok-4
Imported 2026-05-11
17 DeepSeek V3.2 (Reasoning) 92% DeepSeek V3.2
deepseek-deepseek-v3.2
Imported 2026-05-11
18 GPT-5 (medium) 91.7% GPT-5
openai-gpt-5
Imported 2026-05-11
19 GPT-5.1 Codex mini (high) 91.7% GPT-5.1-Codex-Mini
openai-gpt-5.1-codex-mini
Imported 2026-05-11
20 Claude Opus 4.5 (Reasoning) 91.3% Claude Opus 4.5
anthropic-claude-opus-4.5
Imported 2026-05-11
21 NVIDIA Nemotron 3 Nano 30B A3B (Reasoning) 91% Nemotron 3 Nano 30B A3B
nvidia-nemotron-3-nano-30b-a3b
Imported 2026-05-11
22 Qwen3 235B A22B 2507 (Reasoning) 91% Qwen3 235B A22B Instruct 2507
qwen-qwen3-235b-a22b-2507
Imported 2026-05-11
23 GPT-5 mini (high) 90.7% GPT-5 Mini
openai-gpt-5-mini
Imported 2026-05-11
24 o4-mini (high) 90.7% o4 Mini
openai-o4-mini
Imported 2026-05-11
25 K-EXAONE (Reasoning) 90.3% Imported 2026-05-11
26 DeepSeek V3.1 (Reasoning) 89.7% DeepSeek V3.1
deepseek-deepseek-chat-v3.1
Imported 2026-05-11
27 DeepSeek V3.1 Terminus (Reasoning) 89.7% DeepSeek V3.1 Terminus
deepseek-deepseek-v3.1-terminus
Imported 2026-05-11
28 Grok 4 Fast (Reasoning) 89.7% Grok 4 Fast
x-ai-grok-4-fast
Imported 2026-05-11
29 Nova 2.0 Omni (medium) 89.7% Imported 2026-05-11
30 gpt-oss-20B (high) 89.3% gpt-oss-20b
openai-gpt-oss-20b
Imported 2026-05-11
31 Grok 4.1 Fast (Reasoning) 89.3% Grok 4.1 Fast
x-ai-grok-4.1-fast
Imported 2026-05-11
32 Ring-1T 89.3% Imported 2026-05-11
33 Nova 2.0 Pro Preview (medium) 89% Imported 2026-05-11
34 Nova 2.0 Lite (medium) 88.7% Imported 2026-05-11
35 o3 88.3% o3
openai-o3
Imported 2026-05-11
36 Qwen3 VL 235B A22B (Reasoning) 88.3% Imported 2026-05-11
37 Apriel-v1.6-15B-Thinker 88% Imported 2026-05-11
38 Claude 4.5 Sonnet (Reasoning) 88% Imported 2026-05-11
39 INTELLECT-3 88% INTELLECT-3
prime-intellect-intellect-3
Imported 2026-05-11
40 DeepSeek V3.2 Exp (Reasoning) 87.7% DeepSeek V3.2 Exp
deepseek-deepseek-v3.2-exp
Imported 2026-05-11
41 Gemini 2.5 Pro 87.7% Gemini 2.5 Pro
google-gemini-2.5-pro
Imported 2026-05-11
42 Apriel-v1.5-15B-Thinker 87.5% Imported 2026-05-11
43 Gemini 3 Pro Preview (low) 86.7% Gemini 3
google-gemini-3
Imported 2026-05-11
44 GLM-4.6 (Reasoning) 86% GLM 4.6
z-ai-glm-4.6
Imported 2026-05-11
45 GLM-4.6V (Reasoning) 85.3% GLM 4.6V
z-ai-glm-4.6v
Imported 2026-05-11
46 ERNIE 5.0 Thinking Preview 85% Imported 2026-05-11
47 GPT-5 mini (medium) 85% GPT-5 Mini
openai-gpt-5-mini
Imported 2026-05-11
48 Grok 3 mini Reasoning (high) 84.7% Imported 2026-05-11
49 Qwen3 VL 32B (Reasoning) 84.7% Imported 2026-05-11
50 Seed-OSS-36B-Instruct 84.7% Imported 2026-05-11
51 Qwen3 Next 80B A3B (Reasoning) 84.3% Imported 2026-05-11
52 Claude 4.5 Haiku (Reasoning) 83.7% Imported 2026-05-11
53 GPT-5 nano (high) 83.7% GPT-5 Nano
openai-gpt-5-nano
Imported 2026-05-11
54 Ring-flash-2.0 83.7% Imported 2026-05-11
55 GPT-5 (low) 83% GPT-5
openai-gpt-5
Imported 2026-05-11
56 MiniMax-M2.1 82.7% MiniMax M2.1
minimax-minimax-m2.1
Imported 2026-05-11
57 Qwen3 4B 2507 (Reasoning) 82.7% Imported 2026-05-11
58 Qwen3 Max Thinking (Preview) 82.3% Qwen3 Max Thinking
qwen-qwen3-max-thinking
Imported 2026-05-11
59 Qwen3 VL 30B A3B (Reasoning) 82.3% Imported 2026-05-11
60 Magistral Medium 1.2 82% Imported 2026-05-11
61 Qwen3 235B A22B (Reasoning) 82% Qwen3 235B A22B
qwen-qwen3-235b-a22b
Imported 2026-05-11
62 GLM-4.5-Air 80.7% GLM 4.5 Air
z-ai-glm-4.5-air
Imported 2026-05-11
63 Qwen3 Max 80.7% Qwen3 Max
qwen-qwen3-max
Imported 2026-05-11
64 Claude 4.1 Opus (Reasoning) 80.3% Imported 2026-05-11
65 Magistral Small 1.2 80.3% Imported 2026-05-11
66 Motif-2-12.7B-Reasoning 80.3% Imported 2026-05-11
67 EXAONE 4.0 32B (Reasoning) 80% Imported 2026-05-11
68 Falcon-H1R-7B 80% Imported 2026-05-11
69 Doubao Seed Code 79.3% Imported 2026-05-11
70 Mi:dm K 2.5 Pro Preview 78.7% Imported 2026-05-11
71 Gemini 2.5 Flash Preview (Sep '25) (Reasoning) 78.3% Imported 2026-05-11
72 GPT-5 nano (medium) 78.3% GPT-5 Nano
openai-gpt-5-nano
Imported 2026-05-11
73 K2-V2 (high) 78.3% Imported 2026-05-11
74 MiniMax-M2 78.3% MiniMax M2
minimax-minimax-m2
Imported 2026-05-11
75 Olmo 3.1 32B Think 77.3% Imported 2026-05-11
76 Llama Nemotron Super 49B v1.5 (Reasoning) 76.7% Imported 2026-05-11
77 Mi:dm K 2.5 Pro 76.7% Imported 2026-05-11
78 DeepSeek R1 0528 (May '25) 76% R1
deepseek-r1
Imported 2026-05-11
79 NVIDIA Nemotron Nano 12B v2 VL (Reasoning) 75% Nemotron Nano 12B 2 VL
nvidia-nemotron-nano-12b-v2-vl
Imported 2026-05-11
80 Qwen3 Max (Preview) 75% Qwen3 Max
qwen-qwen3-max
Imported 2026-05-11
81 Claude 4 Sonnet (Reasoning) 74.3% Imported 2026-05-11
82 Qwen3 Omni 30B A3B (Reasoning) 74% Imported 2026-05-11
83 GLM-4.5 (Reasoning) 73.7% GLM 4.5
z-ai-glm-4.5
Imported 2026-05-11
84 Olmo 3 32B Think 73.7% Olmo 3 32B Think
allenai-olmo-3-32b-think
Imported 2026-05-11
85 Claude 4 Opus (Reasoning) 73.3% Imported 2026-05-11
86 Gemini 2.5 Flash (Reasoning) 73.3% Gemini 2.5 Flash
google-gemini-2.5-flash
Imported 2026-05-11
87 GLM-4.5V (Reasoning) 73% GLM 4.5V
z-ai-glm-4.5v
Imported 2026-05-11
88 Qwen3 32B (Reasoning) 73% Qwen3 32B
qwen-qwen3-32b
Imported 2026-05-11
89 Cogito v2.1 (Reasoning) 72.7% Imported 2026-05-11
90 Qwen3 30B A3B (Reasoning) 72.3% Qwen3 30B A3B
qwen-qwen3-30b-a3b
Imported 2026-05-11
91 Qwen3 VL 30B A3B Instruct 72.3% Qwen3 VL 30B A3B Instruct
qwen-qwen3-vl-30b-a3b-instruct
Imported 2026-05-11
92 Qwen3 235B A22B 2507 Instruct 71.7% Qwen3 235B A22B Instruct 2507
qwen-qwen3-235b-a22b-2507
Imported 2026-05-11
93 Ling-1T 71.3% Imported 2026-05-11
94 Olmo 3 7B Think 70.7% Imported 2026-05-11
95 Qwen3 VL 235B A22B Instruct 70.7% Qwen3 VL 235B A22B Instruct
qwen-qwen3-vl-235b-a22b-instruct
Imported 2026-05-11
96 Hermes 4 - Llama-3.1 405B (Reasoning) 69.7% Imported 2026-05-11
97 NVIDIA Nemotron Nano 9B V2 (Reasoning) 69.7% Nemotron Nano 9B V2
nvidia-nemotron-nano-9b-v2
Imported 2026-05-11
98 Gemini 2.5 Flash-Lite Preview (Sep '25) (Reasoning) 68.7% Imported 2026-05-11
99 Hermes 4 - Llama-3.1 70B (Reasoning) 68.7% Imported 2026-05-11
100 Qwen3 VL 32B Instruct 68.3% Qwen3 VL 32B Instruct
qwen-qwen3-vl-32b-instruct
Imported 2026-05-11
101 DeepSeek R1 (Jan '25) 68% R1
deepseek-r1
Imported 2026-05-11
102 MiMo-V2-Flash (Non-reasoning) 67.7% MiMo-V2-Flash
xiaomi-mimo-v2-flash
Imported 2026-05-11
103 gpt-oss-120B (low) 66.7% gpt-oss-120b
openai-gpt-oss-120b
Imported 2026-05-11
104 Qwen3 30B A3B 2507 Instruct 66.3% Imported 2026-05-11
105 Qwen3 Next 80B A3B Instruct 66.3% Qwen3 Next 80B A3B Instruct
qwen-qwen3-next-80b-a3b-instruct
Imported 2026-05-11
106 Ling-flash-2.0 65.3% Imported 2026-05-11
107 K2-V2 (medium) 64.7% Imported 2026-05-11
108 DeepSeek R1 0528 Qwen3 8B 63.7% Imported 2026-05-11
109 Llama 3.1 Nemotron Ultra 253B v1 (Reasoning) 63.7% Imported 2026-05-11
110 Nova 2.0 Pro Preview (low) 63.3% Imported 2026-05-11
111 DeepSeek R1 Distill Qwen 32B 63% R1 Distill Qwen 32B
deepseek-deepseek-r1-distill-qwen-32b
Imported 2026-05-11
112 Claude Opus 4.5 (Non-reasoning) 62.7% Claude Opus 4.5
anthropic-claude-opus-4.5
Imported 2026-05-11
113 gpt-oss-20B (low) 62.3% gpt-oss-20b
openai-gpt-oss-20b
Imported 2026-05-11
114 NVIDIA Nemotron Nano 9B V2 (Non-reasoning) 62.3% Nemotron Nano 9B V2
nvidia-nemotron-nano-9b-v2
Imported 2026-05-11
115 Solar Pro 2 (Reasoning) 61.3% Imported 2026-05-11
116 MiniMax M1 80k 61% Imported 2026-05-11
117 Gemini 2.5 Flash (Non-reasoning) 60.3% Gemini 2.5 Flash
google-gemini-2.5-flash
Imported 2026-05-11
118 DeepSeek V3.2 (Non-reasoning) 59% DeepSeek V3.2
deepseek-deepseek-v3.2
Imported 2026-05-11
119 HyperCLOVA X SEED Think (32B) 59% Imported 2026-05-11
120 Grok 3 58% Grok 3
xaigrok-3
Imported 2026-05-11
121 Qwen3 14B (Non-reasoning) 58% Qwen3 14B
qwen-qwen3-14b
Imported 2026-05-11
122 DeepSeek V3.2 Exp (Non-reasoning) 57.7% DeepSeek V3.2
deepseek-deepseek-v3.2
Imported 2026-05-11
123 Kimi K2 0905 57.3% MoonshotAI: Kimi K2 0905
moonshotai-kimi-k2-0905
Imported 2026-05-11
124 Kimi K2 57% MoonshotAI: Kimi K2 0711
moonshotai-kimi-k2
Imported 2026-05-11
125 Gemini 2.5 Flash Preview (Sep '25) (Non-reasoning) 56.7% Imported 2026-05-11
126 Claude 3.7 Sonnet (Reasoning) 56.3% Claude 3.7 Sonnet (thinking)
anthropic-claude-3.7-sonnet-thinking
Imported 2026-05-11
127 Qwen3 30B A3B 2507 (Reasoning) 56.3% Imported 2026-05-11
128 Nova 2.0 Omni (low) 56% Imported 2026-05-11
129 DeepSeek R1 Distill Qwen 14B 55.7% Imported 2026-05-11
130 Gemini 3 Flash Preview (Non-reasoning) 55.7% Gemini 3 Flash Preview
google-gemini-3-flash-preview
Imported 2026-05-11
131 Qwen3 14B (Reasoning) 55.7% Qwen3 14B
qwen-qwen3-14b
Imported 2026-05-11
132 Llama 3.3 Nemotron Super 49B v1 (Reasoning) 54.7% Imported 2026-05-11
133 DeepSeek R1 Distill Llama 70B 53.7% R1 Distill Llama 70B
deepseek-deepseek-r1-distill-llama-70b
Imported 2026-05-11
134 DeepSeek V3.1 Terminus (Non-reasoning) 53.7% DeepSeek V3.1 Terminus
deepseek-deepseek-v3.1-terminus
Imported 2026-05-11
135 Gemini 2.5 Flash-Lite (Reasoning) 53.3% Gemini 2.5 Flash Lite
google-gemini-2.5-flash-lite
Imported 2026-05-11
136 Qwen3 4B 2507 Instruct 52.3% Imported 2026-05-11
137 Qwen3 Omni 30B A3B Instruct 52.3% Imported 2026-05-11
138 GPT-5.2 (Non-reasoning) 51% GPT-5.2
openai-gpt-5.2
Imported 2026-05-11
139 Exaone 4.0 1.2B (Reasoning) 50.3% Imported 2026-05-11
140 Llama 3.1 Nemotron Nano 4B v1.1 (Reasoning) 50% Imported 2026-05-11
141 DeepSeek V3.1 (Non-reasoning) 49.7% DeepSeek V3.1
deepseek-deepseek-chat-v3.1
Imported 2026-05-11
142 Ling-mini-2.0 49.3% Imported 2026-05-11
143 GPT-5 (ChatGPT) 48.3% GPT-5
openai-gpt-5
Imported 2026-05-11
144 GLM-4.7 (Non-reasoning) 48% GLM 4.7
z-ai-glm-4.7
Imported 2026-05-11
145 Gemini 2.5 Flash-Lite Preview (Sep '25) (Non-reasoning) 46.7% Gemini 2.5 Flash Lite Preview 09-2025
google-gemini-2.5-flash-lite-preview-09-2025
Imported 2026-05-11
146 GPT-5 mini (minimal) 46.7% GPT-5 Mini
openai-gpt-5-mini
Imported 2026-05-11
147 Nova 2.0 Lite (low) 46.7% Imported 2026-05-11
148 GPT-4.1 mini 46.3% GPT-4.1 Mini
openai-gpt-4.1-mini
Imported 2026-05-11
149 GLM-4.6 (Non-reasoning) 44.3% GLM 4.6
z-ai-glm-4.6
Imported 2026-05-11
150 K-EXAONE (Non-reasoning) 44% Imported 2026-05-11
151 Grok Code Fast 1 43.3% Grok Code Fast 1
x-ai-grok-code-fast-1
Imported 2026-05-11
152 DeepSeek R1 Distill Llama 8B 41.3% Imported 2026-05-11
153 ERNIE 4.5 300B A47B 41.3% ERNIE 4.5 300B A47B
baidu-ernie-4.5-300b-a47b
Imported 2026-05-11
154 Grok 4 Fast (Non-reasoning) 41.3% Grok 4 Fast
x-ai-grok-4-fast
Imported 2026-05-11
155 Magistral Small 1 41.3% Imported 2026-05-11
156 Olmo 3 7B Instruct 41.3% Imported 2026-05-11
157 DeepSeek V3 0324 41% DeepSeek V3 0324
deepseek-deepseek-chat-v3-0324
Imported 2026-05-11
158 Magistral Medium 1 40.3% Imported 2026-05-11
159 EXAONE 4.0 32B (Non-reasoning) 39.3% Imported 2026-05-11
160 Qwen3 Coder 480B A35B Instruct 39.3% Qwen3 Coder 480B A35B
qwen-qwen3-coder
Imported 2026-05-11
161 Claude 4.5 Haiku (Non-reasoning) 39% Imported 2026-05-11
162 Qwen3 1.7B (Reasoning) 38.7% Imported 2026-05-11
163 Mistral Medium 3.1 38.3% Mistral: Mistral Medium 3.1
mistralai-mistral-medium-3.1
Imported 2026-05-11
164 Claude 4 Sonnet (Non-reasoning) 38% Imported 2026-05-11
165 GPT-5.1 (Non-reasoning) 38% GPT-5.1
openai-gpt-5.1
Imported 2026-05-11
166 Mistral Large 3 38% Imported 2026-05-11
167 Claude 4.5 Sonnet (Non-reasoning) 37% Imported 2026-05-11
168 Nova 2.0 Omni (Non-reasoning) 37% Imported 2026-05-11
169 Qwen3 VL 4B Instruct 37% Imported 2026-05-11
170 Devstral 2 36.7% Imported 2026-05-11
171 Claude 4 Opus (Non-reasoning) 36.3% Imported 2026-05-11
172 Kimi Linear 48B A3B Instruct 36.3% Imported 2026-05-11
173 Gemini 2.5 Flash-Lite (Non-reasoning) 35.3% Gemini 2.5 Flash Lite
google-gemini-2.5-flash-lite
Imported 2026-05-11
174 K2-V2 (low) 35.3% Imported 2026-05-11
175 GPT-4.1 34.7% GPT-4.1
openai-gpt-4.1
Imported 2026-05-11
176 Devstral Small 2 34.3% Imported 2026-05-11
177 Grok 4.1 Fast (Non-reasoning) 34.3% Grok 4.1 Fast
x-ai-grok-4.1-fast
Imported 2026-05-11
178 Nova 2.0 Lite (Non-reasoning) 33.7% Imported 2026-05-11
179 Reka Flash 3 33.7% Reka Flash 3
rekaai-reka-flash-3
Imported 2026-05-11
180 GPT-5 (minimal) 31.7% GPT-5
openai-gpt-5
Imported 2026-05-11
181 Ministral 3 8B 31.7% Imported 2026-05-11
182 Nova 2.0 Pro Preview (Non-reasoning) 30.7% Imported 2026-05-11
183 Qwen3 VL 8B (Reasoning) 30.7% Imported 2026-05-11
184 Mistral Medium 3 30.3% Mistral: Mistral Medium 3
mistralai-mistral-medium-3
Imported 2026-05-11
185 Ministral 3 14B 30% Imported 2026-05-11
186 Solar Pro 2 (Non-reasoning) 30% Imported 2026-05-11
187 Devstral Small (Jul '25) 29.3% Mistral: Devstral Small 1.1
mistralai-devstral-small
Imported 2026-05-11
188 Qwen3 Coder 30B A3B Instruct 29% Qwen3 Coder 30B A3B Instruct
qwen-qwen3-coder-30b-a3b-instruct
Imported 2026-05-11
189 QwQ 32B 29% Imported 2026-05-11
190 GPT-5 nano (minimal) 27.3% GPT-5 Nano
openai-gpt-5-nano
Imported 2026-05-11
191 Qwen3 VL 8B Instruct 27.3% Qwen3 VL 8B Instruct
qwen-qwen3-vl-8b-instruct
Imported 2026-05-11
192 Mistral Small 3.2 27% Imported 2026-05-11
193 NVIDIA Nemotron Nano 12B v2 VL (Non-reasoning) 26.7% Nemotron Nano 12B 2 VL
nvidia-nemotron-nano-12b-v2-vl
Imported 2026-05-11
194 GLM-4.6V (Non-reasoning) 26.3% GLM 4.6V
z-ai-glm-4.6v
Imported 2026-05-11
195 DeepSeek V3 (Dec '24) 26% DeepSeek V3
deepseek-deepseek-chat
Imported 2026-05-11
196 GPT-4o (March 2025, chatgpt-4o-latest) 25.7% GPT-4o
openai-gpt-4o
Imported 2026-05-11
197 Qwen3 VL 4B (Reasoning) 25.7% Imported 2026-05-11
198 LFM2 8B A1B 25.3% Imported 2026-05-11
199 Qwen3 8B (Non-reasoning) 24.3% Qwen3 8B
qwen-qwen3-8b
Imported 2026-05-11
200 Exaone 4.0 1.2B (Non-reasoning) 24% Imported 2026-05-11
201 GPT-4.1 nano 24% GPT-4.1 Nano
openai-gpt-4.1-nano
Imported 2026-05-11
202 Qwen3 235B A22B (Non-reasoning) 23.7% Qwen3 235B A22B
qwen-qwen3-235b-a22b
Imported 2026-05-11
203 Qwen3 4B (Reasoning) 22.3% Imported 2026-05-11
204 DeepSeek R1 Distill Qwen 1.5B 22% Imported 2026-05-11
205 Ministral 3 3B 22% Imported 2026-05-11
206 Gemini 2.0 Flash (Feb '25) 21.7% Gemini 2.0 Flash
google-gemini-2.0-flash
Imported 2026-05-11
207 Qwen3 30B A3B (Non-reasoning) 21.7% Qwen3 30B A3B
qwen-qwen3-30b-a3b
Imported 2026-05-11
208 Claude 3.7 Sonnet (Non-reasoning) 21% Claude 3.7 Sonnet
anthropic-claude-3.7-sonnet
Imported 2026-05-11
209 Gemma 3 27B Instruct 20.7% Gemma 3 27B
google-gemma-3-27b-it
Imported 2026-05-11
210 Qwen3 32B (Non-reasoning) 19.7% Qwen3 32B
qwen-qwen3-32b
Imported 2026-05-11
211 Llama 4 Maverick 19.3% Llama 4 Maverick
meta-llama-4-maverick
Imported 2026-05-11
212 Qwen3 8B (Reasoning) 19% Qwen3 8B
qwen-qwen3-8b
Imported 2026-05-11
213 Gemma 3 12B Instruct 18.3% Gemma 3 12B
google-gemma-3-12b-it
Imported 2026-05-11
214 Phi-4 18% Phi 4
microsoft-phi-4
Imported 2026-05-11
215 Qwen3 0.6B (Reasoning) 18% Imported 2026-05-11
216 Nova Premier 17.3% Imported 2026-05-11
217 GLM-4.5V (Non-reasoning) 15.3% GLM 4.5V
z-ai-glm-4.5v
Imported 2026-05-11
218 Hermes 4 - Llama-3.1 405B (Non-reasoning) 15.3% Imported 2026-05-11
219 GPT-4o mini 14.7% GPT-4o-mini
openai-gpt-4o-mini
Imported 2026-05-11
220 Gemma 3n E4B Instruct 14.3% Imported 2026-05-11
221 Llama 4 Scout 14% Llama 4 Scout
meta-llama-llama-4-scout
Imported 2026-05-11
222 Mistral Large 2 (Nov '24) 14% Imported 2026-05-11
223 Qwen2.5 Instruct 72B 14% Qwen2.5 72B Instruct
qwen-qwen-2.5-72b-instruct
Imported 2026-05-11
224 Granite 4.0 H Small 13.7% Imported 2026-05-11
225 MiniMax M1 40k 13.7% Imported 2026-05-11
226 NVIDIA Nemotron 3 Nano 30B A3B (Non-reasoning) 13.3% Nemotron 3 Nano 30B A3B
nvidia-nemotron-3-nano-30b-a3b
Imported 2026-05-11
227 Command A 13% Command A
cohere-command-a
Imported 2026-05-11
228 Gemma 3 4B Instruct 12.7% Gemma 3 4B
google-gemma-3-4b-it
Imported 2026-05-11
229 Hermes 4 - Llama-3.1 70B (Non-reasoning) 11.3% Imported 2026-05-11
230 Llama 3.1 Nemotron Instruct 70B 11% Imported 2026-05-11
231 Jamba Reasoning 3B 10.7% Imported 2026-05-11
232 Gemma 3n E2B Instruct 10.3% Imported 2026-05-11
233 Qwen3 0.6B (Non-reasoning) 10.3% Imported 2026-05-11
234 LFM2 2.6B 8.3% Imported 2026-05-11
235 Llama Nemotron Super 49B v1.5 (Non-reasoning) 8% Imported 2026-05-11
236 Llama 3.3 Instruct 70B 7.7% Imported 2026-05-11
237 Llama 3.3 Nemotron Super 49B v1 (Non-reasoning) 7.7% Imported 2026-05-11
238 Qwen3 1.7B (Non-reasoning) 7.3% Imported 2026-05-11
239 Nova Lite 7% Nova Lite 1.0
amazon-nova-lite-v1
Imported 2026-05-11
240 Nova Pro 7% Nova Pro 1.0
amazon-nova-pro-v1
Imported 2026-05-11
241 Granite 3.3 8B (Non-reasoning) 6.7% Imported 2026-05-11
242 Phi-4 Mini Instruct 6.7% Imported 2026-05-11
243 Granite 4.0 1B 6.3% Imported 2026-05-11
244 Granite 4.0 H 1B 6.3% Imported 2026-05-11
245 GPT-4o (Nov '24) 6% GPT-4o
openai-gpt-4o
Imported 2026-05-11
246 Granite 4.0 Micro 6% Granite 4.0 Micro
ibm-granite-granite-4.0-h-micro
Imported 2026-05-11
247 Nova Micro 6% Nova Micro 1.0
amazon-nova-micro-v1
Imported 2026-05-11
248 Devstral Medium 4.7% Mistral: Devstral Medium
mistralai-devstral-medium
Imported 2026-05-11
249 Llama 3.1 Instruct 8B 4.3% Imported 2026-05-11
250 Mistral Small 3 4.3% Imported 2026-05-11
251 Llama 3.1 Instruct 70B 4% Imported 2026-05-11
252 Mistral Small 3.1 3.7% Imported 2026-05-11
253 Gemma 3 1B Instruct 3.3% Imported 2026-05-11
254 LFM2 1.2B 3.3% Imported 2026-05-11
255 Llama 3.2 Instruct 3B 3.3% Imported 2026-05-11
256 OLMo 2 32B 3.3% Imported 2026-05-11
257 Llama 3.1 Instruct 405B 3% Imported 2026-05-11
258 Gemma 3 270M 2.3% Imported 2026-05-11
259 Jamba 1.7 Large 2.3% Imported 2026-05-11
260 Pixtral Large 2.3% Mistral: Pixtral Large 2411
mistralai-pixtral-large-2411
Imported 2026-05-11
261 Llama 3.2 Instruct 11B (Vision) 1.7% Imported 2026-05-11
262 Granite 4.0 H 350M 1.3% Imported 2026-05-11
263 OLMo 2 7B 0.7% Imported 2026-05-11
264 Jamba 1.7 Mini 0.3% Imported 2026-05-11
265 Phi-3 Mini Instruct 3.8B 0.3% Imported 2026-05-11
266 Granite 4.0 350M 0% Imported 2026-05-11
267 Llama 3.2 Instruct 1B 0% Imported 2026-05-11
268 Mistral Large 2 (Jul '24) 0% Mistral Large 2407
mistralai-mistral-large-2407
Imported 2026-05-11
269 Molmo 7B-D 0% Imported 2026-05-11