MMLU-Pro

An enhanced version of the MMLU benchmark, with graduate-level questions across 14 subject areas and ten answer options per question.

351 rows
Primary metric: score
Sampled: 2026-05-28

Metadata

Metrics

Accuracy
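
A minimal sketch of how an accuracy score of this kind could be computed, assuming answers are option letters A through J (an assumed labelling for the ten options) and that every question counts equally. The function names are hypothetical; this is not the leaderboard's own scoring code.

```python
from typing import Sequence

# Assumed labelling for the ten answer options.
OPTION_LABELS = "ABCDEFGHIJ"


def accuracy(predictions: Sequence[str], answers: Sequence[str]) -> float:
    """Fraction of questions whose predicted option letter matches the answer key."""
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        raise ValueError("need at least one question")
    # Compare letters case-insensitively, ignoring surrounding whitespace.
    correct = sum(
        p.strip().upper() == a.strip().upper()
        for p, a in zip(predictions, answers)
    )
    return correct / len(answers)


def chance_baseline(num_options: int = len(OPTION_LABELS)) -> float:
    """Expected accuracy of uniform random guessing over num_options choices."""
    return 1.0 / num_options


if __name__ == "__main__":
    preds = ["A", "c", "J", "B"]
    gold = ["A", "C", "D", "B"]
    print(f"accuracy = {accuracy(preds, gold):.1%}")  # 3 of 4 correct
    print(f"chance   = {chance_baseline():.1%}")
```

With ten options, uniform guessing lands at 10%, which is a useful floor to keep in mind when reading the lowest rows.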

Showing the 2 latest source slices.

Latest Results

Provider-published Qwen3.7-Max comparison scores. Rows are marked self-reported and should be interpreted as source claims unless independently reproduced.
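
To treat self-reported rows separately from imported ones, the rows first have to be read back into records. The sketch below assumes the row layout seen in this listing: a matched row spans three lines (rank, subject, accuracy and matched model name, then the model slug, then provenance and sample date), while an unmatched row carries provenance and date on the same line. The `Row` type and `parse_rows` helper are hypothetical names introduced for illustration.

```python
import re
from dataclasses import dataclass
from typing import Iterable, List, Optional

# "<rank> <subject> <accuracy>% [<rest>]"; the subject is matched lazily so it
# stops at the first number that is immediately followed by a percent sign.
HEAD = re.compile(r"^(\d+) (.+?) (\d+(?:\.\d+)?)%(?: (.*))?$")
# "<provenance> <YYYY-MM-DD>"
TAIL = re.compile(r"^(Self-reported|Imported) (\d{4}-\d{2}-\d{2})$")


@dataclass
class Row:
    rank: int
    subject: str
    accuracy: float
    match: Optional[str] = None
    slug: Optional[str] = None
    provenance: Optional[str] = None
    sampled: Optional[str] = None


def parse_rows(lines: Iterable[str]) -> List[Row]:
    """Rebuild one record per leaderboard row from the flattened lines."""
    rows: List[Row] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        head = HEAD.match(line)
        if head:
            rank, subject, acc, rest = head.groups()
            row = Row(int(rank), subject, float(acc))
            tail = TAIL.match(rest or "")
            if tail:
                # Unmatched row: provenance and date sit on the same line.
                row.provenance, row.sampled = tail.groups()
            else:
                row.match = rest or None
            rows.append(row)
            continue
        if not rows:
            continue  # header or other text before the first row
        tail = TAIL.match(line)
        if tail:
            rows[-1].provenance, rows[-1].sampled = tail.groups()
        else:
            rows[-1].slug = line
    return rows


if __name__ == "__main__":
    sample = [
        "1 Claude Opus 4.6 Max 89.7% Claude Opus 4.6",
        "anthropic-claude-opus-4.6",
        "Self-reported 2026-05-28",
        "7 Claude 4.1 Opus (Reasoning) 88% Imported 2026-05-11",
    ]
    for row in parse_rows(sample):
        print(row.provenance, row.rank, row.subject, row.accuracy)
```

Filtering on `row.provenance` then keeps the two kinds of score apart, for example `[r for r in rows if r.provenance == "Imported"]`.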

Rank Subject Accuracy Model Match Provenance Sampled
1 Claude Opus 4.6 Max 89.7% Claude Opus 4.6
anthropic-claude-opus-4.6
Self-reported 2026-05-28
2 Qwen3.7 Max 89.6% Qwen3.7 Max
qwen-qwen3.7-max
Self-reported 2026-05-28
3 Qwen3.6 Plus 88.5% Qwen3.6 Plus
qwen-qwen3.6-plus
Self-reported 2026-05-28
4 DeepSeek V4 Pro Max 87.5% DeepSeek V4 Pro
deepseek-deepseek-v4-pro
Self-reported 2026-05-28
5 Kimi K2.6 Thinking 87.1% MoonshotAI: Kimi K2.6
moonshotai-kimi-k2.6
Self-reported 2026-05-28
6 GLM-5.1 Thinking 86.3% GLM 5.1
z-ai-glm-5.1
Self-reported 2026-05-28

Imported scores from the second source slice, sampled 2026-05-11. Ranks restart at 1.

1 Gemini 3 Pro Preview (high) 89.8% Gemini 3
google-gemini-3
Imported 2026-05-11
2 Claude Opus 4.5 (Reasoning) 89.5% Claude Opus 4.5
anthropic-claude-opus-4.5
Imported 2026-05-11
3 Gemini 3 Pro Preview (low) 89.5% Gemini 3
google-gemini-3
Imported 2026-05-11
4 Gemini 3 Flash Preview (Reasoning) 89% Gemini 3 Flash Preview
google-gemini-3-flash-preview
Imported 2026-05-11
5 Claude Opus 4.5 (Non-reasoning) 88.9% Claude Opus 4.5
anthropic-claude-opus-4.5
Imported 2026-05-11
6 Gemini 3 Flash Preview (Non-reasoning) 88.2% Gemini 3 Flash Preview
google-gemini-3-flash-preview
Imported 2026-05-11
7 Claude 4.1 Opus (Reasoning) 88% Imported 2026-05-11
8 Claude 4.5 Sonnet (Reasoning) 87.5% Imported 2026-05-11
9 MiniMax-M2.1 87.5% MiniMax M2.1
minimax-minimax-m2.1
Imported 2026-05-11
10 GPT-5.2 (xhigh) 87.4% GPT-5.2
openai-gpt-5.2
Imported 2026-05-11
11 Claude 4 Opus (Reasoning) 87.3% Imported 2026-05-11
12 GPT-5 (high) 87.1% GPT-5
openai-gpt-5
Imported 2026-05-11
13 GPT-5.1 (high) 87% GPT-5.1
openai-gpt-5.1
Imported 2026-05-11
14 GPT-5 (medium) 86.7% GPT-5
openai-gpt-5
Imported 2026-05-11
15 Grok 4 86.6% Grok 4
x-ai-grok-4
Imported 2026-05-11
16 GPT-5 Codex (high) 86.5% GPT-5 Codex
openai-gpt-5-codex
Imported 2026-05-11
17 DeepSeek V3.2 Speciale 86.3% DeepSeek V3.2 Speciale
deepseek-deepseek-v3.2-speciale
Imported 2026-05-11
18 DeepSeek V3.2 (Reasoning) 86.2% DeepSeek V3.2
deepseek-deepseek-v3.2
Imported 2026-05-11
19 Gemini 2.5 Pro 86.2% Gemini 2.5 Pro
google-gemini-2.5-pro
Imported 2026-05-11
20 Claude 4 Opus (Non-reasoning) 86% Imported 2026-05-11
21 Claude 4.5 Sonnet (Non-reasoning) 86% Imported 2026-05-11
22 GPT-5 (low) 86% GPT-5
openai-gpt-5
Imported 2026-05-11
23 GPT-5.1 Codex (high) 86% GPT-5.1-Codex
openai-gpt-5.1-codex
Imported 2026-05-11
24 GPT-5.2 (medium) 85.9% GPT-5.2
openai-gpt-5.2
Imported 2026-05-11
25 Gemini 2.5 Pro Preview (Mar '25) 85.8% Gemini 2.5 Pro Preview 06-05
google-gemini-2.5-pro-preview
Imported 2026-05-11
26 GLM-4.7 (Reasoning) 85.6% GLM 4.7
z-ai-glm-4.7
Imported 2026-05-11
27 Doubao Seed Code 85.4% Imported 2026-05-11
28 Grok 4.1 Fast (Reasoning) 85.4% Grok 4.1 Fast
x-ai-grok-4.1-fast
Imported 2026-05-11
29 o3 85.3% o3
openai-o3
Imported 2026-05-11
30 DeepSeek V3.1 (Reasoning) 85.1% DeepSeek V3.1
deepseek-deepseek-chat-v3.1
Imported 2026-05-11
31 DeepSeek V3.1 Terminus (Reasoning) 85.1% DeepSeek V3.1 Terminus
deepseek-deepseek-v3.1-terminus
Imported 2026-05-11
32 DeepSeek V3.2 Exp (Reasoning) 85% DeepSeek V3.2 Exp
deepseek-deepseek-v3.2-exp
Imported 2026-05-11
33 Grok 4 Fast (Reasoning) 85% Grok 4 Fast
x-ai-grok-4-fast
Imported 2026-05-11
34 Cogito v2.1 (Reasoning) 84.9% Imported 2026-05-11
35 DeepSeek R1 0528 (May '25) 84.9% R1
deepseek-r1
Imported 2026-05-11
36 Kimi K2 Thinking 84.8% MoonshotAI: Kimi K2 Thinking
moonshotai-kimi-k2-thinking
Imported 2026-05-11
37 DeepSeek R1 (Jan '25) 84.4% R1
deepseek-r1
Imported 2026-05-11
38 MiMo-V2-Flash (Reasoning) 84.3% MiMo-V2-Flash
xiaomi-mimo-v2-flash
Imported 2026-05-11
39 Qwen3 235B A22B 2507 (Reasoning) 84.3% Qwen3 235B A22B Instruct 2507
qwen-qwen3-235b-a22b-2507
Imported 2026-05-11
40 Claude 4 Sonnet (Reasoning) 84.2% Imported 2026-05-11
41 Gemini 2.5 Flash Preview (Sep '25) (Reasoning) 84.2% Imported 2026-05-11
42 o1 84.1% o1
openai-o1
Imported 2026-05-11
43 Qwen3 Max 84.1% Qwen3 Max
qwen-qwen3-max
Imported 2026-05-11
44 K-EXAONE (Reasoning) 83.8% Imported 2026-05-11
45 Qwen3 Max (Preview) 83.8% Qwen3 Max
qwen-qwen3-max
Imported 2026-05-11
46 Claude 3.7 Sonnet (Reasoning) 83.7% Claude 3.7 Sonnet (thinking)
anthropic-claude-3.7-sonnet-thinking
Imported 2026-05-11
47 Claude 4 Sonnet (Non-reasoning) 83.7% Imported 2026-05-11
48 DeepSeek V3.2 (Non-reasoning) 83.7% DeepSeek V3.2
deepseek-deepseek-v3.2
Imported 2026-05-11
49 Gemini 2.5 Pro Preview (May '25) 83.7% Gemini 2.5 Pro Preview 06-05
google-gemini-2.5-pro-preview
Imported 2026-05-11
50 GPT-5 mini (high) 83.7% GPT-5 Mini
openai-gpt-5-mini
Imported 2026-05-11
51 DeepSeek V3.1 Terminus (Non-reasoning) 83.6% DeepSeek V3.1 Terminus
deepseek-deepseek-v3.1-terminus
Imported 2026-05-11
52 DeepSeek V3.2 Exp (Non-reasoning) 83.6% DeepSeek V3.2
deepseek-deepseek-v3.2
Imported 2026-05-11
53 Gemini 2.5 Flash Preview (Sep '25) (Non-reasoning) 83.6% Imported 2026-05-11
54 Qwen3 VL 235B A22B (Reasoning) 83.6% Imported 2026-05-11
55 GLM-4.5 (Reasoning) 83.5% GLM 4.5
z-ai-glm-4.5
Imported 2026-05-11
56 DeepSeek V3.1 (Non-reasoning) 83.3% DeepSeek V3.1
deepseek-deepseek-chat-v3.1
Imported 2026-05-11
57 Gemini 2.5 Flash (Reasoning) 83.2% Gemini 2.5 Flash
google-gemini-2.5-flash
Imported 2026-05-11
58 o4-mini (high) 83.2% o4 Mini
openai-o4-mini
Imported 2026-05-11
59 ERNIE 5.0 Thinking Preview 83% Imported 2026-05-11
60 Nova 2.0 Pro Preview (medium) 83% Imported 2026-05-11
61 GLM-4.6 (Reasoning) 82.9% GLM 4.6
z-ai-glm-4.6
Imported 2026-05-11
62 Hermes 4 - Llama-3.1 405B (Reasoning) 82.9% Imported 2026-05-11
63 GPT-5 mini (medium) 82.8% GPT-5 Mini
openai-gpt-5-mini
Imported 2026-05-11
64 Grok 3 mini Reasoning (high) 82.8% Imported 2026-05-11
65 Qwen3 235B A22B (Reasoning) 82.8% Qwen3 235B A22B
qwen-qwen3-235b-a22b
Imported 2026-05-11
66 Qwen3 235B A22B 2507 Instruct 82.8% Qwen3 235B A22B Instruct 2507
qwen-qwen3-235b-a22b-2507
Imported 2026-05-11
67 Llama 3.1 Nemotron Ultra 253B v1 (Reasoning) 82.5% Imported 2026-05-11
68 Kimi K2 82.4% MoonshotAI: Kimi K2 0711
moonshotai-kimi-k2
Imported 2026-05-11
69 Qwen3 Max Thinking (Preview) 82.4% Qwen3 Max Thinking
qwen-qwen3-max-thinking
Imported 2026-05-11
70 Qwen3 Next 80B A3B (Reasoning) 82.4% Imported 2026-05-11
71 Qwen3 VL 235B A22B Instruct 82.3% Qwen3 VL 235B A22B Instruct
qwen-qwen3-vl-235b-a22b-instruct
Imported 2026-05-11
72 INTELLECT-3 82.2% INTELLECT-3
prime-intellect-intellect-3
Imported 2026-05-11
73 Ling-1T 82.2% Imported 2026-05-11
74 Nova 2.0 Pro Preview (low) 82.2% Imported 2026-05-11
75 GPT-5 (ChatGPT) 82% GPT-5
openai-gpt-5
Imported 2026-05-11
76 GPT-5.1 Codex mini (high) 82% GPT-5.1-Codex-Mini
openai-gpt-5.1-codex-mini
Imported 2026-05-11
77 MiniMax-M2 82% MiniMax M2
minimax-minimax-m2
Imported 2026-05-11
78 DeepSeek V3 0324 81.9% DeepSeek V3 0324
deepseek-deepseek-chat-v3-0324
Imported 2026-05-11
79 Kimi K2 0905 81.9% MoonshotAI: Kimi K2 0905
moonshotai-kimi-k2-0905
Imported 2026-05-11
80 Qwen3 Next 80B A3B Instruct 81.9% Qwen3 Next 80B A3B Instruct
qwen-qwen3-next-80b-a3b-instruct
Imported 2026-05-11
81 EXAONE 4.0 32B (Reasoning) 81.8% Imported 2026-05-11
82 Nova 2.0 Lite (high) 81.8% Imported 2026-05-11
83 Qwen3 VL 32B (Reasoning) 81.8% Imported 2026-05-11
84 MiniMax M1 80k 81.6% Imported 2026-05-11
85 GLM-4.5-Air 81.5% GLM 4.5 Air
z-ai-glm-4.5-air
Imported 2026-05-11
86 Magistral Medium 1.2 81.5% Imported 2026-05-11
87 Seed-OSS-36B-Instruct 81.5% Imported 2026-05-11
88 GPT-5.2 (Non-reasoning) 81.4% GPT-5.2
openai-gpt-5.2
Imported 2026-05-11
89 Llama Nemotron Super 49B v1.5 (Reasoning) 81.4% Imported 2026-05-11
90 KAT-Coder-Pro V1 81.3% Imported 2026-05-11
91 Mi:dm K 2.5 Pro Preview 81.3% Imported 2026-05-11
92 Nova 2.0 Lite (medium) 81.3% Imported 2026-05-11
93 Hermes 4 - Llama-3.1 70B (Reasoning) 81.1% Imported 2026-05-11
94 K-EXAONE (Non-reasoning) 81% Imported 2026-05-11
95 Gemini 2.5 Flash (Non-reasoning) 80.9% Gemini 2.5 Flash
google-gemini-2.5-flash
Imported 2026-05-11
96 Llama 4 Maverick 80.9% Llama 4 Maverick
meta-llama-4-maverick
Imported 2026-05-11
97 Mi:dm K 2.5 Pro 80.9% Imported 2026-05-11
98 Nova 2.0 Omni (medium) 80.9% Imported 2026-05-11
99 Gemini 2.5 Flash-Lite Preview (Sep '25) (Reasoning) 80.8% Imported 2026-05-11
100 gpt-oss-120B (high) 80.8% gpt-oss-120b
openai-gpt-oss-120b
Imported 2026-05-11
101 MiniMax M1 40k 80.8% Imported 2026-05-11
102 Mistral Large 3 80.7% Imported 2026-05-11
103 Qwen3 VL 30B A3B (Reasoning) 80.7% Imported 2026-05-11
104 GPT-4.1 80.6% GPT-4.1
openai-gpt-4.1
Imported 2026-05-11
105 GPT-5 (minimal) 80.6% GPT-5
openai-gpt-5
Imported 2026-05-11
106 Ring-1T 80.6% Imported 2026-05-11
107 Gemini 2.0 Pro Experimental (Feb '25) 80.5% Imported 2026-05-11
108 Qwen3 30B A3B 2507 (Reasoning) 80.5% Imported 2026-05-11
109 Solar Pro 2 (Reasoning) 80.5% Imported 2026-05-11
110 Claude 3.7 Sonnet (Non-reasoning) 80.3% Claude 3.7 Sonnet
anthropic-claude-3.7-sonnet
Imported 2026-05-11
111 GPT-4o (March 2025, chatgpt-4o-latest) 80.3% GPT-4o
openai-gpt-4o
Imported 2026-05-11
112 o3-mini (high) 80.2% o3 Mini High
openai-o3-mini-high
Imported 2026-05-11
113 GPT-5.1 (Non-reasoning) 80.1% GPT-5.1
openai-gpt-5.1
Imported 2026-05-11
114 Claude 4.5 Haiku (Non-reasoning) 80% Imported 2026-05-11
115 Gemini 2.5 Flash Preview (Reasoning) 80% Imported 2026-05-11
116 GLM-4.6V (Reasoning) 79.9% GLM 4.6V
z-ai-glm-4.6v
Imported 2026-05-11
117 Grok 3 79.9% Grok 3
xaigrok-3
Imported 2026-05-11
118 Gemini 2.0 Flash Thinking Experimental (Jan '25) 79.8% Imported 2026-05-11
119 Nova 2.0 Omni (low) 79.8% Imported 2026-05-11
120 Qwen3 32B (Reasoning) 79.8% Qwen3 32B
qwen-qwen3-32b
Imported 2026-05-11
121 Gemini 2.5 Flash-Lite Preview (Sep '25) (Non-reasoning) 79.6% Gemini 2.5 Flash Lite Preview 09-2025
google-gemini-2.5-flash-lite-preview-09-2025
Imported 2026-05-11
122 Motif-2-12.7B-Reasoning 79.6% Imported 2026-05-11
123 DeepSeek R1 Distill Llama 70B 79.5% R1 Distill Llama 70B
deepseek-deepseek-r1-distill-llama-70b
Imported 2026-05-11
124 GLM-4.7 (Non-reasoning) 79.4% GLM 4.7
z-ai-glm-4.7
Imported 2026-05-11
125 NVIDIA Nemotron 3 Nano 30B A3B (Reasoning) 79.4% Nemotron 3 Nano 30B A3B
nvidia-nemotron-3-nano-30b-a3b
Imported 2026-05-11
126 Grok Code Fast 1 79.3% Grok Code Fast 1
x-ai-grok-code-fast-1
Imported 2026-05-11
127 Ring-flash-2.0 79.3% Imported 2026-05-11
128 Qwen3 Omni 30B A3B (Reasoning) 79.2% Imported 2026-05-11
129 o3-mini 79.1% o3-mini
openai-o3-mini
Imported 2026-05-11
130 Qwen3 VL 32B Instruct 79.1% Qwen3 VL 32B Instruct
qwen-qwen3-vl-32b-instruct
Imported 2026-05-11
131 Apriel-v1.6-15B-Thinker 79% Imported 2026-05-11
132 GLM-4.5V (Reasoning) 78.8% GLM 4.5V
z-ai-glm-4.5v
Imported 2026-05-11
133 Nova 2.0 Lite (low) 78.8% Imported 2026-05-11
134 Qwen3 Coder 480B A35B Instruct 78.8% Qwen3 Coder 480B A35B
qwen-qwen3-coder
Imported 2026-05-11
135 K2-V2 (high) 78.6% Imported 2026-05-11
136 HyperCLOVA X SEED Think (32B) 78.5% Imported 2026-05-11
137 Llama 3.3 Nemotron Super 49B v1 (Reasoning) 78.5% Imported 2026-05-11
138 GLM-4.6 (Non-reasoning) 78.4% GLM 4.6
z-ai-glm-4.6
Imported 2026-05-11
139 Gemini 2.5 Flash Preview (Non-reasoning) 78.3% Imported 2026-05-11
140 Gemini 2.0 Flash (experimental) 78.2% Gemini 2.0 Flash
google-gemini-2.0-flash
Imported 2026-05-11
141 GPT-4.1 mini 78.1% GPT-4.1 Mini
openai-gpt-4.1-mini
Imported 2026-05-11
142 GPT-5 nano (high) 78% GPT-5 Nano
openai-gpt-5-nano
Imported 2026-05-11
143 Gemini 2.0 Flash (Feb '25) 77.9% Gemini 2.0 Flash
google-gemini-2.0-flash
Imported 2026-05-11
144 Ling-flash-2.0 77.7% Imported 2026-05-11
145 Qwen3 30B A3B (Reasoning) 77.7% Qwen3 30B A3B
qwen-qwen3-30b-a3b
Imported 2026-05-11
146 Qwen3 30B A3B 2507 Instruct 77.7% Imported 2026-05-11
147 ERNIE 4.5 300B A47B 77.6% ERNIE 4.5 300B A47B
baidu-ernie-4.5-300b-a47b
Imported 2026-05-11
148 GPT-5 mini (minimal) 77.5% GPT-5 Mini
openai-gpt-5-mini
Imported 2026-05-11
149 gpt-oss-120B (low) 77.5% gpt-oss-120b
openai-gpt-oss-120b
Imported 2026-05-11
150 Qwen3 14B (Reasoning) 77.4% Qwen3 14B
qwen-qwen3-14b
Imported 2026-05-11
151 Apriel-v1.5-15B-Thinker 77.3% Imported 2026-05-11
152 GPT-4o (ChatGPT) 77.3% GPT-4o
openai-gpt-4o
Imported 2026-05-11
153 Claude 3.5 Sonnet (Oct '24) 77.2% Claude 3.5 Sonnet
anthropic-claude-3.5-sonnet
Imported 2026-05-11
154 GPT-5 nano (medium) 77.2% GPT-5 Nano
openai-gpt-5-nano
Imported 2026-05-11
155 Nova 2.0 Pro Preview (Non-reasoning) 77.2% Imported 2026-05-11
156 EXAONE 4.0 32B (Non-reasoning) 76.8% Imported 2026-05-11
157 Magistral Small 1.2 76.8% Imported 2026-05-11
158 Solar Pro 2 (Preview) (Reasoning) 76.8% Imported 2026-05-11
159 Qwen3 VL 30B A3B Instruct 76.4% Qwen3 VL 30B A3B Instruct
qwen-qwen3-vl-30b-a3b-instruct
Imported 2026-05-11
160 QwQ 32B 76.4% Imported 2026-05-11
161 Olmo 3.1 32B Think 76.3% Imported 2026-05-11
162 Devstral 2 76.2% Imported 2026-05-11
163 Qwen2.5 Max 76.2% Imported 2026-05-11
164 Qwen3 235B A22B (Non-reasoning) 76.2% Qwen3 235B A22B
qwen-qwen3-235b-a22b
Imported 2026-05-11
165 K2-V2 (medium) 76.1% Imported 2026-05-11
166 Claude 4.5 Haiku (Reasoning) 76% Imported 2026-05-11
167 Mistral Medium 3 76% Mistral: Mistral Medium 3
mistralai-mistral-medium-3
Imported 2026-05-11
168 Gemini 2.5 Flash-Lite (Reasoning) 75.9% Gemini 2.5 Flash Lite
google-gemini-2.5-flash-lite
Imported 2026-05-11
169 NVIDIA Nemotron Nano 12B v2 VL (Reasoning) 75.9% Nemotron Nano 12B 2 VL
nvidia-nemotron-nano-12b-v2-vl
Imported 2026-05-11
170 Olmo 3 32B Think 75.9% Olmo 3 32B Think
allenai-olmo-3-32b-think
Imported 2026-05-11
171 Sonar Pro 75.5% Sonar Pro
perplexity-sonar-pro
Imported 2026-05-11
172 Magistral Medium 1 75.3% Imported 2026-05-11
173 DeepSeek V3 (Dec '24) 75.2% DeepSeek V3
deepseek-deepseek-chat
Imported 2026-05-11
174 GLM-4.6V (Non-reasoning) 75.2% GLM 4.6V
z-ai-glm-4.6v
Imported 2026-05-11
175 Llama 4 Scout 75.2% Llama 4 Scout
meta-llama-llama-4-scout
Imported 2026-05-11
176 Claude 3.5 Sonnet (June '24) 75.1% Claude 3.5 Sonnet
anthropic-claude-3.5-sonnet
Imported 2026-05-11
177 GLM-4.5V (Non-reasoning) 75.1% GLM 4.5V
z-ai-glm-4.5v
Imported 2026-05-11
178 Gemini 1.5 Pro (Sep '24) 75% Imported 2026-05-11
179 Solar Pro 2 (Non-reasoning) 75% Imported 2026-05-11
180 Qwen3 VL 8B (Reasoning) 74.9% Imported 2026-05-11
181 GPT-4o (Nov '24) 74.8% GPT-4o
openai-gpt-4o
Imported 2026-05-11
182 gpt-oss-20B (high) 74.8% gpt-oss-20b
openai-gpt-oss-20b
Imported 2026-05-11
183 Magistral Small 1 74.6% Imported 2026-05-11
184 MiMo-V2-Flash (Non-reasoning) 74.4% MiMo-V2-Flash
xiaomi-mimo-v2-flash
Imported 2026-05-11
185 Grok 4.1 Fast (Non-reasoning) 74.3% Grok 4.1 Fast
x-ai-grok-4.1-fast
Imported 2026-05-11
186 Nova 2.0 Lite (Non-reasoning) 74.3% Imported 2026-05-11
187 Qwen3 4B 2507 (Reasoning) 74.3% Imported 2026-05-11
188 Qwen3 8B (Reasoning) 74.3% Qwen3 8B
qwen-qwen3-8b
Imported 2026-05-11
189 NVIDIA Nemotron Nano 9B V2 (Reasoning) 74.2% Nemotron Nano 9B V2
nvidia-nemotron-nano-9b-v2
Imported 2026-05-11
190 o1-mini 74.2% Imported 2026-05-11
191 DeepSeek R1 Distill Qwen 14B 74% Imported 2026-05-11
192 GPT-4o (May '24) 74% GPT-4o (2024-05-13)
openai-gpt-4o-2024-05-13
Imported 2026-05-11
193 DeepSeek R1 0528 Qwen3 8B 73.9% Imported 2026-05-11
194 DeepSeek R1 Distill Qwen 32B 73.9% R1 Distill Qwen 32B
deepseek-deepseek-r1-distill-qwen-32b
Imported 2026-05-11
195 NVIDIA Nemotron Nano 9B V2 (Non-reasoning) 73.9% Nemotron Nano 9B V2
nvidia-nemotron-nano-9b-v2
Imported 2026-05-11
196 Nova Premier 73.3% Imported 2026-05-11
197 Llama 3.1 Instruct 405B 73.2% Imported 2026-05-11
198 Grok 4 Fast (Non-reasoning) 73% Grok 4 Fast
x-ai-grok-4-fast
Imported 2026-05-11
199 Hermes 4 - Llama-3.1 405B (Non-reasoning) 72.9% Imported 2026-05-11
200 Qwen3 32B (Non-reasoning) 72.7% Qwen3 32B
qwen-qwen3-32b
Imported 2026-05-11
201 Falcon-H1R-7B 72.5% Imported 2026-05-11
202 Qwen3 Omni 30B A3B Instruct 72.5% Imported 2026-05-11
203 Solar Pro 2 (Preview) (Non-reasoning) 72.5% Imported 2026-05-11
204 Gemini 2.0 Flash-Lite (Feb '25) 72.4% Gemini 2.0 Flash Lite
google-gemini-2.0-flash-lite-001
Imported 2026-05-11
205 Gemini 2.5 Flash-Lite (Non-reasoning) 72.4% Gemini 2.5 Flash Lite
google-gemini-2.5-flash-lite
Imported 2026-05-11
206 Qwen2.5 Instruct 72B 72% Qwen2.5 72B Instruct
qwen-qwen-2.5-72b-instruct
Imported 2026-05-11
207 Nova 2.0 Omni (Non-reasoning) 71.9% Imported 2026-05-11
208 gpt-oss-20B (low) 71.8% gpt-oss-20b
openai-gpt-oss-20b
Imported 2026-05-11
209 Llama 3.1 Tulu3 405B 71.6% Imported 2026-05-11
210 Phi-4 71.4% Phi 4
microsoft-phi-4
Imported 2026-05-11
211 K2-V2 (low) 71.3% Imported 2026-05-11
212 Llama 3.3 Instruct 70B 71.3% Imported 2026-05-11
213 Command A 71.2% Command A
cohere-command-a
Imported 2026-05-11
214 Qwen3 30B A3B (Non-reasoning) 71% Qwen3 30B A3B
qwen-qwen3-30b-a3b
Imported 2026-05-11
215 Grok 2 (Dec '24) 70.9% Imported 2026-05-11
216 Devstral Medium 70.8% Mistral: Devstral Medium
mistralai-devstral-medium
Imported 2026-05-11
217 Qwen3 Coder 30B A3B Instruct 70.6% Qwen3 Coder 30B A3B Instruct
qwen-qwen3-coder-30b-a3b-instruct
Imported 2026-05-11
218 Grok Beta 70.3% Imported 2026-05-11
219 Pixtral Large 70.1% Mistral: Pixtral Large 2411
mistralai-pixtral-large-2411
Imported 2026-05-11
220 Qwen3 VL 4B (Reasoning) 70% Imported 2026-05-11
221 Llama 3.3 Nemotron Super 49B v1 (Non-reasoning) 69.8% Imported 2026-05-11
222 Mistral Large 2 (Nov '24) 69.7% Imported 2026-05-11
223 Qwen2.5 Instruct 32B 69.7% Imported 2026-05-11
224 Claude 3 Opus 69.6% Imported 2026-05-11
225 Qwen3 4B (Reasoning) 69.6% Imported 2026-05-11
226 Sarvam M (Reasoning) 69.6% Imported 2026-05-11
227 GPT-4 Turbo 69.4% GPT-4 Turbo
openai-gpt-4-turbo
Imported 2026-05-11
228 Ministral 3 14B 69.3% Imported 2026-05-11
229 Llama Nemotron Super 49B v1.5 (Non-reasoning) 69.2% Imported 2026-05-11
230 Nova Pro 69.1% Nova Pro 1.0
amazon-nova-pro-v1
Imported 2026-05-11
231 Llama 3.1 Nemotron Instruct 70B 69% Imported 2026-05-11
232 Sonar 68.9% Sonar
perplexity-sonar
Imported 2026-05-11
233 Qwen3 VL 8B Instruct 68.6% Qwen3 VL 8B Instruct
qwen-qwen3-vl-8b-instruct
Imported 2026-05-11
234 Mistral Large 2 (Jul '24) 68.3% Mistral Large 2407
mistralai-mistral-large-2407
Imported 2026-05-11
235 Mistral Medium 3.1 68.3% Mistral: Mistral Medium 3.1
mistralai-mistral-medium-3.1
Imported 2026-05-11
236 Mistral Small 3.2 68.1% Imported 2026-05-11
237 Gemini 1.5 Flash (Sep '24) 68% Imported 2026-05-11
238 Devstral Small 2 67.8% Imported 2026-05-11
239 Llama 3.1 Instruct 70B 67.6% Imported 2026-05-11
240 Qwen3 14B (Non-reasoning) 67.5% Qwen3 14B
qwen-qwen3-14b
Imported 2026-05-11
241 Qwen3 4B 2507 Instruct 67.2% Imported 2026-05-11
242 Ling-mini-2.0 67.1% Imported 2026-05-11
243 Llama 3.2 Instruct 90B (Vision) 67.1% Imported 2026-05-11
244 Gemma 3 27B Instruct 66.9% Gemma 3 27B
google-gemma-3-27b-it
Imported 2026-05-11
245 Reka Flash 3 66.9% Reka Flash 3
rekaai-reka-flash-3
Imported 2026-05-11
246 Hermes 4 - Llama-3.1 70B (Non-reasoning) 66.4% Imported 2026-05-11
247 Mistral Small 3.1 65.9% Imported 2026-05-11
248 Gemini 1.5 Pro (May '24) 65.7% Imported 2026-05-11
249 GPT-4.1 nano 65.7% GPT-4.1 Nano
openai-gpt-4.1-nano
Imported 2026-05-11
250 Olmo 3 7B Think 65.5% Imported 2026-05-11
251 Mistral Small 3 65.2% Imported 2026-05-11
252 NVIDIA Nemotron Nano 12B v2 VL (Non-reasoning) 64.9% Nemotron Nano 12B 2 VL
nvidia-nemotron-nano-12b-v2-vl
Imported 2026-05-11
253 GPT-4o mini 64.8% GPT-4o-mini
openai-gpt-4o-mini
Imported 2026-05-11
254 QwQ 32B-Preview 64.8% Imported 2026-05-11
255 Qwen3 8B (Non-reasoning) 64.3% Qwen3 8B
qwen-qwen3-8b
Imported 2026-05-11
256 Ministral 3 8B 64.2% Imported 2026-05-11
257 Qwen2.5 Coder Instruct 32B 63.5% Qwen2.5 Coder 32B Instruct
qwen-qwen-2.5-coder-32b-instruct
Imported 2026-05-11
258 Claude 3.5 Haiku 63.4% Claude 3.5 Haiku
anthropic-claude-3.5-haiku
Imported 2026-05-11
259 Qwen3 VL 4B Instruct 63.4% Imported 2026-05-11
260 Qwen2.5 Turbo 63.3% Qwen-Turbo
qwen-qwen-turbo
Imported 2026-05-11
261 Devstral Small (May '25) 63.2% Mistral: Devstral Small 1.1
mistralai-devstral-small
Imported 2026-05-11
262 Granite 4.0 H Small 62.4% Imported 2026-05-11
263 Devstral Small (Jul '25) 62.2% Mistral: Devstral Small 1.1
mistralai-devstral-small
Imported 2026-05-11
264 Qwen2 Instruct 72B 62.2% Imported 2026-05-11
265 Mistral Saba 61.1% Mistral: Saba
mistralai-mistral-saba
Imported 2026-05-11
266 Gemma 3 12B Instruct 59.5% Gemma 3 12B
google-gemma-3-12b-it
Imported 2026-05-11
267 Nova Lite 59% Nova Lite 1.0
amazon-nova-lite-v1
Imported 2026-05-11
268 Exaone 4.0 1.2B (Reasoning) 58.8% Imported 2026-05-11
269 Qwen3 4B (Non-reasoning) 58.6% Imported 2026-05-11
270 Kimi Linear 48B A3B Instruct 58.5% Imported 2026-05-11
271 DeepHermes 3 - Mistral 24B Preview (Non-reasoning) 58% Imported 2026-05-11
272 Claude 3 Sonnet 57.9% Imported 2026-05-11
273 NVIDIA Nemotron 3 Nano 30B A3B (Non-reasoning) 57.9% Nemotron 3 Nano 30B A3B
nvidia-nemotron-3-nano-30b-a3b
Imported 2026-05-11
274 Jamba 1.7 Large 57.7% Imported 2026-05-11
275 Jamba Reasoning 3B 57.7% Imported 2026-05-11
276 Gemini 1.5 Flash (May '24) 57.4% Imported 2026-05-11
277 Llama 3 Instruct 70B 57.4% Imported 2026-05-11
278 Jamba 1.5 Large 57.2% Imported 2026-05-11
279 Hermes 3 - Llama-3.1 70B 57.1% Hermes 3 70B Instruct
nousresearch-hermes-3-llama-3.1-70b
Imported 2026-05-11
280 Qwen3 1.7B (Reasoning) 57% Imported 2026-05-11
281 Gemini 1.5 Flash-8B 56.9% Imported 2026-05-11
282 Jamba 1.6 Large 56.5% Imported 2026-05-11
283 GPT-5 nano (minimal) 55.6% GPT-5 Nano
openai-gpt-5-nano
Imported 2026-05-11
284 Llama 3.1 Nemotron Nano 4B v1.1 (Reasoning) 55.6% Imported 2026-05-11
285 DeepSeek R1 Distill Llama 8B 54.3% Imported 2026-05-11
286 Mixtral 8x22B Instruct 53.7% Mistral: Mixtral 8x22B Instruct
mistralai-mixtral-8x22b-instruct
Imported 2026-05-11
287 Nova Micro 53.1% Nova Micro 1.0
amazon-nova-micro-v1
Imported 2026-05-11
288 Mistral Small (Sep '24) 52.9% Imported 2026-05-11
289 Ministral 3 3B 52.4% Imported 2026-05-11
290 Olmo 3 7B Instruct 52.2% Imported 2026-05-11
291 Mistral Large (Feb '24) 51.5% Mistral Large
mistralai-mistral-large
Imported 2026-05-11
292 OLMo 2 32B 51.1% Imported 2026-05-11
293 LFM2 8B A1B 50.5% Imported 2026-05-11
294 Exaone 4.0 1.2B (Non-reasoning) 50% Imported 2026-05-11
295 Claude 2.1 49.5% Imported 2026-05-11
296 Mistral Medium 49.1% Imported 2026-05-11
297 Gemma 3n E4B Instruct 48.8% Imported 2026-05-11
298 Claude 2.0 48.6% Imported 2026-05-11
299 Phi-4 Multimodal Instruct 48.5% Imported 2026-05-11
300 Gemma 3n E4B Instruct Preview (May '25) 48.3% Imported 2026-05-11
301 Llama 3.1 Instruct 8B 47.6% Imported 2026-05-11
302 Qwen2.5 Coder Instruct 7B 47.3% Imported 2026-05-11
303 Granite 3.3 8B (Non-reasoning) 46.8% Imported 2026-05-11
304 Phi-4 Mini Instruct 46.5% Imported 2026-05-11
305 Llama 3.2 Instruct 11B (Vision) 46.4% Imported 2026-05-11
306 GPT-3.5 Turbo 46.2% GPT-3.5 Turbo
openai-gpt-3.5-turbo
Imported 2026-05-11
307 Granite 4.0 Micro 44.7% Granite 4.0 Micro
ibm-granite-granite-4.0-h-micro
Imported 2026-05-11
308 Phi-3 Mini Instruct 3.8B 43.5% Imported 2026-05-11
309 Claude Instant 43.4% Imported 2026-05-11
310 Command-R+ (Apr '24) 43.2% Command R (08-2024)
cohere-command-r-08-2024
Imported 2026-05-11
311 Gemini 1.0 Pro 43.1% Imported 2026-05-11
312 DeepSeek Coder V2 Lite Instruct 42.9% Imported 2026-05-11
313 LFM 40B 42.5% Imported 2026-05-11
314 Mistral Small (Feb '24) 41.9% Imported 2026-05-11
315 Gemma 3 4B Instruct 41.7% Gemma 3 4B
google-gemma-3-4b-it
Imported 2026-05-11
316 Qwen3 1.7B (Non-reasoning) 41.1% Imported 2026-05-11
317 Llama 2 Chat 13B 40.6% Imported 2026-05-11
318 Llama 2 Chat 70B 40.6% Imported 2026-05-11
319 Llama 3 Instruct 8B 40.5% Imported 2026-05-11
320 DBRX Instruct 39.7% Imported 2026-05-11
321 Jamba 1.7 Mini 38.8% Imported 2026-05-11
322 Mixtral 8x7B Instruct 38.7% Mistral: Mixtral 8x7B Instruct
mistralai-mixtral-8x7b-instruct
Imported 2026-05-11
323 Gemma 3n E2B Instruct 37.8% Imported 2026-05-11
324 Jamba 1.5 Mini 37.1% Imported 2026-05-11
325 Molmo 7B-D 37.1% Imported 2026-05-11
326 Jamba 1.6 Mini 36.7% Imported 2026-05-11
327 DeepHermes 3 - Llama-3.1 8B Preview (Non-reasoning) 36.5% Imported 2026-05-11
328 Llama 3.2 Instruct 3B 34.7% Imported 2026-05-11
329 Qwen3 0.6B (Reasoning) 34.7% Imported 2026-05-11
330 Command-R (Mar '24) 33.8% Command R (08-2024)
cohere-command-r-08-2024
Imported 2026-05-11
331 Granite 4.0 1B 32.5% Imported 2026-05-11
332 OpenChat 3.5 (1210) 31% Imported 2026-05-11
333 LFM2 2.6B 29.8% Imported 2026-05-11
334 OLMo 2 7B 28.2% Imported 2026-05-11
335 Granite 4.0 H 1B 27.7% Imported 2026-05-11
336 DeepSeek R1 Distill Qwen 1.5B 26.9% Imported 2026-05-11
337 LFM2 1.2B 25.7% Imported 2026-05-11
338 Mistral 7B Instruct 24.5% Imported 2026-05-11
339 Qwen3 0.6B (Non-reasoning) 23.1% Imported 2026-05-11
340 Llama 3.2 Instruct 1B 20% Imported 2026-05-11
341 Llama 2 Chat 7B 16.4% Imported 2026-05-11
342 Gemma 3 1B Instruct 13.5% Imported 2026-05-11
343 Granite 4.0 H 350M 12.7% Imported 2026-05-11
344 Granite 4.0 350M 12.4% Imported 2026-05-11
345 Gemma 3 270M 5.5% Imported 2026-05-11