Humanity's Last Exam

Frontier-level benchmark with expert-vetted closed-ended questions across mathematics, sciences, and humanities.

501 rows
Primary metric: score
Sampled: 2026-05-28

Metadata

Metrics

Accuracy

Showing 5 latest source slices.

Latest Results

Provider-published system-card benchmark scores parsed from Anthropic's Claude Opus 4.8 capability evaluation tables. Rows are marked self-reported and should be interpreted as source claims unless independently reproduced.
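Because self-reported rows are source claims rather than reproduced measurements, it can help to rank them separately from imported rows instead of merging everything into one ordering. The following is a minimal sketch of that idea, not the site's actual pipeline: the `Row` class, its field names, and the `rank_by_provenance` helper are hypothetical, and the sample values are copied from rows in the table below.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    # Hypothetical record layout; field names are assumptions, not the site's schema.
    subject: str
    accuracy: float  # percent, as shown in the Accuracy column
    provenance: str  # "Self-reported" or "Imported", as labeled in the table
    sampled: str     # ISO date string from the Sampled column


def rank_by_provenance(rows):
    """Group rows by provenance label and rank each group by accuracy, descending."""
    groups = defaultdict(list)
    for row in rows:
        groups[row.provenance].append(row)
    ranked = {}
    for label, members in groups.items():
        ordered = sorted(members, key=lambda r: r.accuracy, reverse=True)
        # Ranks start at 1 within each provenance group.
        ranked[label] = list(enumerate(ordered, start=1))
    return ranked


# Sample rows taken from the table; input order is deliberately shuffled.
rows = [
    Row("Gemini 3.1 Pro Preview", 44.7, "Imported", "2026-05-11"),
    Row("Claude Opus 4.8", 57.9, "Self-reported", "2026-05-28"),
    Row("GPT-5.5 (xhigh)", 44.3, "Imported", "2026-05-11"),
    Row("Gemini 3.1 Pro Preview", 51.4, "Self-reported", "2026-05-28"),
]

ranked = rank_by_provenance(rows)
for label, entries in ranked.items():
    for rank, row in entries:
        print(f"{label:13} {rank} {row.subject} {row.accuracy}%")
```

Keeping the groups apart means that the same subject (here Gemini 3.1 Pro Preview) can appear once per provenance label with different scores, without one row silently overriding the other.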

Rank Subject Accuracy Model Match Provenance Sampled
1 Claude Opus 4.8 57.9% Claude Opus 4.8
anthropic-claude-opus-4.8
Self-reported 2026-05-28
2 Claude Opus 4.7 54.7% Claude Opus 4.7
anthropic-claude-opus-4.7
Self-reported 2026-05-28
3 GPT-5.5 52.2% GPT-5.5
openai-gpt-5.5
Self-reported 2026-05-28
4 Gemini 3.1 Pro Preview 51.4% Gemini 3.1 Pro Preview
google-gemini-3.1-pro-preview
Self-reported 2026-05-28
1 Qwen3.7 Max 41.4% Qwen3.7 Max
qwen-qwen3.7-max
Self-reported 2026-05-28
2 Claude Opus 4.6 Max 40% Claude Opus 4.6
anthropic-claude-opus-4.6
Self-reported 2026-05-28
3 DeepSeek V4 Pro Max 37.7% DeepSeek V4 Pro
deepseek-deepseek-v4-pro
Self-reported 2026-05-28
4 Kimi K2.6 Thinking 36.4% MoonshotAI: Kimi K2.6
moonshotai-kimi-k2.6
Self-reported 2026-05-28
5 GLM-5.1 Thinking 34.7% GLM 5.1
z-ai-glm-5.1
Self-reported 2026-05-28
6 Qwen3.6 Plus 28.8% Qwen3.6 Plus
qwen-qwen3.6-plus
Self-reported 2026-05-28
1 Gemini 3.1 Pro Preview 44.7% Gemini 3.1 Pro Preview
google-gemini-3.1-pro-preview
Imported 2026-05-11
2 GPT-5.5 (xhigh) 44.3% GPT-5.5
openai-gpt-5.5
Imported 2026-05-11
3 GPT-5.5 (high) 43% GPT-5.5
openai-gpt-5.5
Imported 2026-05-11
4 GPT-5.4 (xhigh) 41.6% GPT-5.4
openai-gpt-5.4
Imported 2026-05-11
5 GPT-5.5 (medium) 40.6% GPT-5.5
openai-gpt-5.5
Imported 2026-05-11
6 GPT-5.3 Codex (xhigh) 39.9% GPT-5.3-Codex
openai-gpt-5.3-codex
Imported 2026-05-11
7 Muse Spark 39.9% Imported 2026-05-11
8 Claude Opus 4.7 (Adaptive Reasoning, Max Effort) 39.6% Claude Opus 4.7
anthropic-claude-opus-4.7
Imported 2026-05-11
9 Gemini 3 Pro Preview (high) 37.2% Gemini 3
google-gemini-3
Imported 2026-05-11
10 Claude Opus 4.6 (Adaptive Reasoning, Max Effort) 36.7% Claude Opus 4.6
anthropic-claude-opus-4.6
Imported 2026-05-11
11 DeepSeek V4 Pro (Reasoning, Max Effort) 35.9% DeepSeek V4 Pro
deepseek-deepseek-v4-pro
Imported 2026-05-11
12 Kimi K2.6 35.9% MoonshotAI: Kimi K2.6
moonshotai-kimi-k2.6
Imported 2026-05-11
13 GPT-5.2 (xhigh) 35.4% GPT-5.2
openai-gpt-5.2
Imported 2026-05-11
14 Grok 4.3 35% Grok 4.3
x-ai-grok-4.3
Imported 2026-05-11
15 Gemini 3 Flash Preview (Reasoning) 34.7% Gemini 3 Flash Preview
google-gemini-3-flash-preview
Imported 2026-05-11
16 MiMo-V2.5-Pro 33.8% MiMo-V2.5-Pro
xiaomi-mimo-v2.5-pro
Imported 2026-05-11
17 DeepSeek V4 Pro (Reasoning, High Effort) 33.5% DeepSeek V4 Pro
deepseek-deepseek-v4-pro
Imported 2026-05-11
18 GPT-5.2 Codex (xhigh) 33.5% GPT-5.2-Codex
openai-gpt-5.2-codex
Imported 2026-05-11
19 KAT-Coder-Pro V1 33.4% Imported 2026-05-11
20 Grok 4.20 0309 v2 (Reasoning) 32.2% Grok 4.20
x-ai-grok-4.20
Imported 2026-05-11
21 DeepSeek V4 Flash (Reasoning, Max Effort) 32.1% DeepSeek V4 Flash
deepseek-deepseek-v4-flash
Imported 2026-05-11
22 Claude Opus 4.7 (Non-reasoning, High Effort) 31.2% Claude Opus 4.7
anthropic-claude-opus-4.7
Imported 2026-05-11
23 GPT-5.5 (low) 31% GPT-5.5
openai-gpt-5.5
Imported 2026-05-11
24 Claude Sonnet 4.6 (Adaptive Reasoning, Max Effort) 30% Claude Sonnet 4.6
anthropic-claude-sonnet-4.6
Imported 2026-05-11
25 Grok 4.20 0309 (Reasoning) 30% Grok 4.20
x-ai-grok-4.20
Imported 2026-05-11
26 Kimi K2.5 (Reasoning) 29.4% MoonshotAI: Kimi K2.5
moonshotai-kimi-k2.5
Imported 2026-05-11
27 GPT-5.4 (low) 28.9% GPT-5.4
openai-gpt-5.4
Imported 2026-05-11
28 Qwen3.6 Max Preview 28.9% Qwen3.6 Max Preview
qwen-qwen3.6-max-preview
Imported 2026-05-11
29 Claude Opus 4.5 (Reasoning) 28.4% Claude Opus 4.5
anthropic-claude-opus-4.5
Imported 2026-05-11
30 MiMo-V2-Pro 28.3% MiMo-V2-Pro
xiaomi-mimo-v2-pro
Imported 2026-05-11
31 MiniMax-M2.7 28.1% MiniMax M2.7
minimax-minimax-m2.7
Imported 2026-05-11
32 GLM-5.1 (Reasoning) 28% GLM 5.1
z-ai-glm-5.1
Imported 2026-05-11
33 DeepSeek V4 Flash (Reasoning, High Effort) 27.8% DeepSeek V4 Flash
deepseek-deepseek-v4-flash
Imported 2026-05-11
34 Gemini 3 Pro Preview (low) 27.6% Gemini 3
google-gemini-3
Imported 2026-05-11
35 Qwen3.5 397B A17B (Reasoning) 27.3% Qwen3.5 397B A17B
qwen-qwen3.5-397b-a17b
Imported 2026-05-11
36 GLM-5 (Reasoning) 27.2% GLM 5
z-ai-glm-5
Imported 2026-05-11
37 GPT-5.4 mini (xhigh) 26.6% GPT-5.4 Mini
openai-gpt-5.4-mini
Imported 2026-05-11
38 GPT-5 (high) 26.5% GPT-5
openai-gpt-5
Imported 2026-05-11
39 GPT-5.1 (high) 26.5% GPT-5.1
openai-gpt-5.1
Imported 2026-05-11
40 GPT-5.4 nano (xhigh) 26.5% GPT-5.4 Nano
openai-gpt-5.4-nano
Imported 2026-05-11
41 Qwen3 Max Thinking 26.2% Qwen3 Max Thinking
qwen-qwen3-max-thinking
Imported 2026-05-11
42 DeepSeek V3.2 Speciale 26.1% DeepSeek V3.2 Speciale
deepseek-deepseek-v3.2-speciale
Imported 2026-05-11
43 Qwen3.6 Plus 25.7% Qwen3.6 Plus
qwen-qwen3.6-plus
Imported 2026-05-11
44 GLM-5.1 (Non-reasoning) 25.6% GLM 5.1
z-ai-glm-5.1
Imported 2026-05-11
45 GPT-5 Codex (high) 25.6% GPT-5 Codex
openai-gpt-5-codex
Imported 2026-05-11
46 Hy3-preview (Reasoning) 25.5% Hy3 preview
tencent-hy3-preview
Imported 2026-05-11
47 GLM-5-Turbo 25.4% GLM 5 Turbo
z-ai-glm-5-turbo
Imported 2026-05-11
48 MiMo-V2.5 25.2% MiMo-V2.5
xiaomi-mimo-v2.5
Imported 2026-05-11
49 GLM-4.7 (Reasoning) 25.1% GLM 4.7
z-ai-glm-4.7
Imported 2026-05-11
50 GPT-5.2 (medium) 24.9% GPT-5.2
openai-gpt-5.2
Imported 2026-05-11
51 Grok 4.20 0309 v2 (Non-reasoning) 24.2% Grok 4.20
x-ai-grok-4.20
Imported 2026-05-11
52 Grok 4 23.9% Grok 4
x-ai-grok-4
Imported 2026-05-11
53 GPT-5 (medium) 23.5% GPT-5
openai-gpt-5
Imported 2026-05-11
54 GPT-5.1 Codex (high) 23.4% GPT-5.1-Codex
openai-gpt-5.1-codex
Imported 2026-05-11
55 Qwen3.5 122B A10B (Reasoning) 23.4% Qwen3.5-122B-A10B
qwen-qwen3.5-122b-a10b
Imported 2026-05-11
56 Gemma 4 31B (Reasoning) 22.7% Gemma 4 31B
google-gemma-4-31b-it
Imported 2026-05-11
57 Step 3.5 Flash 2603 22.6% Step 3.5 Flash
stepfun-step-3.5-flash
Imported 2026-05-11
58 Grok 4.20 0309 (Non-reasoning) 22.5% Grok 4.20
x-ai-grok-4.20
Imported 2026-05-11
59 Kimi K2 Thinking 22.3% MoonshotAI: Kimi K2 Thinking
moonshotai-kimi-k2-thinking
Imported 2026-05-11
60 DeepSeek V3.2 (Reasoning) 22.2% DeepSeek V3.2
deepseek-deepseek-v3.2
Imported 2026-05-11
61 MiniMax-M2.1 22.2% MiniMax M2.1
minimax-minimax-m2.1
Imported 2026-05-11
62 Qwen3.5 27B (Reasoning) 22.2% Qwen3.5-27B
qwen-qwen3.5-27b
Imported 2026-05-11
63 Qwen3.6 27B (Reasoning) 21.6% Qwen3.6 27B
qwen-qwen3.6-27b
Imported 2026-05-11
64 Gemini 2.5 Pro 21.1% Gemini 2.5 Pro
google-gemini-2.5-pro
Imported 2026-05-11
65 MiMo-V2-Flash (Reasoning) 21.1% MiMo-V2-Flash
xiaomi-mimo-v2-flash
Imported 2026-05-11
66 MiMo-V2-Omni-0327 20.4% Imported 2026-05-11
67 Qwen3.6 35B A3B (Reasoning) 20.2% Qwen3.6 35B A3B
qwen-qwen3.6-35b-a3b
Imported 2026-05-11
68 MiMo-V2-Flash (Feb 2026) 20% MiMo-V2-Flash
xiaomi-mimo-v2-flash
Imported 2026-05-11
69 o3 20% o3
openai-o3
Imported 2026-05-11
70 MiMo-V2-Omni 19.9% MiMo-V2-Omni
xiaomi-mimo-v2-omni
Imported 2026-05-11
71 GPT-5 mini (high) 19.7% GPT-5 Mini
openai-gpt-5-mini
Imported 2026-05-11
72 Qwen3.5 35B A3B (Reasoning) 19.7% Qwen3.5-35B-A3B
qwen-qwen3.5-35b-a3b
Imported 2026-05-11
73 NVIDIA Nemotron 3 Super 120B A12B (Reasoning) 19.2% Nemotron 3 Super
nvidia-nemotron-3-super-120b-a12b
Imported 2026-05-11
74 MiniMax-M2.5 19.1% MiniMax M2.5
minimax-minimax-m2.5
Imported 2026-05-11
75 Step 3.5 Flash 19.1% Step 3.5 Flash
stepfun-step-3.5-flash
Imported 2026-05-11
76 Qwen3.5 397B A17B (Non-reasoning) 18.8% Qwen3.5 397B A17B
qwen-qwen3.5-397b-a17b
Imported 2026-05-11
77 Claude Opus 4.6 (Non-reasoning, High Effort) 18.6% Claude Opus 4.6
anthropic-claude-opus-4.6
Imported 2026-05-11
78 gpt-oss-120B (high) 18.5% gpt-oss-120b
openai-gpt-oss-120b
Imported 2026-05-11
79 GPT-5 (low) 18.4% GPT-5
openai-gpt-5
Imported 2026-05-11
80 Gemma 4 26B A4B (Reasoning) 18.3% Gemma 4 26B A4B
google-gemma-4-26b-a4b-it
Imported 2026-05-11
81 Kimi K2.6 (Non-reasoning) 18.2% MoonshotAI: Kimi K2.6
moonshotai-kimi-k2.6
Imported 2026-05-11
82 Grok 4.1 Fast (Reasoning) 17.6% Grok 4.1 Fast
x-ai-grok-4.1-fast
Imported 2026-05-11
83 o4-mini (high) 17.5% o4 Mini
openai-o4-mini
Imported 2026-05-11
84 Claude 4.5 Sonnet (Reasoning) 17.3% Imported 2026-05-11
85 Gemini 2.5 Pro Preview (Mar '25) 17.1% Gemini 2.5 Pro Preview 06-05
google-gemini-2.5-pro-preview
Imported 2026-05-11
86 GPT-5.4 mini (medium) 17.1% GPT-5.4 Mini
openai-gpt-5.4-mini
Imported 2026-05-11
87 Grok 4 Fast (Reasoning) 17% Grok 4 Fast
x-ai-grok-4-fast
Imported 2026-05-11
88 GPT-5.1 Codex mini (high) 16.9% GPT-5.1-Codex-Mini
openai-gpt-5.1-codex-mini
Imported 2026-05-11
89 Gemini 3.1 Flash-Lite Preview 16.2% Gemini 3.1 Flash Lite Preview
google-gemini-3.1-flash-lite-preview
Imported 2026-05-11
90 KAT Coder Pro V2 16% KAT-Coder-Pro V2
kwaipilot-kat-coder-pro-v2
Imported 2026-05-11
91 GLM 5V Turbo (Reasoning) 15.8% GLM 5V Turbo
z-ai-glm-5v-turbo
Imported 2026-05-11
92 Mercury 2 15.5% Mercury 2
inception-mercury-2
Imported 2026-05-11
93 Gemini 2.5 Pro Preview (May '25) 15.4% Gemini 2.5 Pro Preview 06-05
google-gemini-2.5-pro-preview
Imported 2026-05-11
94 DeepSeek V3.1 Terminus (Reasoning) 15.2% DeepSeek V3.1 Terminus
deepseek-deepseek-v3.1-terminus
Imported 2026-05-11
95 Qwen3 235B A22B 2507 (Reasoning) 15% Qwen3 235B A22B Instruct 2507
qwen-qwen3-235b-a22b-2507
Imported 2026-05-11
96 DeepSeek R1 0528 (May '25) 14.9% R1
deepseek-r1
Imported 2026-05-11
97 Qwen3.5 122B A10B (Non-reasoning) 14.8% Qwen3.5-122B-A10B
qwen-qwen3.5-122b-a10b
Imported 2026-05-11
98 GPT-5.4 nano (medium) 14.7% GPT-5.4 Nano
openai-gpt-5.4-nano
Imported 2026-05-11
99 Trinity Large Thinking 14.7% Trinity Large Thinking
arcee-ai-trinity-large-thinking
Imported 2026-05-11
100 GPT-5 mini (medium) 14.6% GPT-5 Mini
openai-gpt-5-mini
Imported 2026-05-11
101 Gemini 3 Flash Preview (Non-reasoning) 14.1% Gemini 3 Flash Preview
google-gemini-3-flash-preview
Imported 2026-05-11
102 Qwen3.5 Omni Plus 13.9% Imported 2026-05-11
103 DeepSeek V3.2 Exp (Reasoning) 13.8% DeepSeek V3.2 Exp
deepseek-deepseek-v3.2-exp
Imported 2026-05-11
104 Qwen3.6 27B (Non-reasoning) 13.6% Qwen3.6 27B
qwen-qwen3.6-27b
Imported 2026-05-11
105 Doubao Seed Code 13.3% Imported 2026-05-11
106 GLM-4.6 (Reasoning) 13.3% GLM 4.6
z-ai-glm-4.6
Imported 2026-05-11
107 MiMo-V2.5-Pro (Non-reasoning) 13.3% MiMo-V2.5-Pro
xiaomi-mimo-v2.5-pro
Imported 2026-05-11
108 Qwen3.5 9B (Reasoning) 13.3% Qwen3.5-9B
qwen-qwen3.5-9b
Imported 2026-05-11
109 Claude Sonnet 4.6 (Non-reasoning, High Effort) 13.2% Claude Sonnet 4.6
anthropic-claude-sonnet-4.6
Imported 2026-05-11
110 Qwen3.5 27B (Non-reasoning) 13.2% Qwen3.5-27B
qwen-qwen3.5-27b
Imported 2026-05-11
111 K-EXAONE (Reasoning) 13.1% Imported 2026-05-11
112 DeepSeek V3.1 (Reasoning) 13% DeepSeek V3.1
deepseek-deepseek-chat-v3.1
Imported 2026-05-11
113 Claude Opus 4.5 (Non-reasoning) 12.9% Claude Opus 4.5
anthropic-claude-opus-4.5
Imported 2026-05-11
114 Mistral Medium 3.5 12.8% Mistral: Mistral Medium 3.5
mistralai-mistral-medium-3-5
Imported 2026-05-11
115 Qwen3.5 35B A3B (Non-reasoning) 12.8% Qwen3.5-35B-A3B
qwen-qwen3.5-35b-a3b
Imported 2026-05-11
116 ERNIE 5.0 Thinking Preview 12.7% Imported 2026-05-11
117 Gemini 2.5 Flash Preview (Sep '25) (Reasoning) 12.7% Imported 2026-05-11
118 GPT-5.5 (Non-reasoning) 12.6% GPT-5.5
openai-gpt-5.5
Imported 2026-05-11
119 MiniMax-M2 12.5% MiniMax M2
minimax-minimax-m2
Imported 2026-05-11
120 Qwen3.6 35B A3B (Non-reasoning) 12.5% Qwen3.6 35B A3B
qwen-qwen3.6-35b-a3b
Imported 2026-05-11
121 Kimi K2.5 (Non-reasoning) 12.3% MoonshotAI: Kimi K2.5
moonshotai-kimi-k2.5
Imported 2026-05-11
122 o3-mini (high) 12.3% o3 Mini High
openai-o3-mini-high
Imported 2026-05-11
123 GLM-4.5 (Reasoning) 12.2% GLM 4.5
z-ai-glm-4.5
Imported 2026-05-11
124 INTELLECT-3 12.1% INTELLECT-3
prime-intellect-intellect-3
Imported 2026-05-11
125 Apriel-v1.5-15B-Thinker 12% Imported 2026-05-11
126 Qwen3 Max Thinking (Preview) 12% Qwen3 Max Thinking
qwen-qwen3-max-thinking
Imported 2026-05-11
127 Claude 4.1 Opus (Reasoning) 11.9% Imported 2026-05-11
128 Claude 4 Opus (Reasoning) 11.7% Imported 2026-05-11
129 Qwen3 235B A22B (Reasoning) 11.7% Qwen3 235B A22B
qwen-qwen3-235b-a22b
Imported 2026-05-11
130 Qwen3 Next 80B A3B (Reasoning) 11.7% Imported 2026-05-11
131 EXAONE 4.5 33B 11.6% Imported 2026-05-11
132 Gemini 2.5 Flash Preview (Reasoning) 11.6% Imported 2026-05-11
133 Gemma 4 31B (Non-reasoning) 11.5% Gemma 4 31B
google-gemma-4-31b-it
Imported 2026-05-11
134 Nemotron Cascade 2 30B A3B 11.4% Imported 2026-05-11
135 Gemini 2.5 Flash (Reasoning) 11.1% Gemini 2.5 Flash
google-gemini-2.5-flash
Imported 2026-05-11
136 Grok 3 mini Reasoning (high) 11.1% Imported 2026-05-11
137 Qwen3 Max 11.1% Qwen3 Max
qwen-qwen3-max
Imported 2026-05-11
138 Cogito v2.1 (Reasoning) 11% Imported 2026-05-11
139 Nova 2.0 Lite (high) 10.9% Imported 2026-05-11
140 Claude Sonnet 4.6 (Non-reasoning, Low Effort) 10.8% Claude Sonnet 4.6
anthropic-claude-sonnet-4.6
Imported 2026-05-11
141 Falcon-H1R-7B 10.8% Imported 2026-05-11
142 Gemma 4 26B A4B (Non-reasoning) 10.7% Gemma 4 26B A4B
google-gemma-4-26b-a4b-it
Imported 2026-05-11
143 GPT-5.4 (Non-reasoning) 10.6% GPT-5.4
openai-gpt-5.4
Imported 2026-05-11
144 Qwen3 235B A22B 2507 Instruct 10.6% Qwen3 235B A22B Instruct 2507
qwen-qwen3-235b-a22b-2507
Imported 2026-05-11
145 DeepSeek V3.2 (Non-reasoning) 10.5% DeepSeek V3.2
deepseek-deepseek-v3.2
Imported 2026-05-11
146 EXAONE 4.0 32B (Reasoning) 10.5% Imported 2026-05-11
147 Claude 3.7 Sonnet (Reasoning) 10.3% Claude 3.7 Sonnet (thinking)
anthropic-claude-3.7-sonnet-thinking
Imported 2026-05-11
148 Hermes 4 - Llama-3.1 405B (Reasoning) 10.3% Imported 2026-05-11
149 NVIDIA Nemotron 3 Nano 30B A3B (Reasoning) 10.2% Nemotron 3 Nano 30B A3B
nvidia-nemotron-3-nano-30b-a3b
Imported 2026-05-11
150 Ring-1T 10.2% Imported 2026-05-11
151 Step3 VL 10B 10.2% Imported 2026-05-11
152 Qwen3 VL 235B A22B (Reasoning) 10.1% Imported 2026-05-11
153 Sarvam 105B (high) 10.1% Imported 2026-05-11
154 Solar Pro 3 10.1% Solar Pro 3
upstage-solar-pro-3
Imported 2026-05-11
155 Nanbeige4.1-3B 10% Imported 2026-05-11
156 Apriel-v1.6-15B-Thinker 9.8% Imported 2026-05-11
157 gpt-oss-20B (high) 9.8% gpt-oss-20b
openai-gpt-oss-20b
Imported 2026-05-11
158 K2-V2 (high) 9.8% Imported 2026-05-11
159 Qwen3 30B A3B 2507 (Reasoning) 9.8% Imported 2026-05-11
160 Claude 4.5 Haiku (Reasoning) 9.7% Imported 2026-05-11
161 Claude 4 Sonnet (Reasoning) 9.6% Imported 2026-05-11
162 Magistral Medium 1.2 9.6% Imported 2026-05-11
163 Qwen3 VL 32B (Reasoning) 9.6% Imported 2026-05-11
164 K2 Think V2 9.5% Imported 2026-05-11
165 Magistral Medium 1 9.5% Imported 2026-05-11
166 Mistral Small 4 (Reasoning) 9.5% Mistral: Mistral Small 4
mistralai-mistral-small-2603
Imported 2026-05-11
167 DeepSeek R1 (Jan '25) 9.3% R1
deepseek-r1
Imported 2026-05-11
168 Qwen3 Coder Next 9.3% Qwen3 Coder Next
qwen-qwen3-coder-next
Imported 2026-05-11
169 Qwen3 Max (Preview) 9.3% Qwen3 Max
qwen-qwen3-max
Imported 2026-05-11
170 Solar Open 100B (Reasoning) 9.2% Imported 2026-05-11
171 Seed-OSS-36B-Instruct 9.1% Imported 2026-05-11
172 GLM-4.6V (Reasoning) 8.9% GLM 4.6V
z-ai-glm-4.6v
Imported 2026-05-11
173 Nova 2.0 Pro Preview (medium) 8.9% Imported 2026-05-11
174 Ring-flash-2.0 8.9% Imported 2026-05-11
175 Mi:dm K 2.5 Pro Preview 8.8% Imported 2026-05-11
176 o3-mini 8.7% o3-mini
openai-o3-mini
Imported 2026-05-11
177 Qwen3 VL 30B A3B (Reasoning) 8.7% Imported 2026-05-11
178 DeepSeek V3.2 Exp (Non-reasoning) 8.6% DeepSeek V3.2
deepseek-deepseek-v3.2
Imported 2026-05-11
179 Nova 2.0 Lite (medium) 8.6% Imported 2026-05-11
180 Qwen3.5 9B (Non-reasoning) 8.6% Qwen3.5-9B
qwen-qwen3.5-9b
Imported 2026-05-11
181 DeepSeek V3.1 Terminus (Non-reasoning) 8.4% DeepSeek V3.1 Terminus
deepseek-deepseek-v3.1-terminus
Imported 2026-05-11
182 Qwen3 32B (Reasoning) 8.3% Qwen3 32B
qwen-qwen3-32b
Imported 2026-05-11
183 GPT-5 nano (high) 8.2% GPT-5 Nano
openai-gpt-5-nano
Imported 2026-05-11
184 Ling-2.6-1T 8.2% Ling-2.6-1T
inclusionai-ling-2.6-1t
Imported 2026-05-11
185 MiniMax M1 80k 8.2% Imported 2026-05-11
186 Motif-2-12.7B-Reasoning 8.2% Imported 2026-05-11
187 QwQ 32B 8.2% Imported 2026-05-11
188 Llama 3.1 Nemotron Ultra 253B v1 (Reasoning) 8.1% Imported 2026-05-11
189 MiMo-V2-Flash (Non-reasoning) 8% MiMo-V2-Flash
xiaomi-mimo-v2-flash
Imported 2026-05-11
190 Hermes 4 - Llama-3.1 70B (Reasoning) 7.9% Imported 2026-05-11
191 Sonar Pro 7.9% Sonar Pro
perplexity-sonar-pro
Imported 2026-05-11
192 Gemini 2.5 Flash Preview (Sep '25) (Non-reasoning) 7.8% Imported 2026-05-11
193 Qwen3.5 4B (Reasoning) 7.8% Imported 2026-05-11
194 DeepSeek V4 Pro (Non-reasoning) 7.7% DeepSeek V4 Pro
deepseek-deepseek-v4-pro
Imported 2026-05-11
195 Mi:dm K 2.5 Pro 7.7% Imported 2026-05-11
196 o1 7.7% o1
openai-o1
Imported 2026-05-11
197 GPT-5 nano (medium) 7.6% GPT-5 Nano
openai-gpt-5-nano
Imported 2026-05-11
198 Grok Code Fast 1 7.5% Grok Code Fast 1
x-ai-grok-code-fast-1
Imported 2026-05-11
199 MiniMax M1 40k 7.5% Imported 2026-05-11
200 Qwen3.5 4B (Non-reasoning) 7.5% Imported 2026-05-11
201 GPT-5.2 (Non-reasoning) 7.3% GPT-5.2
openai-gpt-5.2
Imported 2026-05-11
202 Qwen3 Next 80B A3B Instruct 7.3% Qwen3 Next 80B A3B Instruct
qwen-qwen3-next-80b-a3b-instruct
Imported 2026-05-11
203 Qwen3 Omni 30B A3B (Reasoning) 7.3% Imported 2026-05-11
204 Sonar 7.3% Sonar
perplexity-sonar
Imported 2026-05-11
205 GLM-5 (Non-reasoning) 7.2% GLM 5
z-ai-glm-5
Imported 2026-05-11
206 Ling-1T 7.2% Imported 2026-05-11
207 Magistral Small 1 7.2% Imported 2026-05-11
208 Claude 4.5 Sonnet (Non-reasoning) 7.1% Imported 2026-05-11
209 Gemini 2.0 Flash Thinking Experimental (Jan '25) 7.1% Imported 2026-05-11
210 GLM-4.7-Flash (Reasoning) 7.1% GLM 4.7 Flash
z-ai-glm-4.7-flash
Imported 2026-05-11
211 Qwen3.5 Omni Flash 7.1% Imported 2026-05-11
212 DeepSeek V4 Flash (Non-reasoning) 7% DeepSeek V4 Flash
deepseek-deepseek-v4-flash
Imported 2026-05-11
213 Kimi K2 7% MoonshotAI: Kimi K2 0711
moonshotai-kimi-k2
Imported 2026-05-11
214 Sarvam 30B (high) 7% Imported 2026-05-11
215 Solar Pro 2 (Reasoning) 7% Imported 2026-05-11
216 Gemini 2.0 Pro Experimental (Feb '25) 6.8% Imported 2026-05-11
217 GLM-4.5-Air 6.8% GLM 4.5 Air
z-ai-glm-4.5-air
Imported 2026-05-11
218 LFM2.5-1.2B-Instruct 6.8% LFM2.5-1.2B-Instruct
liquid-lfm-2.5-1.2b-instruct
Imported 2026-05-11
219 Llama Nemotron Super 49B v1.5 (Reasoning) 6.8% Imported 2026-05-11
220 Nova 2.0 Omni (medium) 6.8% Imported 2026-05-11
221 Qwen3 30B A3B 2507 Instruct 6.8% Imported 2026-05-11
222 DBRX Instruct 6.6% Imported 2026-05-11
223 Gemini 2.5 Flash-Lite Preview (Sep '25) (Reasoning) 6.6% Imported 2026-05-11
224 JT-MINI 6.6% Imported 2026-05-11
225 Qwen3 30B A3B (Reasoning) 6.6% Qwen3 30B A3B
qwen-qwen3-30b-a3b
Imported 2026-05-11
226 Grok 4.3 (Non-reasoning) 6.5% Grok 4.3
x-ai-grok-4.3
Imported 2026-05-11
227 Llama 3.3 Nemotron Super 49B v1 (Reasoning) 6.5% Imported 2026-05-11
228 Gemini 2.5 Flash-Lite (Reasoning) 6.4% Gemini 2.5 Flash Lite
google-gemini-2.5-flash-lite
Imported 2026-05-11
229 Granite 4.0 H 350M 6.4% Imported 2026-05-11
230 Qwen3 VL 30B A3B Instruct 6.4% Qwen3 VL 30B A3B Instruct
qwen-qwen3-vl-30b-a3b-instruct
Imported 2026-05-11
231 DeepSeek V3.1 (Non-reasoning) 6.3% DeepSeek V3.1
deepseek-deepseek-chat-v3.1
Imported 2026-05-11
232 Hy3-preview (Non-reasoning) 6.3% Hy3 preview
tencent-hy3-preview
Imported 2026-05-11
233 Kimi K2 0905 6.3% MoonshotAI: Kimi K2 0905
moonshotai-kimi-k2-0905
Imported 2026-05-11
234 Ling-flash-2.0 6.3% Imported 2026-05-11
235 Qwen3 VL 235B A22B Instruct 6.3% Qwen3 VL 235B A22B Instruct
qwen-qwen3-vl-235b-a22b-instruct
Imported 2026-05-11
236 Qwen3 VL 32B Instruct 6.3% Qwen3 VL 32B Instruct
qwen-qwen3-vl-32b-instruct
Imported 2026-05-11
237 Ling 2.6 Flash 6.2% Ling-2.6-flash
inclusionai-ling-2.6-flash
Imported 2026-05-11
238 DeepSeek R1 Distill Llama 70B 6.1% R1 Distill Llama 70B
deepseek-deepseek-r1-distill-llama-70b
Imported 2026-05-11
239 GLM-4.7 (Non-reasoning) 6.1% GLM 4.7
z-ai-glm-4.7
Imported 2026-05-11
240 LFM2.5-1.2B-Thinking 6.1% LFM2.5-1.2B-Thinking
liquid-lfm-2.5-1.2b-thinking
Imported 2026-05-11
241 Magistral Small 1.2 6.1% Imported 2026-05-11
242 Tri-21B-Think 6.1% Imported 2026-05-11
243 LongCat Flash Lite 6% Imported 2026-05-11
244 Olmo 3.1 32B Think 6% Imported 2026-05-11
245 Claude 4 Opus (Non-reasoning) 5.9% Imported 2026-05-11
246 GLM-4.5V (Reasoning) 5.9% GLM 4.5V
z-ai-glm-4.5v
Imported 2026-05-11
247 Olmo 3 32B Think 5.9% Olmo 3 32B Think
allenai-olmo-3-32b-think
Imported 2026-05-11
248 Qwen3 4B 2507 (Reasoning) 5.9% Imported 2026-05-11
249 Exaone 4.0 1.2B (Non-reasoning) 5.8% Imported 2026-05-11
250 Exaone 4.0 1.2B (Reasoning) 5.8% Imported 2026-05-11
251 GPT-5 (ChatGPT) 5.8% GPT-5
openai-gpt-5
Imported 2026-05-11
252 Llama 2 Chat 7B 5.8% Imported 2026-05-11
253 Olmo 3 7B Instruct 5.8% Imported 2026-05-11
254 GPT-5.4 mini (Non-Reasoning) 5.7% GPT-5.4 Mini
openai-gpt-5.4-mini
Imported 2026-05-11
255 Granite 4.0 350M 5.7% Imported 2026-05-11
256 LFM2 1.2B 5.7% Imported 2026-05-11
257 Olmo 3 7B Think 5.7% Imported 2026-05-11
258 Qwen3 0.6B (Reasoning) 5.7% Imported 2026-05-11
259 Solar Pro 2 (Preview) (Reasoning) 5.7% Imported 2026-05-11
260 Tri-21B-think Preview 5.7% Imported 2026-05-11
261 DeepSeek R1 0528 Qwen3 8B 5.6% Imported 2026-05-11
262 Apertus 70B Instruct 5.5% Imported 2026-05-11
263 DeepSeek R1 Distill Qwen 32B 5.5% R1 Distill Qwen 32B
deepseek-deepseek-r1-distill-qwen-32b
Imported 2026-05-11
264 HyperCLOVA X SEED Think (32B) 5.5% Imported 2026-05-11
265 OLMo 2 7B 5.5% Imported 2026-05-11
266 GPT-5 (minimal) 5.4% GPT-5
openai-gpt-5
Imported 2026-05-11
267 K-EXAONE (Non-reasoning) 5.4% Imported 2026-05-11
268 DeepSeek Coder V2 Lite Instruct 5.3% Imported 2026-05-11
269 Gemini 2.0 Flash (Feb '25) 5.3% Gemini 2.0 Flash
google-gemini-2.0-flash
Imported 2026-05-11
270 Llama 3.2 Instruct 1B 5.3% Imported 2026-05-11
271 Ministral 3 3B 5.3% Imported 2026-05-11
272 Nemotron 3 Nano Omni 30B A3B Reasoning 5.3% Imported 2026-05-11
273 NVIDIA Nemotron Nano 12B v2 VL (Reasoning) 5.3% Nemotron Nano 12B 2 VL
nvidia-nemotron-nano-12b-v2-vl
Imported 2026-05-11
274 DeepSeek V3 0324 5.2% DeepSeek V3 0324
deepseek-deepseek-chat-v3-0324
Imported 2026-05-11
275 Gemma 3 1B Instruct 5.2% Imported 2026-05-11
276 Gemma 3 4B Instruct 5.2% Gemma 3 4B
google-gemma-3-4b-it
Imported 2026-05-11
277 GLM-4.6 (Non-reasoning) 5.2% GLM 4.6
z-ai-glm-4.6
Imported 2026-05-11
278 GPT-5.1 (Non-reasoning) 5.2% GPT-5.1
openai-gpt-5.1
Imported 2026-05-11
279 gpt-oss-120B (low) 5.2% gpt-oss-120b
openai-gpt-oss-120b
Imported 2026-05-11
280 LFM2 2.6B 5.2% Imported 2026-05-11
281 Llama 3.2 Instruct 11B (Vision) 5.2% Imported 2026-05-11
282 Llama 3.2 Instruct 3B 5.2% Imported 2026-05-11
283 Nova 2.0 Pro Preview (low) 5.2% Imported 2026-05-11
284 Qwen3 0.6B (Non-reasoning) 5.2% Imported 2026-05-11
285 Qwen3 1.7B (Non-reasoning) 5.2% Imported 2026-05-11
286 Tiny Aya Global 5.2% Imported 2026-05-11
287 Gemini 2.5 Flash (Non-reasoning) 5.1% Gemini 2.5 Flash
google-gemini-2.5-flash
Imported 2026-05-11
288 gpt-oss-20B (low) 5.1% gpt-oss-20b
openai-gpt-oss-20b
Imported 2026-05-11
289 Granite 4.0 1B 5.1% Imported 2026-05-11
290 Granite 4.0 Micro 5.1% Granite 4.0 Micro
ibm-granite-granite-4.0-h-micro
Imported 2026-05-11
291 Grok 3 5.1% Grok 3
xaigrok-3
Imported 2026-05-11
292 Jamba 1.5 Mini 5.1% Imported 2026-05-11
293 LFM2.5-VL-1.6B 5.1% Imported 2026-05-11
294 Llama 3 Instruct 8B 5.1% Imported 2026-05-11
295 Llama 3.1 Instruct 8B 5.1% Imported 2026-05-11
296 Llama 3.1 Nemotron Nano 4B v1.1 (Reasoning) 5.1% Imported 2026-05-11
297 Molmo 7B-D 5.1% Imported 2026-05-11
298 Qwen3 4B (Reasoning) 5.1% Imported 2026-05-11
299 Qwen3 Omni 30B A3B Instruct 5.1% Imported 2026-05-11
300 Reka Flash 3 5.1% Reka Flash 3
rekaai-reka-flash-3
Imported 2026-05-11
301 Apertus 8B Instruct 5% Imported 2026-05-11
302 Gemini 2.5 Flash Preview (Non-reasoning) 5% Imported 2026-05-11
303 GPT-4o (March 2025, chatgpt-4o-latest) 5% GPT-4o
openai-gpt-4o
Imported 2026-05-11
304 GPT-5 mini (minimal) 5% GPT-5 Mini
openai-gpt-5-mini
Imported 2026-05-11
305 Granite 4.0 H 1B 5% Imported 2026-05-11
306 Grok 4 Fast (Non-reasoning) 5% Grok 4 Fast
x-ai-grok-4-fast
Imported 2026-05-11
307 Grok 4.1 Fast (Non-reasoning) 5% Grok 4.1 Fast
x-ai-grok-4.1-fast
Imported 2026-05-11
308 Ling-mini-2.0 5% Imported 2026-05-11
309 Llama 2 Chat 70B 5% Imported 2026-05-11
310 EXAONE 4.0 32B (Non-reasoning) 4.9% Imported 2026-05-11
311 Gemini 1.5 Pro (Sep '24) 4.9% Imported 2026-05-11
312 Gemma 3n E4B Instruct Preview (May '25) 4.9% Imported 2026-05-11
313 GLM-4.7-Flash (Non-reasoning) 4.9% GLM 4.7 Flash
z-ai-glm-4.7-flash
Imported 2026-05-11
314 LFM 40B 4.9% Imported 2026-05-11
315 LFM2 8B A1B 4.9% Imported 2026-05-11
316 Llama 3.2 Instruct 90B (Vision) 4.9% Imported 2026-05-11
317 MiniCPM-V 4.6 1.3B 4.9% Imported 2026-05-11
318 o1-mini 4.9% Imported 2026-05-11
319 Olmo 3.1 32B Instruct 4.9% Olmo 3.1 32B Instruct
allenai-olmo-3.1-32b-instruct
Imported 2026-05-11
320 Qwen3.5 0.8B (Non-reasoning) 4.9% Imported 2026-05-11
321 Qwen3.5 2B (Non-reasoning) 4.9% Imported 2026-05-11
322 Claude 3.7 Sonnet (Non-reasoning) 4.8% Claude 3.7 Sonnet
anthropic-claude-3.7-sonnet
Imported 2026-05-11
323 Command-R (Mar '24) 4.8% Command R (08-2024)
cohere-command-r-08-2024
Imported 2026-05-11
324 Gemma 3 12B Instruct 4.8% Gemma 3 12B
google-gemma-3-12b-it
Imported 2026-05-11
325 Gemma 4 E2B (Reasoning) 4.8% Imported 2026-05-11
326 Llama 4 Maverick 4.8% Llama 4 Maverick
meta-llama-4-maverick
Imported 2026-05-11
327 Mistral Small 3.1 4.8% Imported 2026-05-11
328 NVIDIA Nemotron 3 Nano 4B 4.8% Imported 2026-05-11
329 OpenChat 3.5 (1210) 4.8% Imported 2026-05-11
330 Qwen2.5 Coder Instruct 7B 4.8% Imported 2026-05-11
331 Qwen3 1.7B (Reasoning) 4.8% Imported 2026-05-11
332 QwQ 32B-Preview 4.8% Imported 2026-05-11
333 Gemini 2.0 Flash (experimental) 4.7% Gemini 2.0 Flash
google-gemini-2.0-flash
Imported 2026-05-11
334 Gemma 3 27B Instruct 4.7% Gemma 3 27B
google-gemma-3-27b-it
Imported 2026-05-11
335 Gemma 4 E4B (Non-reasoning) 4.7% Imported 2026-05-11
336 Grok Beta 4.7% Imported 2026-05-11
337 Llama 2 Chat 13B 4.7% Imported 2026-05-11
338 Nova Micro 4.7% Nova Micro 1.0
amazon-nova-micro-v1
Imported 2026-05-11
339 Nova Premier 4.7% Imported 2026-05-11
340 Qwen3 235B A22B (Non-reasoning) 4.7% Qwen3 235B A22B
qwen-qwen3-235b-a22b
Imported 2026-05-11
341 Qwen3 4B 2507 Instruct 4.7% Imported 2026-05-11
342 Command A 4.6% Command A
cohere-command-a
Imported 2026-05-11
343 Gemini 1.0 Pro 4.6% Imported 2026-05-11
344 Gemini 2.5 Flash-Lite Preview (Sep '25) (Non-reasoning) 4.6% Gemini 2.5 Flash Lite Preview 09-2025
google-gemini-2.5-flash-lite-preview-09-2025
Imported 2026-05-11
345 GPT-4.1 4.6% GPT-4.1
openai-gpt-4.1
Imported 2026-05-11
346 GPT-4.1 mini 4.6% GPT-4.1 Mini
openai-gpt-4.1-mini
Imported 2026-05-11
347 Jamba 1.6 Mini 4.6% Imported 2026-05-11
348 Jamba Reasoning 3B 4.6% Imported 2026-05-11
349 Llama 3.1 Instruct 70B 4.6% Imported 2026-05-11
350 Llama 3.1 Nemotron Instruct 70B 4.6% Imported 2026-05-11
351 Ministral 3 14B 4.6% Imported 2026-05-11
352 Nova Lite 4.6% Nova Lite 1.0
amazon-nova-lite-v1
Imported 2026-05-11
353 NVIDIA Nemotron 3 Nano 30B A3B (Non-reasoning) 4.6% Nemotron 3 Nano 30B A3B
nvidia-nemotron-3-nano-30b-a3b
Imported 2026-05-11
354 NVIDIA Nemotron Nano 9B V2 (Reasoning) 4.6% Nemotron Nano 9B V2
nvidia-nemotron-nano-9b-v2
Imported 2026-05-11
355 Qwen3 30B A3B (Non-reasoning) 4.6% Qwen3 30B A3B
qwen-qwen3-30b-a3b
Imported 2026-05-11
356 Command-R+ (Apr '24) 4.5% Command R (08-2024)
cohere-command-r-08-2024
Imported 2026-05-11
357 Gemini 1.5 Flash-8B 4.5% Imported 2026-05-11
358 Gemma 4 E2B (Non-reasoning) 4.5% Imported 2026-05-11
359 Jamba 1.7 Mini 4.5% Imported 2026-05-11
360 Mixtral 8x7B Instruct 4.5% Mistral: Mixtral 8x7B Instruct
mistralai-mixtral-8x7b-instruct
Imported 2026-05-11
361 NVIDIA Nemotron Nano 12B v2 VL (Non-reasoning) 4.5% Nemotron Nano 12B 2 VL
nvidia-nemotron-nano-12b-v2-vl
Imported 2026-05-11
362 Qwen2.5 Max 4.5% Imported 2026-05-11
363 DeepSeek R1 Distill Qwen 14B 4.4% Imported 2026-05-11
364 Gemini 2.0 Flash-Lite (Preview) 4.4% Gemini 2.0 Flash Lite
google-gemini-2.0-flash-lite-001
Imported 2026-05-11
365 Gemma 3n E4B Instruct 4.4% Imported 2026-05-11
366 K2-V2 (medium) 4.4% Imported 2026-05-11
367 LFM2 24B A2B 4.4% LFM2-24B-A2B
liquid-lfm-2-24b-a2b
Imported 2026-05-11
368 Llama 3 Instruct 70B 4.4% Imported 2026-05-11
369 Mistral Medium 3.1 4.4% Mistral: Mistral Medium 3.1
mistralai-mistral-medium-3.1
Imported 2026-05-11
370 Mistral Small (Feb '24) 4.4% Imported 2026-05-11
371 Molmo2-8B 4.4% Imported 2026-05-11
372 Phi-3 Mini Instruct 3.8B 4.4% Imported 2026-05-11
373 Phi-4 Multimodal Instruct 4.4% Imported 2026-05-11
374 Qwen3 Coder 480B A35B Instruct 4.4% Qwen3 Coder 480B A35B
qwen-qwen3-coder
Imported 2026-05-11
375 Qwen3 VL 4B (Reasoning) 4.4% Imported 2026-05-11
376 Claude 4.5 Haiku (Non-reasoning) 4.3% Imported 2026-05-11
377 DeepHermes 3 - Llama-3.1 8B Preview (Non-reasoning) 4.3% Imported 2026-05-11
378 Llama 4 Scout 4.3% Llama 4 Scout
meta-llama-llama-4-scout
Imported 2026-05-11
379 Llama Nemotron Super 49B v1.5 (Non-reasoning) 4.3% Imported 2026-05-11
380 Ministral 3 8B 4.3% Imported 2026-05-11
381 Mistral 7B Instruct 4.3% Imported 2026-05-11
382 Mistral Medium 3 4.3% Mistral: Mistral Medium 3
mistralai-mistral-medium-3
Imported 2026-05-11
383 Mistral Small (Sep '24) 4.3% Imported 2026-05-11
384 Mistral Small 3.2 4.3% Imported 2026-05-11
385 Qwen3 14B (Reasoning) 4.3% Qwen3 14B
qwen-qwen3-14b
Imported 2026-05-11
386 Qwen3 32B (Non-reasoning) 4.3% Qwen3 32B
qwen-qwen3-32b
Imported 2026-05-11
387 Claude 2.1 4.2% Imported 2026-05-11
388 DeepSeek R1 Distill Llama 8B 4.2% Imported 2026-05-11
389 Gemini 1.5 Flash (May '24) 4.2% Imported 2026-05-11
390 Gemma 3 270M 4.2% Imported 2026-05-11
391 GPT-5.4 nano (Non-Reasoning) 4.2% GPT-5.4 Nano
openai-gpt-5.4-nano
Imported 2026-05-11
392 Granite 3.3 8B (Non-reasoning) 4.2% Imported 2026-05-11
393 Granite 4.1 30B 4.2% Imported 2026-05-11
394 Hermes 4 - Llama-3.1 405B (Non-reasoning) 4.2% Imported 2026-05-11
395 Llama 3.1 Instruct 405B 4.2% Imported 2026-05-11
396 Nova 2.0 Lite (low) 4.2% Imported 2026-05-11
397 Phi-4 Mini Instruct 4.2% Imported 2026-05-11
398 Qwen2.5 Instruct 72B 4.2% Qwen2.5 72B Instruct
qwen-qwen-2.5-72b-instruct
Imported 2026-05-11
399 Qwen2.5 Turbo 4.2% Qwen-Turbo
qwen-qwen-turbo
Imported 2026-05-11
400 Qwen3 14B (Non-reasoning) 4.2% Qwen3 14B
qwen-qwen3-14b
Imported 2026-05-11
401 Qwen3 8B (Reasoning) 4.2% Qwen3 8B
qwen-qwen3-8b
Imported 2026-05-11
402 GPT-5 nano (minimal) 4.1% GPT-5 Nano
openai-gpt-5-nano
Imported 2026-05-11
403 Hermes 3 - Llama-3.1 70B 4.1% Hermes 3 70B Instruct
nousresearch-hermes-3-llama-3.1-70b
Imported 2026-05-11
404 Mistral Large 3 4.1% Imported 2026-05-11
405 Mistral Saba 4.1% Mistral: Saba
mistralai-mistral-saba
Imported 2026-05-11
406 Mistral Small 3 4.1% Imported 2026-05-11
407 Mixtral 8x22B Instruct 4.1% Mistral: Mixtral 8x22B Instruct
mistralai-mixtral-8x22b-instruct
Imported 2026-05-11
408 Phi-4 4.1% Phi 4
microsoft-phi-4
Imported 2026-05-11
409 Claude 4 Sonnet (Non-reasoning) 4% Imported 2026-05-11
410 Devstral Small (May '25) 4% Mistral: Devstral Small 1.1
mistralai-devstral-small
Imported 2026-05-11
411 Gemma 3n E2B Instruct 4% Imported 2026-05-11
412 GPT-4o mini 4% GPT-4o-mini
openai-gpt-4o-mini
Imported 2026-05-11
413 Jamba 1.5 Large 4% Imported 2026-05-11
414 Jamba 1.6 Large 4% Imported 2026-05-11
415 Llama 3.3 Instruct 70B 4% Imported 2026-05-11
416 Mistral Large 2 (Nov '24) 4% Imported 2026-05-11
417 Nova 2.0 Omni (low) 4% Imported 2026-05-11
418 Nova 2.0 Pro Preview (Non-reasoning) 4% Imported 2026-05-11
419 NVIDIA Nemotron Nano 9B V2 (Non-reasoning) 4% Nemotron Nano 9B V2
nvidia-nemotron-nano-9b-v2
Imported 2026-05-11
420 Qwen3 Coder 30B A3B Instruct 4% Qwen3 Coder 30B A3B Instruct
qwen-qwen3-coder-30b-a3b-instruct
Imported 2026-05-11
421 Claude 3 Haiku 3.9% Claude 3 Haiku
anthropic-claude-3-haiku
Imported 2026-05-11
422 Claude 3.5 Sonnet (Oct '24) 3.9% Claude 3.5 Sonnet
anthropic-claude-3.5-sonnet
Imported 2026-05-11
423 DeepHermes 3 - Mistral 24B Preview (Non-reasoning) 3.9% Imported 2026-05-11
424 Gemini 1.5 Pro (May '24) 3.9% Imported 2026-05-11
425 GPT-4.1 nano 3.9% GPT-4.1 Nano
openai-gpt-4.1-nano
Imported 2026-05-11
426 K2-V2 (low) 3.9% Imported 2026-05-11
427 Nova 2.0 Omni (Non-reasoning) 3.9% Imported 2026-05-11
428 Claude 3 Sonnet 3.8% Imported 2026-05-11
429 Claude Instant 3.8% Imported 2026-05-11
430 Devstral Medium 3.8% Mistral: Devstral Medium
mistralai-devstral-medium
Imported 2026-05-11
431 Granite 4.1 8B 3.8% Granite 4.1 8B
ibm-granite-granite-4.1-8b
Imported 2026-05-11
432 Grok 2 (Dec '24) 3.8% Imported 2026-05-11
433 Jamba 1.7 Large 3.8% Imported 2026-05-11
434 Qwen2.5 Coder Instruct 32B 3.8% Qwen2.5 Coder 32B Instruct
qwen-qwen-2.5-coder-32b-instruct
Imported 2026-05-11
435 Qwen2.5 Instruct 32B 3.8% Imported 2026-05-11
436 Solar Pro 2 (Non-reasoning) 3.8% Imported 2026-05-11
437 Solar Pro 2 (Preview) (Non-reasoning) 3.8% Imported 2026-05-11
438 Claude 3.5 Sonnet (June '24) 3.7% Claude 3.5 Sonnet
anthropic-claude-3.5-sonnet
Imported 2026-05-11
439 Devstral Small (Jul '25) 3.7% Mistral: Devstral Small 1.1
mistralai-devstral-small
Imported 2026-05-11
440 Gemini 2.5 Flash-Lite (Non-reasoning) 3.7% Gemini 2.5 Flash Lite
google-gemini-2.5-flash-lite
Imported 2026-05-11
441 Gemma 4 E4B (Reasoning) 3.7% Imported 2026-05-11
442 GLM-4.6V (Non-reasoning) 3.7% GLM 4.6V
z-ai-glm-4.6v
Imported 2026-05-11
443 GPT-4o (ChatGPT) 3.7% GPT-4o
openai-gpt-4o
Imported 2026-05-11
444 Granite 4.0 H Small 3.7% Imported 2026-05-11
445 Mistral Small 4 (Non-reasoning) 3.7% Mistral: Mistral Small 4
mistralai-mistral-small-2603
Imported 2026-05-11
446 OLMo 2 32B 3.7% Imported 2026-05-11
447 Qwen2 Instruct 72B 3.7% Imported 2026-05-11
448 Qwen3 4B (Non-reasoning) 3.7% Imported 2026-05-11
449 Qwen3 VL 4B Instruct 3.7% Imported 2026-05-11
450 DeepSeek V3 (Dec '24) 3.6% DeepSeek V3
deepseek-deepseek-chat
Imported 2026-05-11
451 Devstral 2 3.6% Imported 2026-05-11
452 Gemini 2.0 Flash-Lite (Feb '25) 3.6% Gemini 2.0 Flash Lite
google-gemini-2.0-flash-lite-001
Imported 2026-05-11
453 GLM-4.5V (Non-reasoning) 3.6% GLM 4.5V
z-ai-glm-4.5v
Imported 2026-05-11
454 Hermes 4 - Llama-3.1 70B (Non-reasoning) 3.6% Imported 2026-05-11
455 Pixtral Large 3.6% Mistral: Pixtral Large 2411
mistralai-pixtral-large-2411
Imported 2026-05-11
456 Claude 3.5 Haiku 3.5% Claude 3.5 Haiku
anthropic-claude-3.5-haiku
Imported 2026-05-11
457 ERNIE 4.5 300B A47B 3.5% ERNIE 4.5 300B A47B
baidu-ernie-4.5-300b-a47b
Imported 2026-05-11
458 Gemini 1.5 Flash (Sep '24) 3.5% Imported 2026-05-11
459 Llama 3.1 Tulu3 405B 3.5% Imported 2026-05-11
460 Llama 3.3 Nemotron Super 49B v1 (Non-reasoning) 3.5% Imported 2026-05-11
461 Devstral Small 2 3.4% Imported 2026-05-11
462 Granite 4.1 3B 3.4% Imported 2026-05-11
463 Mistral Large (Feb '24) 3.4% Mistral Large
mistralai-mistral-large
Imported 2026-05-11
464 Mistral Medium 3.4% Imported 2026-05-11
465 Nova Pro 3.4% Nova Pro 1.0
amazon-nova-pro-v1
Imported 2026-05-11
466 DeepSeek R1 Distill Qwen 1.5B 3.3% Imported 2026-05-11
467 GPT-4 Turbo 3.3% GPT-4 Turbo
openai-gpt-4-turbo
Imported 2026-05-11
468 GPT-4o (Nov '24) 3.3% GPT-4o
openai-gpt-4o
Imported 2026-05-11
469 Qwen3 VL 8B (Reasoning) 3.3% Imported 2026-05-11
470 Sarvam M (Reasoning) 3.3% Imported 2026-05-11
471 Mistral Large 2 (Jul '24) 3.2% Mistral Large 2407
mistralai-mistral-large-2407
Imported 2026-05-11
472 Claude 3 Opus 3.1% Imported 2026-05-11
473 Nova 2.0 Lite (Non-reasoning) 3% Imported 2026-05-11
474 GPT-4o (Aug '24) 2.9% GPT-4o (2024-08-06)
openai-gpt-4o-2024-08-06
Imported 2026-05-11
475 Qwen3 VL 8B Instruct 2.9% Qwen3 VL 8B Instruct
qwen-qwen3-vl-8b-instruct
Imported 2026-05-11
476 GPT-4o (May '24) 2.8% GPT-4o (2024-05-13)
openai-gpt-4o-2024-05-13
Imported 2026-05-11
477 Qwen3 8B (Non-reasoning) 2.8% Qwen3 8B
qwen-qwen3-8b
Imported 2026-05-11
478 Kimi Linear 48B A3B Instruct 2.7% Imported 2026-05-11
479 Qwen3.5 2B (Reasoning) 2.1% Imported 2026-05-11
480 Qwen3.5 0.8B (Reasoning) 1.2% Imported 2026-05-11
1 GPT-5.4 Pro 58.7% GPT-5.4 Pro
openai-gpt-5.4-pro
Launch post 2026-04-23
2 GPT-5.5 Pro 57.2% GPT-5.5 Pro
openai-gpt-5.5-pro
Launch post 2026-04-23
3 Claude Opus 4.7 54.7% Claude Opus 4.7
anthropic-claude-opus-4.7
Launch post 2026-04-23
4 GPT-5.5 52.2% GPT-5.5
openai-gpt-5.5
Launch post 2026-04-23
5 GPT-5.4 52.1% GPT-5.4
openai-gpt-5.4
Launch post 2026-04-23
6 Gemini 3.1 Pro Preview 51.4% Gemini 3.1 Pro Preview
google-gemini-3.1-pro-preview
Launch post 2026-04-23
1 Claude Mythos Preview 64.7% Claude Mythos Preview
anthropic-claude-mythos-preview
Launch post 2026-04-16
2 GPT-5.4 Pro 58.7% GPT-5.4 Pro
openai-gpt-5.4-pro
Launch post 2026-04-16
3 Claude Opus 4.7 54.7% Claude Opus 4.7
anthropic-claude-opus-4.7
Launch post 2026-04-16
4 Claude Opus 4.6 53.3% Claude Opus 4.6
anthropic-claude-opus-4.6
Launch post 2026-04-16
5 Gemini 3.1 Pro Preview 51.4% Gemini 3.1 Pro Preview
google-gemini-3.1-pro-preview
Launch post 2026-04-16