GPQA Diamond

The hardest GPQA subset of graduate-level science questions in biology, chemistry, and physics.

Rows: 503
Primary metric: score
Sampled: 2026-05-28

Metrics

Accuracy

Showing 5 latest source slices.

Latest Results

Provider-published system-card benchmark scores parsed from Anthropic's Claude Opus 4.8 capability evaluation tables. Rows are marked self-reported and should be interpreted as source claims unless independently reproduced.
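Because provenance determines how much weight a score deserves, it can help to keep self-reported and imported rows apart before comparing them. The following is a minimal sketch, not the site's actual pipeline: the `Row` class and `by_provenance` helper are hypothetical names, and the sample rows are copied from the results listed below.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    """One leaderboard row (hypothetical structure for illustration)."""
    subject: str
    accuracy: float   # percent, as shown in the Accuracy column
    provenance: str   # "Self-reported" or "Imported"
    sampled: str      # ISO date from the Sampled column


# A few rows copied from the results table; not the full dataset.
ROWS = [
    Row("Gemini 3.1 Pro Preview", 94.3, "Self-reported", "2026-05-28"),
    Row("Claude Opus 4.7", 94.2, "Self-reported", "2026-05-28"),
    Row("Gemini 3.1 Pro Preview", 94.1, "Imported", "2026-05-11"),
    Row("Claude Opus 4.7 (Adaptive Reasoning, Max Effort)", 91.4,
        "Imported", "2026-05-11"),
]


def by_provenance(rows):
    """Group rows by provenance and sort each group by accuracy, descending."""
    groups = {}
    for row in rows:
        groups.setdefault(row.provenance, []).append(row)
    for group in groups.values():
        group.sort(key=lambda r: r.accuracy, reverse=True)
    return groups


if __name__ == "__main__":
    for provenance, group in by_provenance(ROWS).items():
        print(provenance)
        for row in group:
            print(f"  {row.accuracy:5.1f}%  {row.subject}  ({row.sampled})")
```

Ranking within each provenance group, rather than across the whole list, avoids treating a provider's own claim and an imported measurement as directly comparable.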

Rank Subject Accuracy Model Match Provenance Sampled
1 Gemini 3.1 Pro Preview 94.3% Gemini 3.1 Pro Preview
google-gemini-3.1-pro-preview
Self-reported 2026-05-28
2 Claude Opus 4.7 94.2% Claude Opus 4.7
anthropic-claude-opus-4.7
Self-reported 2026-05-28
3 Claude Opus 4.8 93.6% Claude Opus 4.8
anthropic-claude-opus-4.8
Self-reported 2026-05-28
1 Qwen3.7 Max 92.4% Qwen3.7 Max
qwen-qwen3.7-max
Self-reported 2026-05-28
2 Claude Opus 4.6 Max 91.3% Claude Opus 4.6
anthropic-claude-opus-4.6
Self-reported 2026-05-28
3 Kimi K2.6 Thinking 90.5% MoonshotAI: Kimi K2.6
moonshotai-kimi-k2.6
Self-reported 2026-05-28
4 Qwen3.6 Plus 90.4% Qwen3.6 Plus
qwen-qwen3.6-plus
Self-reported 2026-05-28
5 DeepSeek V4 Pro Max 90.1% DeepSeek V4 Pro
deepseek-deepseek-v4-pro
Self-reported 2026-05-28
6 GLM-5.1 Thinking 86.2% GLM 5.1
z-ai-glm-5.1
Self-reported 2026-05-28
1 Gemini 3.1 Pro Preview 94.1% Gemini 3.1 Pro Preview
google-gemini-3.1-pro-preview
Imported 2026-05-11
2 GPT-5.5 (xhigh) 93.5% GPT-5.5
openai-gpt-5.5
Imported 2026-05-11
3 GPT-5.5 (high) 93.2% GPT-5.5
openai-gpt-5.5
Imported 2026-05-11
4 GPT-5.5 (medium) 92.6% GPT-5.5
openai-gpt-5.5
Imported 2026-05-11
5 GPT-5.4 (xhigh) 92% GPT-5.4
openai-gpt-5.4
Imported 2026-05-11
6 GPT-5.3 Codex (xhigh) 91.5% GPT-5.3-Codex
openai-gpt-5.3-codex
Imported 2026-05-11
7 Claude Opus 4.7 (Adaptive Reasoning, Max Effort) 91.4% Claude Opus 4.7
anthropic-claude-opus-4.7
Imported 2026-05-11
8 Grok 4.20 0309 v2 (Reasoning) 91.1% Grok 4.20
x-ai-grok-4.20
Imported 2026-05-11
9 Kimi K2.6 91.1% MoonshotAI: Kimi K2.6
moonshotai-kimi-k2.6
Imported 2026-05-11
10 GPT-5.5 (low) 91% GPT-5.5
openai-gpt-5.5
Imported 2026-05-11
11 Gemini 3 Pro Preview (high) 90.8% Gemini 3
google-gemini-3
Imported 2026-05-11
12 DeepSeek V4 Pro (Reasoning, High Effort) 90.5% DeepSeek V4 Pro
deepseek-deepseek-v4-pro
Imported 2026-05-11
13 GPT-5.2 (xhigh) 90.3% GPT-5.2
openai-gpt-5.2
Imported 2026-05-11
14 Grok 4.3 90.1% Grok 4.3
x-ai-grok-4.3
Imported 2026-05-11
15 GPT-5.2 Codex (xhigh) 89.9% GPT-5.2-Codex
openai-gpt-5.2-codex
Imported 2026-05-11
16 Gemini 3 Flash Preview (Reasoning) 89.8% Gemini 3 Flash Preview
google-gemini-3-flash-preview
Imported 2026-05-11
17 Claude Opus 4.6 (Adaptive Reasoning, Max Effort) 89.6% Claude Opus 4.6
anthropic-claude-opus-4.6
Imported 2026-05-11
18 DeepSeek V4 Flash (Reasoning, Max Effort) 89.4% DeepSeek V4 Flash
deepseek-deepseek-v4-flash
Imported 2026-05-11
19 Qwen3.5 397B A17B (Reasoning) 89.3% Qwen3.5 397B A17B
qwen-qwen3.5-397b-a17b
Imported 2026-05-11
20 DeepSeek V4 Pro (Reasoning, Max Effort) 88.8% DeepSeek V4 Pro
deepseek-deepseek-v4-pro
Imported 2026-05-11
21 Qwen3.6 Max Preview 88.8% Qwen3.6 Max Preview
qwen-qwen3.6-max-preview
Imported 2026-05-11
22 Gemini 3 Pro Preview (low) 88.7% Gemini 3
google-gemini-3
Imported 2026-05-11
23 Claude Opus 4.7 (Non-reasoning, High Effort) 88.5% Claude Opus 4.7
anthropic-claude-opus-4.7
Imported 2026-05-11
24 Grok 4.20 0309 (Reasoning) 88.5% Grok 4.20
x-ai-grok-4.20
Imported 2026-05-11
25 Muse Spark 88.4% Imported 2026-05-11
26 Qwen3.6 Plus 88.2% Qwen3.6 Plus
qwen-qwen3.6-plus
Imported 2026-05-11
27 Kimi K2.5 (Reasoning) 87.9% MoonshotAI: Kimi K2.5
moonshotai-kimi-k2.5
Imported 2026-05-11
28 Grok 4 87.7% Grok 4
x-ai-grok-4
Imported 2026-05-11
29 Claude Sonnet 4.6 (Adaptive Reasoning, Max Effort) 87.5% Claude Sonnet 4.6
anthropic-claude-sonnet-4.6
Imported 2026-05-11
30 GPT-5.4 mini (xhigh) 87.5% GPT-5.4 Mini
openai-gpt-5.4-mini
Imported 2026-05-11
31 MiniMax-M2.7 87.4% MiniMax M2.7
minimax-minimax-m2.7
Imported 2026-05-11
32 GPT-5.1 (high) 87.3% GPT-5.1
openai-gpt-5.1
Imported 2026-05-11
33 DeepSeek V3.2 Speciale 87.1% DeepSeek V3.2 Speciale
deepseek-deepseek-v3.2-speciale
Imported 2026-05-11
34 GPT-5.4 (low) 87.1% GPT-5.4
openai-gpt-5.4
Imported 2026-05-11
35 MiMo-V2-Pro 87% MiMo-V2-Pro
xiaomi-mimo-v2-pro
Imported 2026-05-11
36 GLM-5.1 (Reasoning) 86.8% GLM 5.1
z-ai-glm-5.1
Imported 2026-05-11
37 DeepSeek V4 Flash (Reasoning, High Effort) 86.7% DeepSeek V4 Flash
deepseek-deepseek-v4-flash
Imported 2026-05-11
38 Hy3-preview (Reasoning) 86.7% Hy3 preview
tencent-hy3-preview
Imported 2026-05-11
39 Claude Opus 4.5 (Reasoning) 86.6% Claude Opus 4.5
anthropic-claude-opus-4.5
Imported 2026-05-11
40 MiMo-V2.5-Pro 86.6% MiMo-V2.5-Pro
xiaomi-mimo-v2.5-pro
Imported 2026-05-11
41 GPT-5.2 (medium) 86.4% GPT-5.2
openai-gpt-5.2
Imported 2026-05-11
42 Qwen3 Max Thinking 86.1% Qwen3 Max Thinking
qwen-qwen3-max-thinking
Imported 2026-05-11
43 Qwen3.5 397B A17B (Non-reasoning) 86.1% Qwen3.5 397B A17B
qwen-qwen3.5-397b-a17b
Imported 2026-05-11
44 GPT-5.1 Codex (high) 86% GPT-5.1-Codex
openai-gpt-5.1-codex
Imported 2026-05-11
45 GLM-4.7 (Reasoning) 85.9% GLM 4.7
z-ai-glm-4.7
Imported 2026-05-11
46 Qwen3.5 27B (Reasoning) 85.8% Qwen3.5-27B
qwen-qwen3.5-27b
Imported 2026-05-11
47 Gemma 4 31B (Reasoning) 85.7% Gemma 4 31B
google-gemma-4-31b-it
Imported 2026-05-11
48 Qwen3.5 122B A10B (Reasoning) 85.7% Qwen3.5-122B-A10B
qwen-qwen3.5-122b-a10b
Imported 2026-05-11
49 KAT Coder Pro V2 85.5% KAT-Coder-Pro V2
kwaipilot-kat-coder-pro-v2
Imported 2026-05-11
50 MiMo-V2-Omni-0327 85.5% Imported 2026-05-11
51 GPT-5 (high) 85.4% GPT-5
openai-gpt-5
Imported 2026-05-11
52 Grok 4.1 Fast (Reasoning) 85.3% Grok 4.1 Fast
x-ai-grok-4.1-fast
Imported 2026-05-11
53 MiMo-V2.5 84.9% MiMo-V2.5
xiaomi-mimo-v2.5
Imported 2026-05-11
54 Nanbeige4.1-3B 84.9% Imported 2026-05-11
55 MiniMax-M2.5 84.8% MiniMax M2.5
minimax-minimax-m2.5
Imported 2026-05-11
56 GLM-5-Turbo 84.7% GLM 5 Turbo
z-ai-glm-5-turbo
Imported 2026-05-11
57 Grok 4 Fast (Reasoning) 84.7% Grok 4 Fast
x-ai-grok-4-fast
Imported 2026-05-11
58 MiMo-V2-Flash (Reasoning) 84.6% MiMo-V2-Flash
xiaomi-mimo-v2-flash
Imported 2026-05-11
59 o3-pro 84.5% o3 Pro
openai-o3-pro
Imported 2026-05-11
60 Qwen3.5 35B A3B (Reasoning) 84.5% Qwen3.5-35B-A3B
qwen-qwen3.5-35b-a3b
Imported 2026-05-11
61 Gemini 2.5 Pro 84.4% Gemini 2.5 Pro
google-gemini-2.5-pro
Imported 2026-05-11
62 GPT-5 (medium) 84.2% GPT-5
openai-gpt-5
Imported 2026-05-11
63 Qwen3.5 27B (Non-reasoning) 84.2% Qwen3.5-27B
qwen-qwen3.5-27b
Imported 2026-05-11
64 Qwen3.6 27B (Reasoning) 84.2% Qwen3.6 27B
qwen-qwen3.6-27b
Imported 2026-05-11
65 Qwen3.6 35B A3B (Reasoning) 84.1% Qwen3.6 35B A3B
qwen-qwen3.6-35b-a3b
Imported 2026-05-11
66 Claude Opus 4.6 (Non-reasoning, High Effort) 84% Claude Opus 4.6
anthropic-claude-opus-4.6
Imported 2026-05-11
67 DeepSeek V3.2 (Reasoning) 84% DeepSeek V3.2
deepseek-deepseek-v3.2
Imported 2026-05-11
68 GLM-5.1 (Non-reasoning) 83.9% GLM 5.1
z-ai-glm-5.1
Imported 2026-05-11
69 Kimi K2 Thinking 83.8% MoonshotAI: Kimi K2 Thinking
moonshotai-kimi-k2-thinking
Imported 2026-05-11
70 GPT-5 Codex (high) 83.7% GPT-5 Codex
openai-gpt-5-codex
Imported 2026-05-11
71 Gemini 2.5 Pro Preview (Mar '25) 83.6% Gemini 2.5 Pro Preview 06-05
google-gemini-2.5-pro-preview
Imported 2026-05-11
72 MiMo-V2-Flash (Feb 2026) 83.5% MiMo-V2-Flash
xiaomi-mimo-v2-flash
Imported 2026-05-11
73 Claude 4.5 Sonnet (Reasoning) 83.4% Imported 2026-05-11
74 Step 3.5 Flash 83.1% Step 3.5 Flash
stepfun-step-3.5-flash
Imported 2026-05-11
75 MiniMax-M2.1 83% MiniMax M2.1
minimax-minimax-m2.1
Imported 2026-05-11
76 Qwen3.6 27B (Non-reasoning) 82.9% Qwen3.6 27B
qwen-qwen3.6-27b
Imported 2026-05-11
77 GPT-5 mini (high) 82.8% GPT-5 Mini
openai-gpt-5-mini
Imported 2026-05-11
78 MiMo-V2-Omni 82.8% MiMo-V2-Omni
xiaomi-mimo-v2-omni
Imported 2026-05-11
79 o3 82.7% o3
openai-o3
Imported 2026-05-11
80 Qwen3.5 122B A10B (Non-reasoning) 82.7% Qwen3.5-122B-A10B
qwen-qwen3.5-122b-a10b
Imported 2026-05-11
81 Qwen3.5 Omni Plus 82.6% Imported 2026-05-11
82 Step 3.5 Flash 2603 82.6% Step 3.5 Flash
stepfun-step-3.5-flash
Imported 2026-05-11
83 GPT-5.4 mini (medium) 82.3% GPT-5.4 Mini
openai-gpt-5.4-mini
Imported 2026-05-11
84 Gemini 2.5 Pro Preview (May '25) 82.2% Gemini 2.5 Pro Preview 06-05
google-gemini-2.5-pro-preview
Imported 2026-05-11
85 Gemini 3.1 Flash-Lite Preview 82.2% Gemini 3.1 Flash Lite Preview
google-gemini-3.1-flash-lite-preview
Imported 2026-05-11
86 GLM-5 (Reasoning) 82% GLM 5
z-ai-glm-5
Imported 2026-05-11
87 Qwen3.5 35B A3B (Non-reasoning) 81.9% Qwen3.5-35B-A3B
qwen-qwen3.5-35b-a3b
Imported 2026-05-11
88 GPT-5.4 nano (xhigh) 81.7% GPT-5.4 Nano
openai-gpt-5.4-nano
Imported 2026-05-11
89 Qwen3.6 35B A3B (Non-reasoning) 81.7% Qwen3.6 35B A3B
qwen-qwen3.6-35b-a3b
Imported 2026-05-11
90 DeepSeek R1 0528 (May '25) 81.3% R1
deepseek-r1
Imported 2026-05-11
91 GPT-5.1 Codex mini (high) 81.3% GPT-5.1-Codex-Mini
openai-gpt-5.1-codex-mini
Imported 2026-05-11
92 Gemini 3 Flash Preview (Non-reasoning) 81.2% Gemini 3 Flash Preview
google-gemini-3-flash-preview
Imported 2026-05-11
93 ERNIE 4.5 300B A47B 81.1% ERNIE 4.5 300B A47B
baidu-ernie-4.5-300b-a47b
Imported 2026-05-11
94 Nova 2.0 Lite (high) 81.1% Imported 2026-05-11
95 Claude Opus 4.5 (Non-reasoning) 81% Claude Opus 4.5
anthropic-claude-opus-4.5
Imported 2026-05-11
96 Claude 4.1 Opus (Reasoning) 80.9% Imported 2026-05-11
97 GLM 5V Turbo (Reasoning) 80.9% GLM 5V Turbo
z-ai-glm-5v-turbo
Imported 2026-05-11
98 GPT-5 (low) 80.8% GPT-5
openai-gpt-5
Imported 2026-05-11
99 Qwen3.5 9B (Reasoning) 80.6% Qwen3.5-9B
qwen-qwen3.5-9b
Imported 2026-05-11
100 GPT-5 mini (medium) 80.3% GPT-5 Mini
openai-gpt-5-mini
Imported 2026-05-11
101 NVIDIA Nemotron 3 Super 120B A12B (Reasoning) 80% Nemotron 3 Super
nvidia-nemotron-3-super-120b-a12b
Imported 2026-05-11
102 Claude Sonnet 4.6 (Non-reasoning, High Effort) 79.9% Claude Sonnet 4.6
anthropic-claude-sonnet-4.6
Imported 2026-05-11
103 Claude Sonnet 4.6 (Non-reasoning, Low Effort) 79.7% Claude Sonnet 4.6
anthropic-claude-sonnet-4.6
Imported 2026-05-11
104 DeepSeek V3.2 Exp (Reasoning) 79.7% DeepSeek V3.2 Exp
deepseek-deepseek-v3.2-exp
Imported 2026-05-11
105 Claude 4 Opus (Reasoning) 79.6% Imported 2026-05-11
106 EXAONE 4.5 33B 79.4% Imported 2026-05-11
107 Gemini 2.5 Flash Preview (Sep '25) (Reasoning) 79.3% Imported 2026-05-11
108 DeepSeek V3.1 Terminus (Reasoning) 79.2% DeepSeek V3.1 Terminus
deepseek-deepseek-v3.1-terminus
Imported 2026-05-11
109 Gemma 4 26B A4B (Reasoning) 79.2% Gemma 4 26B A4B
google-gemma-4-26b-a4b-it
Imported 2026-05-11
110 Grok 3 mini Reasoning (high) 79.1% Imported 2026-05-11
111 Gemini 2.5 Flash (Reasoning) 79% Gemini 2.5 Flash
google-gemini-2.5-flash
Imported 2026-05-11
112 Qwen3 235B A22B 2507 (Reasoning) 79% Qwen3 235B A22B Instruct 2507
qwen-qwen3-235b-a22b-2507
Imported 2026-05-11
113 Kimi K2.5 (Non-reasoning) 78.9% MoonshotAI: Kimi K2.5
moonshotai-kimi-k2.5
Imported 2026-05-11
114 Kimi K2.6 (Non-reasoning) 78.8% MoonshotAI: Kimi K2.6
moonshotai-kimi-k2.6
Imported 2026-05-11
115 Qwen3.5 9B (Non-reasoning) 78.6% Qwen3.5-9B
qwen-qwen3.5-9b
Imported 2026-05-11
116 Grok 4.20 0309 (Non-reasoning) 78.5% Grok 4.20
x-ai-grok-4.20
Imported 2026-05-11
117 Nova 2.0 Pro Preview (medium) 78.5% Imported 2026-05-11
118 o4-mini (high) 78.4% o4 Mini
openai-o4-mini
Imported 2026-05-11
119 K-EXAONE (Reasoning) 78.3% Imported 2026-05-11
120 GLM-4.5 (Reasoning) 78.2% GLM 4.5
z-ai-glm-4.5
Imported 2026-05-11
121 gpt-oss-120B (high) 78.2% gpt-oss-120b
openai-gpt-oss-120b
Imported 2026-05-11
122 GLM-4.6 (Reasoning) 78% GLM 4.6
z-ai-glm-4.6
Imported 2026-05-11
123 DeepSeek V3.1 (Reasoning) 77.9% DeepSeek V3.1
deepseek-deepseek-chat-v3.1
Imported 2026-05-11
124 Claude 4 Sonnet (Reasoning) 77.7% Imported 2026-05-11
125 ERNIE 5.0 Thinking Preview 77.7% Imported 2026-05-11
126 MiniMax-M2 77.7% MiniMax M2
minimax-minimax-m2
Imported 2026-05-11
127 Grok 4.20 0309 v2 (Non-reasoning) 77.6% Grok 4.20
x-ai-grok-4.20
Imported 2026-05-11
128 Qwen3 Max Thinking (Preview) 77.6% Qwen3 Max Thinking
qwen-qwen3-max-thinking
Imported 2026-05-11
129 Ring-1T 77.4% Imported 2026-05-11
130 o3-mini (high) 77.3% o3 Mini High
openai-o3-mini-high
Imported 2026-05-11
131 Claude 3.7 Sonnet (Reasoning) 77.2% Claude 3.7 Sonnet (thinking)
anthropic-claude-3.7-sonnet-thinking
Imported 2026-05-11
132 Qwen3 VL 235B A22B (Reasoning) 77.2% Imported 2026-05-11
133 Qwen3.5 4B (Reasoning) 77.1% Imported 2026-05-11
134 Mercury 2 77% Mercury 2
inception-mercury-2
Imported 2026-05-11
135 Mistral Small 4 (Reasoning) 76.9% Mistral: Mistral Small 4
mistralai-mistral-small-2603
Imported 2026-05-11
136 Cogito v2.1 (Reasoning) 76.8% Imported 2026-05-11
137 GPT-5.5 (Non-reasoning) 76.8% GPT-5.5
openai-gpt-5.5
Imported 2026-05-11
138 Nova 2.0 Lite (medium) 76.8% Imported 2026-05-11
139 Kimi K2 0905 76.7% MoonshotAI: Kimi K2 0905
moonshotai-kimi-k2-0905
Imported 2026-05-11
140 Gemini 2.5 Flash Preview (Sep '25) (Non-reasoning) 76.6% Imported 2026-05-11
141 Kimi K2 76.6% MoonshotAI: Kimi K2 0711
moonshotai-kimi-k2
Imported 2026-05-11
142 Doubao Seed Code 76.4% Imported 2026-05-11
143 KAT-Coder-Pro V1 76.4% Imported 2026-05-11
144 Qwen3 Max 76.4% Qwen3 Max
qwen-qwen3-max
Imported 2026-05-11
145 Qwen3 Max (Preview) 76.4% Qwen3 Max
qwen-qwen3-max
Imported 2026-05-11
146 Gemma 4 31B (Non-reasoning) 76.3% Gemma 4 31B
google-gemma-4-31b-it
Imported 2026-05-11
147 MiMo-V2.5-Pro (Non-reasoning) 76.2% MiMo-V2.5-Pro
xiaomi-mimo-v2.5-pro
Imported 2026-05-11
148 GPT-5.4 nano (medium) 76.1% GPT-5.4 Nano
openai-gpt-5.4-nano
Imported 2026-05-11
149 INTELLECT-3 76.1% INTELLECT-3
prime-intellect-intellect-3
Imported 2026-05-11
150 Nova 2.0 Omni (medium) 76% Imported 2026-05-11
151 Qwen3 Next 80B A3B (Reasoning) 75.9% Imported 2026-05-11
152 Nemotron Cascade 2 30B A3B 75.8% Imported 2026-05-11
153 NVIDIA Nemotron 3 Nano 30B A3B (Reasoning) 75.7% Nemotron 3 Nano 30B A3B
nvidia-nemotron-3-nano-30b-a3b
Imported 2026-05-11
154 Qwen3 235B A22B 2507 Instruct 75.3% Qwen3 235B A22B Instruct 2507
qwen-qwen3-235b-a22b-2507
Imported 2026-05-11
155 Ling-2.6-1T 75.2% Ling-2.6-1T
inclusionai-ling-2.6-1t
Imported 2026-05-11
156 Trinity Large Thinking 75.2% Trinity Large Thinking
arcee-ai-trinity-large-thinking
Imported 2026-05-11
157 DeepSeek V3.1 Terminus (Non-reasoning) 75.1% DeepSeek V3.1 Terminus
deepseek-deepseek-v3.1-terminus
Imported 2026-05-11
158 DeepSeek V3.2 (Non-reasoning) 75.1% DeepSeek V3.2
deepseek-deepseek-v3.2
Imported 2026-05-11
159 Nova 2.0 Pro Preview (low) 75.1% Imported 2026-05-11
160 GPT-5.4 (Non-reasoning) 74.8% GPT-5.4
openai-gpt-5.4
Imported 2026-05-11
161 Llama Nemotron Super 49B v1.5 (Reasoning) 74.8% Imported 2026-05-11
162 Mistral Medium 3.5 74.8% Mistral: Mistral Medium 3.5
mistralai-mistral-medium-3-5
Imported 2026-05-11
163 o3-mini 74.8% o3-mini
openai-o3-mini
Imported 2026-05-11
164 o1 74.7% o1
openai-o1
Imported 2026-05-11
165 Qwen3.5 Omni Flash 74.2% Imported 2026-05-11
166 EXAONE 4.0 32B (Reasoning) 73.9% Imported 2026-05-11
167 Magistral Medium 1.2 73.9% Imported 2026-05-11
168 DeepSeek V3.2 Exp (Non-reasoning) 73.8% DeepSeek V3.2
deepseek-deepseek-v3.2
Imported 2026-05-11
169 Qwen3 Next 80B A3B Instruct 73.8% Qwen3 Next 80B A3B Instruct
qwen-qwen3-next-80b-a3b-instruct
Imported 2026-05-11
170 Sarvam 105B (high) 73.8% Imported 2026-05-11
171 Qwen3 Coder Next 73.7% Qwen3 Coder Next
qwen-qwen3-coder-next
Imported 2026-05-11
172 DeepSeek V3.1 (Non-reasoning) 73.5% DeepSeek V3.1
deepseek-deepseek-chat-v3.1
Imported 2026-05-11
173 Apriel-v1.6-15B-Thinker 73.3% Imported 2026-05-11
174 GLM-4.5-Air 73.3% GLM 4.5 Air
z-ai-glm-4.5-air
Imported 2026-05-11
175 Qwen3 VL 32B (Reasoning) 73.3% Imported 2026-05-11
176 Hy3-preview (Non-reasoning) 73.2% Hy3 preview
tencent-hy3-preview
Imported 2026-05-11
177 Llama 3.1 Nemotron Ultra 253B v1 (Reasoning) 72.8% Imported 2026-05-11
178 Claude 4.5 Sonnet (Non-reasoning) 72.7% Imported 2026-05-11
179 Grok Code Fast 1 72.7% Grok Code Fast 1
x-ai-grok-code-fast-1
Imported 2026-05-11
180 Hermes 4 - Llama-3.1 405B (Reasoning) 72.7% Imported 2026-05-11
181 Qwen3 Omni 30B A3B (Reasoning) 72.6% Imported 2026-05-11
182 Seed-OSS-36B-Instruct 72.6% Imported 2026-05-11
183 Ring-flash-2.0 72.5% Imported 2026-05-11
184 Solar Pro 3 72.4% Solar Pro 3
upstage-solar-pro-3
Imported 2026-05-11
185 Mi:dm K 2.5 Pro Preview 72.2% Imported 2026-05-11
186 Qwen3 VL 30B A3B (Reasoning) 72% Imported 2026-05-11
187 GLM-4.6V (Reasoning) 71.9% GLM 4.6V
z-ai-glm-4.6v
Imported 2026-05-11
188 Ling-1T 71.9% Imported 2026-05-11
189 DeepSeek V4 Pro (Non-reasoning) 71.7% DeepSeek V4 Pro
deepseek-deepseek-v4-pro
Imported 2026-05-11
190 DeepSeek V4 Flash (Non-reasoning) 71.6% DeepSeek V4 Flash
deepseek-deepseek-v4-flash
Imported 2026-05-11
191 Gemma 4 26B A4B (Non-reasoning) 71.4% Gemma 4 26B A4B
google-gemma-4-26b-a4b-it
Imported 2026-05-11
192 Apriel-v1.5-15B-Thinker 71.3% Imported 2026-05-11
193 K2 Think V2 71.3% Imported 2026-05-11
194 GPT-5.2 (Non-reasoning) 71.2% GPT-5.2
openai-gpt-5.2
Imported 2026-05-11
195 Qwen3 VL 235B A22B Instruct 71.2% Qwen3 VL 235B A22B Instruct
qwen-qwen3-vl-235b-a22b-instruct
Imported 2026-05-11
196 Qwen3.5 4B (Non-reasoning) 71.2% Imported 2026-05-11
197 Gemini 2.5 Flash-Lite Preview (Sep '25) (Reasoning) 70.9% Imported 2026-05-11
198 DeepSeek R1 (Jan '25) 70.8% R1
deepseek-r1
Imported 2026-05-11
199 Qwen3 30B A3B 2507 (Reasoning) 70.7% Imported 2026-05-11
200 Claude 4 Opus (Non-reasoning) 70.1% Imported 2026-05-11
201 Gemini 2.0 Flash Thinking Experimental (Jan '25) 70.1% Imported 2026-05-11
202 Mi:dm K 2.5 Pro 70.1% Imported 2026-05-11
203 Qwen3 235B A22B (Reasoning) 70% Qwen3 235B A22B
qwen-qwen3-235b-a22b
Imported 2026-05-11
204 Hermes 4 - Llama-3.1 70B (Reasoning) 69.9% Imported 2026-05-11
205 Nova 2.0 Omni (low) 69.9% Imported 2026-05-11
206 Gemini 2.5 Flash Preview (Reasoning) 69.8% Imported 2026-05-11
207 Nova 2.0 Lite (low) 69.8% Imported 2026-05-11
208 MiniMax M1 80k 69.7% Imported 2026-05-11
209 K-EXAONE (Non-reasoning) 69.5% Imported 2026-05-11
210 Motif-2-12.7B-Reasoning 69.5% Imported 2026-05-11
211 Qwen3 VL 30B A3B Instruct 69.5% Qwen3 VL 30B A3B Instruct
qwen-qwen3-vl-30b-a3b-instruct
Imported 2026-05-11
212 Grok 3 69.3% Grok 3
xaigrok-3
Imported 2026-05-11
213 Step3 VL 10B 69% Imported 2026-05-11
214 gpt-oss-20B (high) 68.8% gpt-oss-20b
openai-gpt-oss-20b
Imported 2026-05-11
215 GPT-5 mini (minimal) 68.7% GPT-5 Mini
openai-gpt-5-mini
Imported 2026-05-11
216 Solar Pro 2 (Reasoning) 68.7% Imported 2026-05-11
217 GPT-5 (ChatGPT) 68.6% GPT-5
openai-gpt-5
Imported 2026-05-11
218 GLM-4.5V (Reasoning) 68.4% GLM 4.5V
z-ai-glm-4.5v
Imported 2026-05-11
219 Claude 4 Sonnet (Non-reasoning) 68.3% Imported 2026-05-11
220 Gemini 2.5 Flash (Non-reasoning) 68.3% Gemini 2.5 Flash
google-gemini-2.5-flash
Imported 2026-05-11
221 MiniMax M1 40k 68.2% Imported 2026-05-11
222 K2-V2 (high) 68.1% Imported 2026-05-11
223 Mistral Large 3 68% Imported 2026-05-11
224 Magistral Medium 1 67.9% Imported 2026-05-11
225 GPT-5 nano (high) 67.6% GPT-5 Nano
openai-gpt-5-nano
Imported 2026-05-11
226 JT-MINI 67.6% Imported 2026-05-11
227 GPT-5 (minimal) 67.3% GPT-5
openai-gpt-5
Imported 2026-05-11
228 Claude 4.5 Haiku (Reasoning) 67.2% Imported 2026-05-11
229 gpt-oss-120B (low) 67.2% gpt-oss-120b
openai-gpt-oss-120b
Imported 2026-05-11
230 Llama 4 Maverick 67.1% Llama 4 Maverick
meta-llama-4-maverick
Imported 2026-05-11
231 Qwen3 VL 32B Instruct 67.1% Qwen3 VL 32B Instruct
qwen-qwen3-vl-32b-instruct
Imported 2026-05-11
232 GPT-5 nano (medium) 67% GPT-5 Nano
openai-gpt-5-nano
Imported 2026-05-11
233 Qwen3 32B (Reasoning) 66.8% Qwen3 32B
qwen-qwen3-32b
Imported 2026-05-11
234 Qwen3 4B 2507 (Reasoning) 66.7% Imported 2026-05-11
235 GLM-5 (Non-reasoning) 66.6% GLM 5
z-ai-glm-5
Imported 2026-05-11
236 GPT-4.1 66.6% GPT-4.1
openai-gpt-4.1
Imported 2026-05-11
237 GLM-4.7 (Non-reasoning) 66.4% GLM 4.7
z-ai-glm-4.7
Imported 2026-05-11
238 GPT-4.1 mini 66.4% GPT-4.1 Mini
openai-gpt-4.1-mini
Imported 2026-05-11
239 Magistral Small 1.2 66.3% Imported 2026-05-11
240 Falcon-H1R-7B 66.1% Imported 2026-05-11
241 Qwen3 30B A3B 2507 Instruct 65.9% Imported 2026-05-11
242 Grok 4.3 (Non-reasoning) 65.8% Grok 4.3
x-ai-grok-4.3
Imported 2026-05-11
243 Ling-flash-2.0 65.7% Imported 2026-05-11
244 Solar Open 100B (Reasoning) 65.7% Imported 2026-05-11
245 Claude 3.7 Sonnet (Non-reasoning) 65.6% Claude 3.7 Sonnet
anthropic-claude-3.7-sonnet
Imported 2026-05-11
246 MiMo-V2-Flash (Non-reasoning) 65.6% MiMo-V2-Flash
xiaomi-mimo-v2-flash
Imported 2026-05-11
247 DeepSeek V3 0324 65.5% DeepSeek V3 0324
deepseek-deepseek-chat-v3-0324
Imported 2026-05-11
248 GPT-4o (March 2025, chatgpt-4o-latest) 65.5% GPT-4o
openai-gpt-4o
Imported 2026-05-11
249 Gemini 2.5 Flash-Lite Preview (Sep '25) (Non-reasoning) 65.1% Gemini 2.5 Flash Lite Preview 09-2025
google-gemini-2.5-flash-lite-preview-09-2025
Imported 2026-05-11
250 Claude 4.5 Haiku (Non-reasoning) 64.6% Imported 2026-05-11
251 GPT-5.1 (Non-reasoning) 64.3% GPT-5.1
openai-gpt-5.1
Imported 2026-05-11
252 Llama 3.3 Nemotron Super 49B v1 (Reasoning) 64.3% Imported 2026-05-11
253 Magistral Small 1 64.1% Imported 2026-05-11
254 Grok 4.1 Fast (Non-reasoning) 63.7% Grok 4.1 Fast
x-ai-grok-4.1-fast
Imported 2026-05-11
255 Gemini 2.0 Flash (experimental) 63.6% Gemini 2.0 Flash
google-gemini-2.0-flash
Imported 2026-05-11
256 LongCat Flash Lite 63.6% Imported 2026-05-11
257 Nova 2.0 Pro Preview (Non-reasoning) 63.6% Imported 2026-05-11
258 Sarvam 30B (high) 63.3% Imported 2026-05-11
259 GLM-4.6 (Non-reasoning) 63.2% GLM 4.6
z-ai-glm-4.6
Imported 2026-05-11
260 EXAONE 4.0 32B (Non-reasoning) 62.8% Imported 2026-05-11
261 Gemini 2.5 Flash-Lite (Reasoning) 62.5% Gemini 2.5 Flash Lite
google-gemini-2.5-flash-lite
Imported 2026-05-11
262 Gemini 2.0 Flash (Feb '25) 62.3% Gemini 2.0 Flash
google-gemini-2.0-flash
Imported 2026-05-11
263 Sonar Reasoning 62.3% Imported 2026-05-11
264 Gemini 2.0 Pro Experimental (Feb '25) 62.2% Imported 2026-05-11
265 Qwen3 Omni 30B A3B Instruct 62% Imported 2026-05-11
266 Qwen3 Coder 480B A35B Instruct 61.8% Qwen3 Coder 480B A35B
qwen-qwen3-coder
Imported 2026-05-11
267 Qwen3 30B A3B (Reasoning) 61.6% Qwen3 30B A3B
qwen-qwen3-30b-a3b
Imported 2026-05-11
268 DeepSeek R1 Distill Qwen 32B 61.5% R1 Distill Qwen 32B
deepseek-deepseek-r1-distill-qwen-32b
Imported 2026-05-11
269 HyperCLOVA X SEED Think (32B) 61.5% Imported 2026-05-11
270 Qwen3 235B A22B (Non-reasoning) 61.3% Qwen3 235B A22B
qwen-qwen3-235b-a22b
Imported 2026-05-11
271 DeepSeek R1 0528 Qwen3 8B 61.2% Imported 2026-05-11
272 gpt-oss-20B (low) 61.1% gpt-oss-20b
openai-gpt-oss-20b
Imported 2026-05-11
273 Olmo 3 32B Think 61% Olmo 3 32B Think
allenai-olmo-3-32b-think
Imported 2026-05-11
274 GPT-5.4 mini (Non-Reasoning) 60.6% GPT-5.4 Mini
openai-gpt-5.4-mini
Imported 2026-05-11
275 Grok 4 Fast (Non-reasoning) 60.6% Grok 4 Fast
x-ai-grok-4-fast
Imported 2026-05-11
276 Qwen3 14B (Reasoning) 60.4% Qwen3 14B
qwen-qwen3-14b
Imported 2026-05-11
277 Nova 2.0 Lite (Non-reasoning) 60.3% Imported 2026-05-11
278 o1-mini 60.3% Imported 2026-05-11
279 Tri-21B-Think 60.1% Imported 2026-05-11
280 Claude 3.5 Sonnet (Oct '24) 59.9% Claude 3.5 Sonnet
anthropic-claude-3.5-sonnet
Imported 2026-05-11
281 K2-V2 (medium) 59.8% Imported 2026-05-11
282 Devstral 2 59.4% Imported 2026-05-11
283 Gemini 2.5 Flash Preview (Non-reasoning) 59.4% Imported 2026-05-11
284 Ling 2.6 Flash 59.3% Ling-2.6-flash
inclusionai-ling-2.6-flash
Imported 2026-05-11
285 QwQ 32B 59.3% Imported 2026-05-11
286 Olmo 3.1 32B Think 59.1% Imported 2026-05-11
287 Gemini 1.5 Pro (Sep '24) 58.9% Imported 2026-05-11
288 Qwen3 8B (Reasoning) 58.9% Qwen3 8B
qwen-qwen3-8b
Imported 2026-05-11
289 Mistral Medium 3.1 58.8% Mistral: Mistral Medium 3.1
mistralai-mistral-medium-3.1
Imported 2026-05-11
290 Llama 4 Scout 58.7% Llama 4 Scout
meta-llama-llama-4-scout
Imported 2026-05-11
291 Qwen2.5 Max 58.7% Imported 2026-05-11
292 GLM-4.7-Flash (Reasoning) 58.1% GLM 4.7 Flash
z-ai-glm-4.7-flash
Imported 2026-05-11
293 Qwen3 VL 8B (Reasoning) 57.9% Imported 2026-05-11
294 Mistral Medium 3 57.8% Mistral: Mistral Medium 3
mistralai-mistral-medium-3
Imported 2026-05-11
295 Solar Pro 2 (Preview) (Reasoning) 57.8% Imported 2026-05-11
296 Sonar Pro 57.8% Sonar Pro
perplexity-sonar-pro
Imported 2026-05-11
297 Gemma 4 E4B (Reasoning) 57.6% Imported 2026-05-11
298 Phi-4 57.5% Phi 4
microsoft-phi-4
Imported 2026-05-11
299 GLM-4.5V (Non-reasoning) 57.3% GLM 4.5V
z-ai-glm-4.5v
Imported 2026-05-11
300 Ministral 3 14B 57.2% Imported 2026-05-11
301 NVIDIA Nemotron Nano 12B v2 VL (Reasoning) 57.2% Nemotron Nano 12B 2 VL
nvidia-nemotron-nano-12b-v2-vl
Imported 2026-05-11
302 Mistral Small 4 (Non-reasoning) 57.1% Mistral: Mistral Small 4
mistralai-mistral-small-2603
Imported 2026-05-11
303 NVIDIA Nemotron Nano 9B V2 (Reasoning) 57% Nemotron Nano 9B V2
nvidia-nemotron-nano-9b-v2
Imported 2026-05-11
304 Nova Premier 56.9% Imported 2026-05-11
305 GLM-4.6V (Non-reasoning) 56.6% GLM 4.6V
z-ai-glm-4.6v
Imported 2026-05-11
306 Ling-mini-2.0 56.2% Imported 2026-05-11
307 Solar Pro 2 (Non-reasoning) 56.1% Imported 2026-05-11
308 Claude 3.5 Sonnet (June '24) 56% Claude 3.5 Sonnet
anthropic-claude-3.5-sonnet
Imported 2026-05-11
309 GPT-5.4 nano (Non-Reasoning) 55.8% GPT-5.4 Nano
openai-gpt-5.4-nano
Imported 2026-05-11
310 DeepSeek V3 (Dec '24) 55.7% DeepSeek V3
deepseek-deepseek-chat
Imported 2026-05-11
311 NVIDIA Nemotron Nano 9B V2 (Non-reasoning) 55.7% Nemotron Nano 9B V2
nvidia-nemotron-nano-9b-v2
Imported 2026-05-11
312 QwQ 32B-Preview 55.7% Imported 2026-05-11
313 Nova 2.0 Omni (Non-reasoning) 55.5% Imported 2026-05-11
314 Gemma 4 E4B (Non-reasoning) 54.9% Imported 2026-05-11
315 Solar Pro 2 (Preview) (Non-reasoning) 54.4% Imported 2026-05-11
316 GPT-4o (Nov '24) 54.3% GPT-4o
openai-gpt-4o
Imported 2026-05-11
317 Gemini 2.0 Flash-Lite (Preview) 54.2% Gemini 2.0 Flash Lite
google-gemini-2.0-flash-lite-001
Imported 2026-05-11
318 K2-V2 (low) 54.1% Imported 2026-05-11
319 Olmo 3.1 32B Instruct 53.9% Olmo 3.1 32B Instruct
allenai-olmo-3.1-32b-instruct
Imported 2026-05-11
320 Tri-21B-think Preview 53.8% Imported 2026-05-11
321 Hermes 4 - Llama-3.1 405B (Non-reasoning) 53.6% Imported 2026-05-11
322 Gemini 2.0 Flash-Lite (Feb '25) 53.5% Gemini 2.0 Flash Lite
google-gemini-2.0-flash-lite-001
Imported 2026-05-11
323 Qwen3 32B (Non-reasoning) 53.5% Qwen3 32B
qwen-qwen3-32b
Imported 2026-05-11
324 Devstral Small 2 53.2% Imported 2026-05-11
325 Reka Flash 3 52.9% Reka Flash 3
rekaai-reka-flash-3
Imported 2026-05-11
326 Command A 52.7% Command A
cohere-command-a
Imported 2026-05-11
327 GPT-4o (May '24) 52.6% GPT-4o (2024-05-13)
openai-gpt-4o-2024-05-13
Imported 2026-05-11
328 Qwen3 4B (Reasoning) 52.2% Imported 2026-05-11
329 GPT-4o (Aug '24) 52.1% GPT-4o (2024-08-06)
openai-gpt-4o-2024-08-06
Imported 2026-05-11
330 Llama 3.3 Nemotron Super 49B v1 (Non-reasoning) 51.7% Imported 2026-05-11
331 Qwen3 4B 2507 Instruct 51.7% Imported 2026-05-11
332 Llama 3.1 Tulu3 405B 51.6% Imported 2026-05-11
333 Olmo 3 7B Think 51.6% Imported 2026-05-11
334 Qwen3 Coder 30B A3B Instruct 51.6% Qwen3 Coder 30B A3B Instruct
qwen-qwen3-coder-30b-a3b-instruct
Imported 2026-05-11
335 Exaone 4.0 1.2B (Reasoning) 51.5% Imported 2026-05-11
336 Llama 3.1 Instruct 405B 51.5% Imported 2026-05-11
337 Qwen3 30B A3B (Non-reasoning) 51.5% Qwen3 30B A3B
qwen-qwen3-30b-a3b
Imported 2026-05-11
338 NVIDIA Nemotron 3 Nano 4B 51.3% Imported 2026-05-11
339 GPT-4.1 nano 51.2% GPT-4.1 Nano
openai-gpt-4.1-nano
Imported 2026-05-11
340 GPT-4o (ChatGPT) 51.1% GPT-4o
openai-gpt-4o
Imported 2026-05-11
341 Grok 2 (Dec '24) 51% Imported 2026-05-11
342 Mistral Small 3.2 50.5% Imported 2026-05-11
343 Pixtral Large 50.5% Mistral: Pixtral Large 2411
mistralai-pixtral-large-2411
Imported 2026-05-11
344 Nova Pro 49.9% Nova Pro 1.0
amazon-nova-pro-v1
Imported 2026-05-11
345 Llama 3.3 Instruct 70B 49.8% Imported 2026-05-11
346 Qwen3 VL 4B (Reasoning) 49.4% Imported 2026-05-11
347 Devstral Medium 49.2% Mistral: Devstral Medium
mistralai-devstral-medium
Imported 2026-05-11
348 Hermes 4 - Llama-3.1 70B (Non-reasoning) 49.1% Imported 2026-05-11
349 Qwen2.5 Instruct 72B 49.1% Qwen2.5 72B Instruct
qwen-qwen-2.5-72b-instruct
Imported 2026-05-11
350 Claude 3 Opus 48.9% Imported 2026-05-11
351 Mistral Large 2 (Nov '24) 48.6% Imported 2026-05-11
352 DeepSeek R1 Distill Qwen 14B 48.4% Imported 2026-05-11
353 Granite 4.1 30B 48.1% Imported 2026-05-11
354 Llama Nemotron Super 49B v1.5 (Non-reasoning) 48.1% Imported 2026-05-11
355 Gemini 2.5 Flash-Lite (Non-reasoning) 47.4% Gemini 2.5 Flash Lite
google-gemini-2.5-flash-lite
Imported 2026-05-11
356 LFM2 24B A2B 47.4% LFM2-24B-A2B
liquid-lfm-2-24b-a2b
Imported 2026-05-11
357 Mistral Large 2 (Jul '24) 47.2% Mistral Large 2407
mistralai-mistral-large-2407
Imported 2026-05-11
358 Grok Beta 47.1% Imported 2026-05-11
359 Ministral 3 8B 47.1% Imported 2026-05-11
360 Sonar 47.1% Sonar
perplexity-sonar
Imported 2026-05-11
361 Qwen3 14B (Non-reasoning) 47% Qwen3 14B
qwen-qwen3-14b
Imported 2026-05-11
362 Nemotron 3 Nano Omni 30B A3B Reasoning 46.9% Imported 2026-05-11
363 Qwen2.5 Instruct 32B 46.6% Imported 2026-05-11
364 Llama 3.1 Nemotron Instruct 70B 46.5% Imported 2026-05-11
365 Gemini 1.5 Flash (Sep '24) 46.3% Imported 2026-05-11
366 Mistral Small 3 46.2% Imported 2026-05-11
367 Qwen3.5 2B (Reasoning) 45.6% Imported 2026-05-11
368 Mistral Small 3.1 45.4% Imported 2026-05-11
369 GLM-4.7-Flash (Non-reasoning) 45.2% GLM 4.7 Flash
z-ai-glm-4.7-flash
Imported 2026-05-11
370 Qwen3 8B (Non-reasoning) 45.2% Qwen3 8B
qwen-qwen3-8b
Imported 2026-05-11
371 NVIDIA Nemotron Nano 12B v2 VL (Non-reasoning) 43.9% Nemotron Nano 12B 2 VL
nvidia-nemotron-nano-12b-v2-vl
Imported 2026-05-11
372 Qwen3.5 2B (Non-reasoning) 43.8% Imported 2026-05-11
373 Devstral Small (May '25) 43.4% Mistral: Devstral Small 1.1
mistralai-devstral-small
Imported 2026-05-11
374 Gemma 4 E2B (Reasoning) 43.3% Imported 2026-05-11
375 Granite 4.1 8B 43.3% Granite 4.1 8B
ibm-granite-granite-4.1-8b
Imported 2026-05-11
376 Nova Lite 43.3% Nova Lite 1.0
amazon-nova-lite-v1
Imported 2026-05-11
377 Llama 3.2 Instruct 90B (Vision) 43.2% Imported 2026-05-11
378 Gemma 3 27B Instruct 42.8% Gemma 3 27B
google-gemma-3-27b-it
Imported 2026-05-11
379 GPT-5 nano (minimal) 42.8% GPT-5 Nano
openai-gpt-5-nano
Imported 2026-05-11
380 Jamba 1.5 Large 42.7% Imported 2026-05-11
381 Qwen3 VL 8B Instruct 42.7% Qwen3 VL 8B Instruct
qwen-qwen3-vl-8b-instruct
Imported 2026-05-11
382 GPT-4o mini 42.6% GPT-4o-mini
openai-gpt-4o-mini
Imported 2026-05-11
383 Molmo2-8B 42.5% Imported 2026-05-11
384 Exaone 4.0 1.2B (Non-reasoning) 42.4% Imported 2026-05-11
385 Mistral Saba 42.4% Mistral: Saba
mistralai-mistral-saba
Imported 2026-05-11
386 Qwen2.5 Coder Instruct 32B 41.7% Qwen2.5 Coder 32B Instruct
qwen-qwen-2.5-coder-32b-instruct
Imported 2026-05-11
387 Granite 4.0 H Small 41.6% Imported 2026-05-11
388 Sarvam M (Reasoning) 41.6% Imported 2026-05-11
389 Devstral Small (Jul '25) 41.4% Mistral: Devstral Small 1.1
mistralai-devstral-small
Imported 2026-05-11
390 Kimi Linear 48B A3B Instruct 41.2% Imported 2026-05-11
391 Qwen2.5 Turbo 41% Qwen-Turbo
qwen-qwen-turbo
Imported 2026-05-11
392 Llama 3.1 Instruct 70B 40.9% Imported 2026-05-11
393 Claude 3.5 Haiku 40.8% Claude 3.5 Haiku
anthropic-claude-3.5-haiku
Imported 2026-05-11
394 Llama 3.1 Nemotron Nano 4B v1.1 (Reasoning) 40.8% Imported 2026-05-11
395 Gemma 4 E2B (Non-reasoning) 40.5% Imported 2026-05-11
396 DeepSeek R1 Distill Llama 70B 40.2% R1 Distill Llama 70B
deepseek-deepseek-r1-distill-llama-70b
Imported 2026-05-11
397 Hermes 3 - Llama-3.1 70B 40.1% Hermes 3 70B Instruct
nousresearch-hermes-3-llama-3.1-70b
Imported 2026-05-11
398 Claude 3 Sonnet 40% Imported 2026-05-11
399 Olmo 3 7B Instruct 40% Imported 2026-05-11
400 NVIDIA Nemotron 3 Nano 30B A3B (Non-reasoning) 39.9% Nemotron 3 Nano 30B A3B
nvidia-nemotron-3-nano-30b-a3b
Imported 2026-05-11
401 Qwen3 4B (Non-reasoning) 39.8% Imported 2026-05-11
402 Jamba 1.7 Large 39% Imported 2026-05-11
403 Jamba 1.6 Large 38.7% Imported 2026-05-11
404 DeepHermes 3 - Mistral 24B Preview (Non-reasoning) 38.2% Imported 2026-05-11
405 Mistral Small (Sep '24) 38.1% Imported 2026-05-11
406 Llama 3 Instruct 70B 37.9% Imported 2026-05-11
407 Claude 3 Haiku 37.4% Claude 3 Haiku
anthropic-claude-3-haiku
Imported 2026-05-11
408 Gemini 1.5 Pro (May '24) 37.1% Imported 2026-05-11
409 Qwen2 Instruct 72B 37.1% Imported 2026-05-11
410 Qwen3 VL 4B Instruct 37.1% Imported 2026-05-11
411 Gemini 1.5 Flash-8B 35.9% Imported 2026-05-11
412 Ministral 3 3B 35.8% Imported 2026-05-11
413 Nova Micro 35.8% Nova Micro 1.0
amazon-nova-micro-v1
Imported 2026-05-11
414 Qwen3 1.7B (Reasoning) 35.6% Imported 2026-05-11
415 Mistral Large (Feb '24) 35.1% Mistral Large
mistralai-mistral-large
Imported 2026-05-11
416 Gemma 3 12B Instruct 34.9% Gemma 3 12B
google-gemma-3-12b-it
Imported 2026-05-11
417 Mistral Medium 34.9% Imported 2026-05-11
418 Claude 2.0 34.4% Imported 2026-05-11
419 LFM2 8B A1B 34.4% Imported 2026-05-11
420 LFM2.5-1.2B-Thinking 33.9% LFM2.5-1.2B-Thinking
liquid-lfm-2.5-1.2b-thinking
Imported 2026-05-11
421 Qwen2.5 Coder Instruct 7B 33.9% Imported 2026-05-11
422 Granite 3.3 8B (Non-reasoning) 33.8% Imported 2026-05-11
423 Granite 4.0 Micro 33.6% Granite 4.0 Micro
ibm-granite-granite-4.0-h-micro
Imported 2026-05-11
424 Jamba Reasoning 3B 33.3% Imported 2026-05-11
425 Mixtral 8x22B Instruct 33.2% Mistral: Mixtral 8x22B Instruct
mistralai-mixtral-8x22b-instruct
Imported 2026-05-11
426 DBRX Instruct 33.1% Imported 2026-05-11
427 Phi-4 Mini Instruct 33.1% Imported 2026-05-11
428 Claude Instant 33% Imported 2026-05-11
429 OLMo 2 32B 32.8% Imported 2026-05-11
430 LFM 40B 32.7% Imported 2026-05-11
431 Llama 2 Chat 70B 32.7% Imported 2026-05-11
432 LFM2.5-1.2B-Instruct 32.6% LFM2.5-1.2B-Instruct
liquid-lfm-2.5-1.2b-instruct
Imported 2026-05-11
433 Gemini 1.5 Flash (May '24) 32.4% Imported 2026-05-11
434 Command-R+ (Apr '24) 32.3% Command R (08-2024)
cohere-command-r-08-2024
Imported 2026-05-11
435 Jamba 1.7 Mini 32.2% Imported 2026-05-11
436 Llama 2 Chat 13B 32.1% Imported 2026-05-11
437 Claude 2.1 31.9% Imported 2026-05-11
438 DeepSeek Coder V2 Lite Instruct 31.9% Imported 2026-05-11
439 Phi-3 Mini Instruct 3.8B 31.9% Imported 2026-05-11
440 Phi-4 Multimodal Instruct 31.5% Imported 2026-05-11
441 Granite 4.1 3B 31.4% Imported 2026-05-11
442 LFM2 2.6B 30.6% Imported 2026-05-11
443 MiniCPM-V 4.6 1.3B 30.5% Imported 2026-05-11
444 Tiny Aya Global 30.5% Imported 2026-05-11
445 DeepSeek R1 Distill Llama 8B 30.2% Imported 2026-05-11
446 Jamba 1.5 Mini 30.2% Imported 2026-05-11
447 Mistral Small (Feb '24) 30.2% Imported 2026-05-11
448 Jamba 1.6 Mini 30% Imported 2026-05-11
449 GPT-3.5 Turbo 29.7% GPT-3.5 Turbo
openai-gpt-3.5-turbo
Imported 2026-05-11
450 Gemma 3n E4B Instruct 29.6% Imported 2026-05-11
451 Llama 3 Instruct 8B 29.6% Imported 2026-05-11
452 Mixtral 8x7B Instruct 29.2% Mistral: Mixtral 8x7B Instruct
mistralai-mixtral-8x7b-instruct
Imported 2026-05-11
453 Gemma 3 4B Instruct 29.1% Gemma 3 4B
google-gemma-3-4b-it
Imported 2026-05-11
454 LFM2.5-VL-1.6B 28.9% Imported 2026-05-11
455 Qwen1.5 Chat 110B 28.9% Imported 2026-05-11
456 OLMo 2 7B 28.8% Imported 2026-05-11
457 Command-R (Mar '24) 28.4% Command R (08-2024)
cohere-command-r-08-2024
Imported 2026-05-11
458 Qwen3 1.7B (Non-reasoning) 28.3% Imported 2026-05-11
459 Granite 4.0 1B 28.1% Imported 2026-05-11
460 Gemma 3n E4B Instruct Preview (May '25) 27.8% Imported 2026-05-11
461 Gemini 1.0 Pro 27.7% Imported 2026-05-11
462 Apertus 70B Instruct 27.2% Imported 2026-05-11
463 DeepHermes 3 - Llama-3.1 8B Preview (Non-reasoning) 27% Imported 2026-05-11
464 Granite 4.0 H 1B 26.3% Imported 2026-05-11
465 Granite 4.0 350M 26.1% Imported 2026-05-11
466 Llama 3.1 Instruct 8B 25.9% Imported 2026-05-11
467 Granite 4.0 H 350M 25.7% Imported 2026-05-11
468 Apertus 8B Instruct 25.6% Imported 2026-05-11
469 Llama 3.2 Instruct 3B 25.5% Imported 2026-05-11
470 Molmo 7B-D 24% Imported 2026-05-11
471 Qwen3 0.6B (Reasoning) 23.9% Imported 2026-05-11
472 Gemma 3 1B Instruct 23.7% Imported 2026-05-11
473 Qwen3.5 0.8B (Non-reasoning) 23.6% Imported 2026-05-11
474 Qwen3 0.6B (Non-reasoning) 23.1% Imported 2026-05-11
475 OpenChat 3.5 (1210) 23% Imported 2026-05-11
476 Gemma 3n E2B Instruct 22.9% Imported 2026-05-11
477 LFM2 1.2B 22.8% Imported 2026-05-11
478 Llama 2 Chat 7B 22.7% Imported 2026-05-11
479 Gemma 3 270M 22.4% Imported 2026-05-11
480 Llama 3.2 Instruct 11B (Vision) 22.1% Imported 2026-05-11
481 Llama 3.2 Instruct 1B 19.6% Imported 2026-05-11
482 Mistral 7B Instruct 17.7% Imported 2026-05-11
483 Qwen3.5 0.8B (Reasoning) 11.1% Imported 2026-05-11
484 DeepSeek R1 Distill Qwen 1.5B 9.8% Imported 2026-05-11
1 GPT-5.4 Pro 94.4% GPT-5.4 Pro
openai-gpt-5.4-pro
Launch post 2026-04-23
2 Gemini 3.1 Pro Preview 94.3% Gemini 3.1 Pro Preview
google-gemini-3.1-pro-preview
Launch post 2026-04-23
3 Claude Opus 4.7 94.2% Claude Opus 4.7
anthropic-claude-opus-4.7
Launch post 2026-04-23
4 GPT-5.5 93.6% GPT-5.5
openai-gpt-5.5
Launch post 2026-04-23
5 GPT-5.4 92.8% GPT-5.4
openai-gpt-5.4
Launch post 2026-04-23
1 Claude Mythos Preview 94.6% Claude Mythos Preview
anthropic-claude-mythos-preview
Launch post 2026-04-16
2 GPT-5.4 Pro 94.4% GPT-5.4 Pro
openai-gpt-5.4-pro
Launch post 2026-04-16
3 Gemini 3.1 Pro Preview 94.3% Gemini 3.1 Pro Preview
google-gemini-3.1-pro-preview
Launch post 2026-04-16
4 Claude Opus 4.7 94.2% Claude Opus 4.7
anthropic-claude-opus-4.7
Launch post 2026-04-16
5 Claude Opus 4.6 91.3% Claude Opus 4.6
anthropic-claude-opus-4.6
Launch post 2026-04-16