GPQA Diamond
The hardest subset of GPQA: 198 graduate-level multiple-choice science questions in biology, chemistry, and physics.
503 rows
Primary metric: score
Sampled: 2026-05-28
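Metric: Accuracy. Showing the 5 latest source slices; ranks restart at 1 within each slice. Subject is the model as named by the source, and Model Match gives the matched model's name and ID (a dash means no match was found).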
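Within each slice, rows appear sorted by accuracy, highest first, and tied scores still receive distinct consecutive ranks (Grok 4.20 0309 v2 (Reasoning) and Kimi K2.6 both score 91.1% and sit at ranks 8 and 9 of the imported slice). The Python sketch below reproduces that ordering from a handful of rows copied from the tables. It is an illustration under assumptions: the slice labels are hypothetical, since the page does not name its sources, and breaking ties by input order is inferred from the table rather than documented.

```python
from itertools import groupby

# A few rows copied from the tables: (slice, subject, accuracy %).
# Slice labels are hypothetical; the page does not name its sources.
rows = [
    ("imported-2026-05-11", "Gemini 3.1 Pro Preview", 94.1),
    ("imported-2026-05-11", "Grok 4.20 0309 v2 (Reasoning)", 91.1),
    ("imported-2026-05-11", "Kimi K2.6", 91.1),
    ("imported-2026-05-11", "GPT-5.5 (xhigh)", 93.5),
    ("self-reported-a", "Claude Opus 4.7", 94.2),
    ("self-reported-a", "Gemini 3.1 Pro Preview", 94.3),
]


def rank_within_slices(rows):
    """Assign 1-based ranks per slice, highest accuracy first.

    Ties get distinct consecutive ranks (ordinal ranking) in input order,
    because Python's sort is stable even with reverse=True.
    """
    ranked = []
    # groupby only merges adjacent items, so group rows by slice first.
    by_slice = sorted(rows, key=lambda r: r[0])
    for slice_name, group in groupby(by_slice, key=lambda r: r[0]):
        ordered = sorted(group, key=lambda r: r[2], reverse=True)
        for rank, (_, subject, acc) in enumerate(ordered, start=1):
            ranked.append((slice_name, rank, subject, acc))
    return ranked


for row in rank_within_slices(rows):
    print(row)
```

Ordinal ranking matches how the table numbers tied rows; a dense or competition ranking (both 91.1% rows at rank 8) would be an equally reasonable display choice.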
| Rank | Subject | Accuracy | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | Gemini 3.1 Pro Preview | 94.3% | Gemini 3.1 Pro Preview google-gemini-3.1-pro-preview | Self-reported | 2026-05-28 |
| 2 | Claude Opus 4.7 | 94.2% | Claude Opus 4.7 anthropic-claude-opus-4.7 | Self-reported | 2026-05-28 |
| 3 | Claude Opus 4.8 | 93.6% | Claude Opus 4.8 anthropic-claude-opus-4.8 | Self-reported | 2026-05-28 |

| Rank | Subject | Accuracy | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | Qwen3.7 Max | 92.4% | Qwen3.7 Max qwen-qwen3.7-max | Self-reported | 2026-05-28 |
| 2 | Claude Opus 4.6 Max | 91.3% | Claude Opus 4.6 anthropic-claude-opus-4.6 | Self-reported | 2026-05-28 |
| 3 | Kimi K2.6 Thinking | 90.5% | MoonshotAI: Kimi K2.6 moonshotai-kimi-k2.6 | Self-reported | 2026-05-28 |
| 4 | Qwen3.6 Plus | 90.4% | Qwen3.6 Plus qwen-qwen3.6-plus | Self-reported | 2026-05-28 |
| 5 | DeepSeek V4 Pro Max | 90.1% | DeepSeek V4 Pro deepseek-deepseek-v4-pro | Self-reported | 2026-05-28 |
| 6 | GLM-5.1 Thinking | 86.2% | GLM 5.1 z-ai-glm-5.1 | Self-reported | 2026-05-28 |

| Rank | Subject | Accuracy | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | Gemini 3.1 Pro Preview | 94.1% | Gemini 3.1 Pro Preview google-gemini-3.1-pro-preview | Imported | 2026-05-11 |
| 2 | GPT-5.5 (xhigh) | 93.5% | GPT-5.5 openai-gpt-5.5 | Imported | 2026-05-11 |
| 3 | GPT-5.5 (high) | 93.2% | GPT-5.5 openai-gpt-5.5 | Imported | 2026-05-11 |
| 4 | GPT-5.5 (medium) | 92.6% | GPT-5.5 openai-gpt-5.5 | Imported | 2026-05-11 |
| 5 | GPT-5.4 (xhigh) | 92.0% | GPT-5.4 openai-gpt-5.4 | Imported | 2026-05-11 |
| 6 | GPT-5.3 Codex (xhigh) | 91.5% | GPT-5.3-Codex openai-gpt-5.3-codex | Imported | 2026-05-11 |
| 7 | Claude Opus 4.7 (Adaptive Reasoning, Max Effort) | 91.4% | Claude Opus 4.7 anthropic-claude-opus-4.7 | Imported | 2026-05-11 |
| 8 | Grok 4.20 0309 v2 (Reasoning) | 91.1% | Grok 4.20 x-ai-grok-4.20 | Imported | 2026-05-11 |
| 9 | Kimi K2.6 | 91.1% | MoonshotAI: Kimi K2.6 moonshotai-kimi-k2.6 | Imported | 2026-05-11 |
| 10 | GPT-5.5 (low) | 91.0% | GPT-5.5 openai-gpt-5.5 | Imported | 2026-05-11 |
| 11 | Gemini 3 Pro Preview (high) | 90.8% | Gemini 3 google-gemini-3 | Imported | 2026-05-11 |
| 12 | DeepSeek V4 Pro (Reasoning, High Effort) | 90.5% | DeepSeek V4 Pro deepseek-deepseek-v4-pro | Imported | 2026-05-11 |
| 13 | GPT-5.2 (xhigh) | 90.3% | GPT-5.2 openai-gpt-5.2 | Imported | 2026-05-11 |
| 14 | Grok 4.3 | 90.1% | Grok 4.3 x-ai-grok-4.3 | Imported | 2026-05-11 |
| 15 | GPT-5.2 Codex (xhigh) | 89.9% | GPT-5.2-Codex openai-gpt-5.2-codex | Imported | 2026-05-11 |
| 16 | Gemini 3 Flash Preview (Reasoning) | 89.8% | Gemini 3 Flash Preview google-gemini-3-flash-preview | Imported | 2026-05-11 |
| 17 | Claude Opus 4.6 (Adaptive Reasoning, Max Effort) | 89.6% | Claude Opus 4.6 anthropic-claude-opus-4.6 | Imported | 2026-05-11 |
| 18 | DeepSeek V4 Flash (Reasoning, Max Effort) | 89.4% | DeepSeek V4 Flash deepseek-deepseek-v4-flash | Imported | 2026-05-11 |
| 19 | Qwen3.5 397B A17B (Reasoning) | 89.3% | Qwen3.5 397B A17B qwen-qwen3.5-397b-a17b | Imported | 2026-05-11 |
| 20 | DeepSeek V4 Pro (Reasoning, Max Effort) | 88.8% | DeepSeek V4 Pro deepseek-deepseek-v4-pro | Imported | 2026-05-11 |
| 21 | Qwen3.6 Max Preview | 88.8% | Qwen3.6 Max Preview qwen-qwen3.6-max-preview | Imported | 2026-05-11 |
| 22 | Gemini 3 Pro Preview (low) | 88.7% | Gemini 3 google-gemini-3 | Imported | 2026-05-11 |
| 23 | Claude Opus 4.7 (Non-reasoning, High Effort) | 88.5% | Claude Opus 4.7 anthropic-claude-opus-4.7 | Imported | 2026-05-11 |
| 24 | Grok 4.20 0309 (Reasoning) | 88.5% | Grok 4.20 x-ai-grok-4.20 | Imported | 2026-05-11 |
| 25 | Muse Spark | 88.4% | — | Imported | 2026-05-11 |
| 26 | Qwen3.6 Plus | 88.2% | Qwen3.6 Plus qwen-qwen3.6-plus | Imported | 2026-05-11 |
| 27 | Kimi K2.5 (Reasoning) | 87.9% | MoonshotAI: Kimi K2.5 moonshotai-kimi-k2.5 | Imported | 2026-05-11 |
| 28 | Grok 4 | 87.7% | Grok 4 x-ai-grok-4 | Imported | 2026-05-11 |
| 29 | Claude Sonnet 4.6 (Adaptive Reasoning, Max Effort) | 87.5% | Claude Sonnet 4.6 anthropic-claude-sonnet-4.6 | Imported | 2026-05-11 |
| 30 | GPT-5.4 mini (xhigh) | 87.5% | GPT-5.4 Mini openai-gpt-5.4-mini | Imported | 2026-05-11 |
| 31 | MiniMax-M2.7 | 87.4% | MiniMax M2.7 minimax-minimax-m2.7 | Imported | 2026-05-11 |
| 32 | GPT-5.1 (high) | 87.3% | GPT-5.1 openai-gpt-5.1 | Imported | 2026-05-11 |
| 33 | DeepSeek V3.2 Speciale | 87.1% | DeepSeek V3.2 Speciale deepseek-deepseek-v3.2-speciale | Imported | 2026-05-11 |
| 34 | GPT-5.4 (low) | 87.1% | GPT-5.4 openai-gpt-5.4 | Imported | 2026-05-11 |
| 35 | MiMo-V2-Pro | 87.0% | MiMo-V2-Pro xiaomi-mimo-v2-pro | Imported | 2026-05-11 |
| 36 | GLM-5.1 (Reasoning) | 86.8% | GLM 5.1 z-ai-glm-5.1 | Imported | 2026-05-11 |
| 37 | DeepSeek V4 Flash (Reasoning, High Effort) | 86.7% | DeepSeek V4 Flash deepseek-deepseek-v4-flash | Imported | 2026-05-11 |
| 38 | Hy3-preview (Reasoning) | 86.7% | Hy3 preview tencent-hy3-preview | Imported | 2026-05-11 |
| 39 | Claude Opus 4.5 (Reasoning) | 86.6% | Claude Opus 4.5 anthropic-claude-opus-4.5 | Imported | 2026-05-11 |
| 40 | MiMo-V2.5-Pro | 86.6% | MiMo-V2.5-Pro xiaomi-mimo-v2.5-pro | Imported | 2026-05-11 |
| 41 | GPT-5.2 (medium) | 86.4% | GPT-5.2 openai-gpt-5.2 | Imported | 2026-05-11 |
| 42 | Qwen3 Max Thinking | 86.1% | Qwen3 Max Thinking qwen-qwen3-max-thinking | Imported | 2026-05-11 |
| 43 | Qwen3.5 397B A17B (Non-reasoning) | 86.1% | Qwen3.5 397B A17B qwen-qwen3.5-397b-a17b | Imported | 2026-05-11 |
| 44 | GPT-5.1 Codex (high) | 86.0% | GPT-5.1-Codex openai-gpt-5.1-codex | Imported | 2026-05-11 |
| 45 | GLM-4.7 (Reasoning) | 85.9% | GLM 4.7 z-ai-glm-4.7 | Imported | 2026-05-11 |
| 46 | Qwen3.5 27B (Reasoning) | 85.8% | Qwen3.5-27B qwen-qwen3.5-27b | Imported | 2026-05-11 |
| 47 | Gemma 4 31B (Reasoning) | 85.7% | Gemma 4 31B google-gemma-4-31b-it | Imported | 2026-05-11 |
| 48 | Qwen3.5 122B A10B (Reasoning) | 85.7% | Qwen3.5-122B-A10B qwen-qwen3.5-122b-a10b | Imported | 2026-05-11 |
| 49 | KAT Coder Pro V2 | 85.5% | KAT-Coder-Pro V2 kwaipilot-kat-coder-pro-v2 | Imported | 2026-05-11 |
| 50 | MiMo-V2-Omni-0327 | 85.5% | — | Imported | 2026-05-11 |
| 51 | GPT-5 (high) | 85.4% | GPT-5 openai-gpt-5 | Imported | 2026-05-11 |
| 52 | Grok 4.1 Fast (Reasoning) | 85.3% | Grok 4.1 Fast x-ai-grok-4.1-fast | Imported | 2026-05-11 |
| 53 | MiMo-V2.5 | 84.9% | MiMo-V2.5 xiaomi-mimo-v2.5 | Imported | 2026-05-11 |
| 54 | Nanbeige4.1-3B | 84.9% | — | Imported | 2026-05-11 |
| 55 | MiniMax-M2.5 | 84.8% | MiniMax M2.5 minimax-minimax-m2.5 | Imported | 2026-05-11 |
| 56 | GLM-5-Turbo | 84.7% | GLM 5 Turbo z-ai-glm-5-turbo | Imported | 2026-05-11 |
| 57 | Grok 4 Fast (Reasoning) | 84.7% | Grok 4 Fast x-ai-grok-4-fast | Imported | 2026-05-11 |
| 58 | MiMo-V2-Flash (Reasoning) | 84.6% | MiMo-V2-Flash xiaomi-mimo-v2-flash | Imported | 2026-05-11 |
| 59 | o3-pro | 84.5% | o3 Pro openai-o3-pro | Imported | 2026-05-11 |
| 60 | Qwen3.5 35B A3B (Reasoning) | 84.5% | Qwen3.5-35B-A3B qwen-qwen3.5-35b-a3b | Imported | 2026-05-11 |
| 61 | Gemini 2.5 Pro | 84.4% | Gemini 2.5 Pro google-gemini-2.5-pro | Imported | 2026-05-11 |
| 62 | GPT-5 (medium) | 84.2% | GPT-5 openai-gpt-5 | Imported | 2026-05-11 |
| 63 | Qwen3.5 27B (Non-reasoning) | 84.2% | Qwen3.5-27B qwen-qwen3.5-27b | Imported | 2026-05-11 |
| 64 | Qwen3.6 27B (Reasoning) | 84.2% | Qwen3.6 27B qwen-qwen3.6-27b | Imported | 2026-05-11 |
| 65 | Qwen3.6 35B A3B (Reasoning) | 84.1% | Qwen3.6 35B A3B qwen-qwen3.6-35b-a3b | Imported | 2026-05-11 |
| 66 | Claude Opus 4.6 (Non-reasoning, High Effort) | 84.0% | Claude Opus 4.6 anthropic-claude-opus-4.6 | Imported | 2026-05-11 |
| 67 | DeepSeek V3.2 (Reasoning) | 84.0% | DeepSeek V3.2 deepseek-deepseek-v3.2 | Imported | 2026-05-11 |
| 68 | GLM-5.1 (Non-reasoning) | 83.9% | GLM 5.1 z-ai-glm-5.1 | Imported | 2026-05-11 |
| 69 | Kimi K2 Thinking | 83.8% | MoonshotAI: Kimi K2 Thinking moonshotai-kimi-k2-thinking | Imported | 2026-05-11 |
| 70 | GPT-5 Codex (high) | 83.7% | GPT-5 Codex openai-gpt-5-codex | Imported | 2026-05-11 |
| 71 | Gemini 2.5 Pro Preview (Mar '25) | 83.6% | Gemini 2.5 Pro Preview 06-05 google-gemini-2.5-pro-preview | Imported | 2026-05-11 |
| 72 | MiMo-V2-Flash (Feb 2026) | 83.5% | MiMo-V2-Flash xiaomi-mimo-v2-flash | Imported | 2026-05-11 |
| 73 | Claude 4.5 Sonnet (Reasoning) | 83.4% | — | Imported | 2026-05-11 |
| 74 | Step 3.5 Flash | 83.1% | Step 3.5 Flash stepfun-step-3.5-flash | Imported | 2026-05-11 |
| 75 | MiniMax-M2.1 | 83.0% | MiniMax M2.1 minimax-minimax-m2.1 | Imported | 2026-05-11 |
| 76 | Qwen3.6 27B (Non-reasoning) | 82.9% | Qwen3.6 27B qwen-qwen3.6-27b | Imported | 2026-05-11 |
| 77 | GPT-5 mini (high) | 82.8% | GPT-5 Mini openai-gpt-5-mini | Imported | 2026-05-11 |
| 78 | MiMo-V2-Omni | 82.8% | MiMo-V2-Omni xiaomi-mimo-v2-omni | Imported | 2026-05-11 |
| 79 | o3 | 82.7% | o3 openai-o3 | Imported | 2026-05-11 |
| 80 | Qwen3.5 122B A10B (Non-reasoning) | 82.7% | Qwen3.5-122B-A10B qwen-qwen3.5-122b-a10b | Imported | 2026-05-11 |
| 81 | Qwen3.5 Omni Plus | 82.6% | — | Imported | 2026-05-11 |
| 82 | Step 3.5 Flash 2603 | 82.6% | Step 3.5 Flash stepfun-step-3.5-flash | Imported | 2026-05-11 |
| 83 | GPT-5.4 mini (medium) | 82.3% | GPT-5.4 Mini openai-gpt-5.4-mini | Imported | 2026-05-11 |
| 84 | Gemini 2.5 Pro Preview (May '25) | 82.2% | Gemini 2.5 Pro Preview 06-05 google-gemini-2.5-pro-preview | Imported | 2026-05-11 |
| 85 | Gemini 3.1 Flash-Lite Preview | 82.2% | Gemini 3.1 Flash Lite Preview google-gemini-3.1-flash-lite-preview | Imported | 2026-05-11 |
| 86 | GLM-5 (Reasoning) | 82.0% | GLM 5 z-ai-glm-5 | Imported | 2026-05-11 |
| 87 | Qwen3.5 35B A3B (Non-reasoning) | 81.9% | Qwen3.5-35B-A3B qwen-qwen3.5-35b-a3b | Imported | 2026-05-11 |
| 88 | GPT-5.4 nano (xhigh) | 81.7% | GPT-5.4 Nano openai-gpt-5.4-nano | Imported | 2026-05-11 |
| 89 | Qwen3.6 35B A3B (Non-reasoning) | 81.7% | Qwen3.6 35B A3B qwen-qwen3.6-35b-a3b | Imported | 2026-05-11 |
| 90 | DeepSeek R1 0528 (May '25) | 81.3% | R1 deepseek-r1 | Imported | 2026-05-11 |
| 91 | GPT-5.1 Codex mini (high) | 81.3% | GPT-5.1-Codex-Mini openai-gpt-5.1-codex-mini | Imported | 2026-05-11 |
| 92 | Gemini 3 Flash Preview (Non-reasoning) | 81.2% | Gemini 3 Flash Preview google-gemini-3-flash-preview | Imported | 2026-05-11 |
| 93 | ERNIE 4.5 300B A47B | 81.1% | ERNIE 4.5 300B A47B baidu-ernie-4.5-300b-a47b | Imported | 2026-05-11 |
| 94 | Nova 2.0 Lite (high) | 81.1% | — | Imported | 2026-05-11 |
| 95 | Claude Opus 4.5 (Non-reasoning) | 81.0% | Claude Opus 4.5 anthropic-claude-opus-4.5 | Imported | 2026-05-11 |
| 96 | Claude 4.1 Opus (Reasoning) | 80.9% | — | Imported | 2026-05-11 |
| 97 | GLM 5V Turbo (Reasoning) | 80.9% | GLM 5V Turbo z-ai-glm-5v-turbo | Imported | 2026-05-11 |
| 98 | GPT-5 (low) | 80.8% | GPT-5 openai-gpt-5 | Imported | 2026-05-11 |
| 99 | Qwen3.5 9B (Reasoning) | 80.6% | Qwen3.5-9B qwen-qwen3.5-9b | Imported | 2026-05-11 |
| 100 | GPT-5 mini (medium) | 80.3% | GPT-5 Mini openai-gpt-5-mini | Imported | 2026-05-11 |
| 101 | NVIDIA Nemotron 3 Super 120B A12B (Reasoning) | 80.0% | Nemotron 3 Super nvidia-nemotron-3-super-120b-a12b | Imported | 2026-05-11 |
| 102 | Claude Sonnet 4.6 (Non-reasoning, High Effort) | 79.9% | Claude Sonnet 4.6 anthropic-claude-sonnet-4.6 | Imported | 2026-05-11 |
| 103 | Claude Sonnet 4.6 (Non-reasoning, Low Effort) | 79.7% | Claude Sonnet 4.6 anthropic-claude-sonnet-4.6 | Imported | 2026-05-11 |
| 104 | DeepSeek V3.2 Exp (Reasoning) | 79.7% | DeepSeek V3.2 Exp deepseek-deepseek-v3.2-exp | Imported | 2026-05-11 |
| 105 | Claude 4 Opus (Reasoning) | 79.6% | — | Imported | 2026-05-11 |
| 106 | EXAONE 4.5 33B | 79.4% | — | Imported | 2026-05-11 |
| 107 | Gemini 2.5 Flash Preview (Sep '25) (Reasoning) | 79.3% | — | Imported | 2026-05-11 |
| 108 | DeepSeek V3.1 Terminus (Reasoning) | 79.2% | DeepSeek V3.1 Terminus deepseek-deepseek-v3.1-terminus | Imported | 2026-05-11 |
| 109 | Gemma 4 26B A4B (Reasoning) | 79.2% | Gemma 4 26B A4B google-gemma-4-26b-a4b-it | Imported | 2026-05-11 |
| 110 | Grok 3 mini Reasoning (high) | 79.1% | — | Imported | 2026-05-11 |
| 111 | Gemini 2.5 Flash (Reasoning) | 79.0% | Gemini 2.5 Flash google-gemini-2.5-flash | Imported | 2026-05-11 |
| 112 | Qwen3 235B A22B 2507 (Reasoning) | 79.0% | Qwen3 235B A22B Instruct 2507 qwen-qwen3-235b-a22b-2507 | Imported | 2026-05-11 |
| 113 | Kimi K2.5 (Non-reasoning) | 78.9% | MoonshotAI: Kimi K2.5 moonshotai-kimi-k2.5 | Imported | 2026-05-11 |
| 114 | Kimi K2.6 (Non-reasoning) | 78.8% | MoonshotAI: Kimi K2.6 moonshotai-kimi-k2.6 | Imported | 2026-05-11 |
| 115 | Qwen3.5 9B (Non-reasoning) | 78.6% | Qwen3.5-9B qwen-qwen3.5-9b | Imported | 2026-05-11 |
| 116 | Grok 4.20 0309 (Non-reasoning) | 78.5% | Grok 4.20 x-ai-grok-4.20 | Imported | 2026-05-11 |
| 117 | Nova 2.0 Pro Preview (medium) | 78.5% | — | Imported | 2026-05-11 |
| 118 | o4-mini (high) | 78.4% | o4 Mini openai-o4-mini | Imported | 2026-05-11 |
| 119 | K-EXAONE (Reasoning) | 78.3% | — | Imported | 2026-05-11 |
| 120 | GLM-4.5 (Reasoning) | 78.2% | GLM 4.5 z-ai-glm-4.5 | Imported | 2026-05-11 |
| 121 | gpt-oss-120B (high) | 78.2% | gpt-oss-120b openai-gpt-oss-120b | Imported | 2026-05-11 |
| 122 | GLM-4.6 (Reasoning) | 78.0% | GLM 4.6 z-ai-glm-4.6 | Imported | 2026-05-11 |
| 123 | DeepSeek V3.1 (Reasoning) | 77.9% | DeepSeek V3.1 deepseek-deepseek-chat-v3.1 | Imported | 2026-05-11 |
| 124 | Claude 4 Sonnet (Reasoning) | 77.7% | — | Imported | 2026-05-11 |
| 125 | ERNIE 5.0 Thinking Preview | 77.7% | — | Imported | 2026-05-11 |
| 126 | MiniMax-M2 | 77.7% | MiniMax M2 minimax-minimax-m2 | Imported | 2026-05-11 |
| 127 | Grok 4.20 0309 v2 (Non-reasoning) | 77.6% | Grok 4.20 x-ai-grok-4.20 | Imported | 2026-05-11 |
| 128 | Qwen3 Max Thinking (Preview) | 77.6% | Qwen3 Max Thinking qwen-qwen3-max-thinking | Imported | 2026-05-11 |
| 129 | Ring-1T | 77.4% | — | Imported | 2026-05-11 |
| 130 | o3-mini (high) | 77.3% | o3 Mini High openai-o3-mini-high | Imported | 2026-05-11 |
| 131 | Claude 3.7 Sonnet (Reasoning) | 77.2% | Claude 3.7 Sonnet (thinking) anthropic-claude-3.7-sonnet-thinking | Imported | 2026-05-11 |
| 132 | Qwen3 VL 235B A22B (Reasoning) | 77.2% | — | Imported | 2026-05-11 |
| 133 | Qwen3.5 4B (Reasoning) | 77.1% | — | Imported | 2026-05-11 |
| 134 | Mercury 2 | 77.0% | Mercury 2 inception-mercury-2 | Imported | 2026-05-11 |
| 135 | Mistral Small 4 (Reasoning) | 76.9% | Mistral: Mistral Small 4 mistralai-mistral-small-2603 | Imported | 2026-05-11 |
| 136 | Cogito v2.1 (Reasoning) | 76.8% | — | Imported | 2026-05-11 |
| 137 | GPT-5.5 (Non-reasoning) | 76.8% | GPT-5.5 openai-gpt-5.5 | Imported | 2026-05-11 |
| 138 | Nova 2.0 Lite (medium) | 76.8% | — | Imported | 2026-05-11 |
| 139 | Kimi K2 0905 | 76.7% | MoonshotAI: Kimi K2 0905 moonshotai-kimi-k2-0905 | Imported | 2026-05-11 |
| 140 | Gemini 2.5 Flash Preview (Sep '25) (Non-reasoning) | 76.6% | — | Imported | 2026-05-11 |
| 141 | Kimi K2 | 76.6% | MoonshotAI: Kimi K2 0711 moonshotai-kimi-k2 | Imported | 2026-05-11 |
| 142 | Doubao Seed Code | 76.4% | — | Imported | 2026-05-11 |
| 143 | KAT-Coder-Pro V1 | 76.4% | — | Imported | 2026-05-11 |
| 144 | Qwen3 Max | 76.4% | Qwen3 Max qwen-qwen3-max | Imported | 2026-05-11 |
| 145 | Qwen3 Max (Preview) | 76.4% | Qwen3 Max qwen-qwen3-max | Imported | 2026-05-11 |
| 146 | Gemma 4 31B (Non-reasoning) | 76.3% | Gemma 4 31B google-gemma-4-31b-it | Imported | 2026-05-11 |
| 147 | MiMo-V2.5-Pro (Non-reasoning) | 76.2% | MiMo-V2.5-Pro xiaomi-mimo-v2.5-pro | Imported | 2026-05-11 |
| 148 | GPT-5.4 nano (medium) | 76.1% | GPT-5.4 Nano openai-gpt-5.4-nano | Imported | 2026-05-11 |
| 149 | INTELLECT-3 | 76.1% | INTELLECT-3 prime-intellect-intellect-3 | Imported | 2026-05-11 |
| 150 | Nova 2.0 Omni (medium) | 76.0% | — | Imported | 2026-05-11 |
| 151 | Qwen3 Next 80B A3B (Reasoning) | 75.9% | — | Imported | 2026-05-11 |
| 152 | Nemotron Cascade 2 30B A3B | 75.8% | — | Imported | 2026-05-11 |
| 153 | NVIDIA Nemotron 3 Nano 30B A3B (Reasoning) | 75.7% | Nemotron 3 Nano 30B A3B nvidia-nemotron-3-nano-30b-a3b | Imported | 2026-05-11 |
| 154 | Qwen3 235B A22B 2507 Instruct | 75.3% | Qwen3 235B A22B Instruct 2507 qwen-qwen3-235b-a22b-2507 | Imported | 2026-05-11 |
| 155 | Ling-2.6-1T | 75.2% | Ling-2.6-1T inclusionai-ling-2.6-1t | Imported | 2026-05-11 |
| 156 | Trinity Large Thinking | 75.2% | Trinity Large Thinking arcee-ai-trinity-large-thinking | Imported | 2026-05-11 |
| 157 | DeepSeek V3.1 Terminus (Non-reasoning) | 75.1% | DeepSeek V3.1 Terminus deepseek-deepseek-v3.1-terminus | Imported | 2026-05-11 |
| 158 | DeepSeek V3.2 (Non-reasoning) | 75.1% | DeepSeek V3.2 deepseek-deepseek-v3.2 | Imported | 2026-05-11 |
| 159 | Nova 2.0 Pro Preview (low) | 75.1% | — | Imported | 2026-05-11 |
| 160 | GPT-5.4 (Non-reasoning) | 74.8% | GPT-5.4 openai-gpt-5.4 | Imported | 2026-05-11 |
| 161 | Llama Nemotron Super 49B v1.5 (Reasoning) | 74.8% | — | Imported | 2026-05-11 |
| 162 | Mistral Medium 3.5 | 74.8% | Mistral: Mistral Medium 3.5 mistralai-mistral-medium-3-5 | Imported | 2026-05-11 |
| 163 | o3-mini | 74.8% | o3-mini openai-o3-mini | Imported | 2026-05-11 |
| 164 | o1 | 74.7% | o1 openai-o1 | Imported | 2026-05-11 |
| 165 | Qwen3.5 Omni Flash | 74.2% | — | Imported | 2026-05-11 |
| 166 | EXAONE 4.0 32B (Reasoning) | 73.9% | — | Imported | 2026-05-11 |
| 167 | Magistral Medium 1.2 | 73.9% | — | Imported | 2026-05-11 |
| 168 | DeepSeek V3.2 Exp (Non-reasoning) | 73.8% | DeepSeek V3.2 deepseek-deepseek-v3.2 | Imported | 2026-05-11 |
| 169 | Qwen3 Next 80B A3B Instruct | 73.8% | Qwen3 Next 80B A3B Instruct qwen-qwen3-next-80b-a3b-instruct | Imported | 2026-05-11 |
| 170 | Sarvam 105B (high) | 73.8% | — | Imported | 2026-05-11 |
| 171 | Qwen3 Coder Next | 73.7% | Qwen3 Coder Next qwen-qwen3-coder-next | Imported | 2026-05-11 |
| 172 | DeepSeek V3.1 (Non-reasoning) | 73.5% | DeepSeek V3.1 deepseek-deepseek-chat-v3.1 | Imported | 2026-05-11 |
| 173 | Apriel-v1.6-15B-Thinker | 73.3% | — | Imported | 2026-05-11 |
| 174 | GLM-4.5-Air | 73.3% | GLM 4.5 Air z-ai-glm-4.5-air | Imported | 2026-05-11 |
| 175 | Qwen3 VL 32B (Reasoning) | 73.3% | — | Imported | 2026-05-11 |
| 176 | Hy3-preview (Non-reasoning) | 73.2% | Hy3 preview tencent-hy3-preview | Imported | 2026-05-11 |
| 177 | Llama 3.1 Nemotron Ultra 253B v1 (Reasoning) | 72.8% | — | Imported | 2026-05-11 |
| 178 | Claude 4.5 Sonnet (Non-reasoning) | 72.7% | — | Imported | 2026-05-11 |
| 179 | Grok Code Fast 1 | 72.7% | Grok Code Fast 1 x-ai-grok-code-fast-1 | Imported | 2026-05-11 |
| 180 | Hermes 4 - Llama-3.1 405B (Reasoning) | 72.7% | — | Imported | 2026-05-11 |
| 181 | Qwen3 Omni 30B A3B (Reasoning) | 72.6% | — | Imported | 2026-05-11 |
| 182 | Seed-OSS-36B-Instruct | 72.6% | — | Imported | 2026-05-11 |
| 183 | Ring-flash-2.0 | 72.5% | — | Imported | 2026-05-11 |
| 184 | Solar Pro 3 | 72.4% | Solar Pro 3 upstage-solar-pro-3 | Imported | 2026-05-11 |
| 185 | Mi:dm K 2.5 Pro Preview | 72.2% | — | Imported | 2026-05-11 |
| 186 | Qwen3 VL 30B A3B (Reasoning) | 72.0% | — | Imported | 2026-05-11 |
| 187 | GLM-4.6V (Reasoning) | 71.9% | GLM 4.6V z-ai-glm-4.6v | Imported | 2026-05-11 |
| 188 | Ling-1T | 71.9% | — | Imported | 2026-05-11 |
| 189 | DeepSeek V4 Pro (Non-reasoning) | 71.7% | DeepSeek V4 Pro deepseek-deepseek-v4-pro | Imported | 2026-05-11 |
| 190 | DeepSeek V4 Flash (Non-reasoning) | 71.6% | DeepSeek V4 Flash deepseek-deepseek-v4-flash | Imported | 2026-05-11 |
| 191 | Gemma 4 26B A4B (Non-reasoning) | 71.4% | Gemma 4 26B A4B google-gemma-4-26b-a4b-it | Imported | 2026-05-11 |
| 192 | Apriel-v1.5-15B-Thinker | 71.3% | — | Imported | 2026-05-11 |
| 193 | K2 Think V2 | 71.3% | — | Imported | 2026-05-11 |
| 194 | GPT-5.2 (Non-reasoning) | 71.2% | GPT-5.2 openai-gpt-5.2 | Imported | 2026-05-11 |
| 195 | Qwen3 VL 235B A22B Instruct | 71.2% | Qwen3 VL 235B A22B Instruct qwen-qwen3-vl-235b-a22b-instruct | Imported | 2026-05-11 |
| 196 | Qwen3.5 4B (Non-reasoning) | 71.2% | — | Imported | 2026-05-11 |
| 197 | Gemini 2.5 Flash-Lite Preview (Sep '25) (Reasoning) | 70.9% | — | Imported | 2026-05-11 |
| 198 | DeepSeek R1 (Jan '25) | 70.8% | R1 deepseek-r1 | Imported | 2026-05-11 |
| 199 | Qwen3 30B A3B 2507 (Reasoning) | 70.7% | — | Imported | 2026-05-11 |
| 200 | Claude 4 Opus (Non-reasoning) | 70.1% | — | Imported | 2026-05-11 |
| 201 | Gemini 2.0 Flash Thinking Experimental (Jan '25) | 70.1% | — | Imported | 2026-05-11 |
| 202 | Mi:dm K 2.5 Pro | 70.1% | — | Imported | 2026-05-11 |
| 203 | Qwen3 235B A22B (Reasoning) | 70.0% | Qwen3 235B A22B qwen-qwen3-235b-a22b | Imported | 2026-05-11 |
| 204 | Hermes 4 - Llama-3.1 70B (Reasoning) | 69.9% | — | Imported | 2026-05-11 |
| 205 | Nova 2.0 Omni (low) | 69.9% | — | Imported | 2026-05-11 |
| 206 | Gemini 2.5 Flash Preview (Reasoning) | 69.8% | — | Imported | 2026-05-11 |
| 207 | Nova 2.0 Lite (low) | 69.8% | — | Imported | 2026-05-11 |
| 208 | MiniMax M1 80k | 69.7% | — | Imported | 2026-05-11 |
| 209 | K-EXAONE (Non-reasoning) | 69.5% | — | Imported | 2026-05-11 |
| 210 | Motif-2-12.7B-Reasoning | 69.5% | — | Imported | 2026-05-11 |
| 211 | Qwen3 VL 30B A3B Instruct | 69.5% | Qwen3 VL 30B A3B Instruct qwen-qwen3-vl-30b-a3b-instruct | Imported | 2026-05-11 |
| 212 | Grok 3 | 69.3% | Grok 3 x-ai-grok-3 | Imported | 2026-05-11 |
| 213 | Step3 VL 10B | 69.0% | — | Imported | 2026-05-11 |
| 214 | gpt-oss-20B (high) | 68.8% | gpt-oss-20b openai-gpt-oss-20b | Imported | 2026-05-11 |
| 215 | GPT-5 mini (minimal) | 68.7% | GPT-5 Mini openai-gpt-5-mini | Imported | 2026-05-11 |
| 216 | Solar Pro 2 (Reasoning) | 68.7% | — | Imported | 2026-05-11 |
| 217 | GPT-5 (ChatGPT) | 68.6% | GPT-5 openai-gpt-5 | Imported | 2026-05-11 |
| 218 | GLM-4.5V (Reasoning) | 68.4% | GLM 4.5V z-ai-glm-4.5v | Imported | 2026-05-11 |
| 219 | Claude 4 Sonnet (Non-reasoning) | 68.3% | — | Imported | 2026-05-11 |
| 220 | Gemini 2.5 Flash (Non-reasoning) | 68.3% | Gemini 2.5 Flash google-gemini-2.5-flash | Imported | 2026-05-11 |
| 221 | MiniMax M1 40k | 68.2% | — | Imported | 2026-05-11 |
| 222 | K2-V2 (high) | 68.1% | — | Imported | 2026-05-11 |
| 223 | Mistral Large 3 | 68.0% | — | Imported | 2026-05-11 |
| 224 | Magistral Medium 1 | 67.9% | — | Imported | 2026-05-11 |
| 225 | GPT-5 nano (high) | 67.6% | GPT-5 Nano openai-gpt-5-nano | Imported | 2026-05-11 |
| 226 | JT-MINI | 67.6% | — | Imported | 2026-05-11 |
| 227 | GPT-5 (minimal) | 67.3% | GPT-5 openai-gpt-5 | Imported | 2026-05-11 |
| 228 | Claude 4.5 Haiku (Reasoning) | 67.2% | — | Imported | 2026-05-11 |
| 229 | gpt-oss-120B (low) | 67.2% | gpt-oss-120b openai-gpt-oss-120b | Imported | 2026-05-11 |
| 230 | Llama 4 Maverick | 67.1% | Llama 4 Maverick meta-llama-4-maverick | Imported | 2026-05-11 |
| 231 | Qwen3 VL 32B Instruct | 67.1% | Qwen3 VL 32B Instruct qwen-qwen3-vl-32b-instruct | Imported | 2026-05-11 |
| 232 | GPT-5 nano (medium) | 67.0% | GPT-5 Nano openai-gpt-5-nano | Imported | 2026-05-11 |
| 233 | Qwen3 32B (Reasoning) | 66.8% | Qwen3 32B qwen-qwen3-32b | Imported | 2026-05-11 |
| 234 | Qwen3 4B 2507 (Reasoning) | 66.7% | — | Imported | 2026-05-11 |
| 235 | GLM-5 (Non-reasoning) | 66.6% | GLM 5 z-ai-glm-5 | Imported | 2026-05-11 |
| 236 | GPT-4.1 | 66.6% | GPT-4.1 openai-gpt-4.1 | Imported | 2026-05-11 |
| 237 | GLM-4.7 (Non-reasoning) | 66.4% | GLM 4.7 z-ai-glm-4.7 | Imported | 2026-05-11 |
| 238 | GPT-4.1 mini | 66.4% | GPT-4.1 Mini openai-gpt-4.1-mini | Imported | 2026-05-11 |
| 239 | Magistral Small 1.2 | 66.3% | — | Imported | 2026-05-11 |
| 240 | Falcon-H1R-7B | 66.1% | — | Imported | 2026-05-11 |
| 241 | Qwen3 30B A3B 2507 Instruct | 65.9% | — | Imported | 2026-05-11 |
| 242 | Grok 4.3 (Non-reasoning) | 65.8% | Grok 4.3 x-ai-grok-4.3 | Imported | 2026-05-11 |
| 243 | Ling-flash-2.0 | 65.7% | — | Imported | 2026-05-11 |
| 244 | Solar Open 100B (Reasoning) | 65.7% | — | Imported | 2026-05-11 |
| 245 | Claude 3.7 Sonnet (Non-reasoning) | 65.6% | Claude 3.7 Sonnet anthropic-claude-3.7-sonnet | Imported | 2026-05-11 |
| 246 | MiMo-V2-Flash (Non-reasoning) | 65.6% | MiMo-V2-Flash xiaomi-mimo-v2-flash | Imported | 2026-05-11 |
| 247 | DeepSeek V3 0324 | 65.5% | DeepSeek V3 0324 deepseek-deepseek-chat-v3-0324 | Imported | 2026-05-11 |
| 248 | GPT-4o (March 2025, chatgpt-4o-latest) | 65.5% | GPT-4o openai-gpt-4o | Imported | 2026-05-11 |
| 249 | Gemini 2.5 Flash-Lite Preview (Sep '25) (Non-reasoning) | 65.1% | Gemini 2.5 Flash Lite Preview 09-2025 google-gemini-2.5-flash-lite-preview-09-2025 | Imported | 2026-05-11 |
| 250 | Claude 4.5 Haiku (Non-reasoning) | 64.6% | — | Imported | 2026-05-11 |
| 251 | GPT-5.1 (Non-reasoning) | 64.3% | GPT-5.1 openai-gpt-5.1 | Imported | 2026-05-11 |
| 252 | Llama 3.3 Nemotron Super 49B v1 (Reasoning) | 64.3% | — | Imported | 2026-05-11 |
| 253 | Magistral Small 1 | 64.1% | — | Imported | 2026-05-11 |
| 254 | Grok 4.1 Fast (Non-reasoning) | 63.7% | Grok 4.1 Fast x-ai-grok-4.1-fast | Imported | 2026-05-11 |
| 255 | Gemini 2.0 Flash (experimental) | 63.6% | Gemini 2.0 Flash google-gemini-2.0-flash | Imported | 2026-05-11 |
| 256 | LongCat Flash Lite | 63.6% | — | Imported | 2026-05-11 |
| 257 | Nova 2.0 Pro Preview (Non-reasoning) | 63.6% | — | Imported | 2026-05-11 |
| 258 | Sarvam 30B (high) | 63.3% | — | Imported | 2026-05-11 |
| 259 | GLM-4.6 (Non-reasoning) | 63.2% | GLM 4.6 z-ai-glm-4.6 | Imported | 2026-05-11 |
| 260 | EXAONE 4.0 32B (Non-reasoning) | 62.8% | — | Imported | 2026-05-11 |
| 261 | Gemini 2.5 Flash-Lite (Reasoning) | 62.5% | Gemini 2.5 Flash Lite google-gemini-2.5-flash-lite | Imported | 2026-05-11 |
| 262 | Gemini 2.0 Flash (Feb '25) | 62.3% | Gemini 2.0 Flash google-gemini-2.0-flash | Imported | 2026-05-11 |
| 263 | Sonar Reasoning | 62.3% | — | Imported | 2026-05-11 |
| 264 | Gemini 2.0 Pro Experimental (Feb '25) | 62.2% | — | Imported | 2026-05-11 |
| 265 | Qwen3 Omni 30B A3B Instruct | 62.0% | — | Imported | 2026-05-11 |
| 266 | Qwen3 Coder 480B A35B Instruct | 61.8% | Qwen3 Coder 480B A35B qwen-qwen3-coder | Imported | 2026-05-11 |
| 267 | Qwen3 30B A3B (Reasoning) | 61.6% | Qwen3 30B A3B qwen-qwen3-30b-a3b | Imported | 2026-05-11 |
| 268 | DeepSeek R1 Distill Qwen 32B | 61.5% | R1 Distill Qwen 32B deepseek-deepseek-r1-distill-qwen-32b | Imported | 2026-05-11 |
| 269 | HyperCLOVA X SEED Think (32B) | 61.5% | — | Imported | 2026-05-11 |
| 270 | Qwen3 235B A22B (Non-reasoning) | 61.3% | Qwen3 235B A22B qwen-qwen3-235b-a22b | Imported | 2026-05-11 |
| 271 | DeepSeek R1 0528 Qwen3 8B | 61.2% | — | Imported | 2026-05-11 |
| 272 | gpt-oss-20B (low) | 61.1% | gpt-oss-20b openai-gpt-oss-20b | Imported | 2026-05-11 |
| 273 | Olmo 3 32B Think | 61.0% | Olmo 3 32B Think allenai-olmo-3-32b-think | Imported | 2026-05-11 |
| 274 | GPT-5.4 mini (Non-Reasoning) | 60.6% | GPT-5.4 Mini openai-gpt-5.4-mini | Imported | 2026-05-11 |
| 275 | Grok 4 Fast (Non-reasoning) | 60.6% | Grok 4 Fast x-ai-grok-4-fast | Imported | 2026-05-11 |
| 276 | Qwen3 14B (Reasoning) | 60.4% | Qwen3 14B qwen-qwen3-14b | Imported | 2026-05-11 |
| 277 | Nova 2.0 Lite (Non-reasoning) | 60.3% | — | Imported | 2026-05-11 |
| 278 | o1-mini | 60.3% | — | Imported | 2026-05-11 |
| 279 | Tri-21B-Think | 60.1% | — | Imported | 2026-05-11 |
| 280 | Claude 3.5 Sonnet (Oct '24) | 59.9% | Claude 3.5 Sonnet anthropic-claude-3.5-sonnet | Imported | 2026-05-11 |
| 281 | K2-V2 (medium) | 59.8% | — | Imported | 2026-05-11 |
| 282 | Devstral 2 | 59.4% | — | Imported | 2026-05-11 |
| 283 | Gemini 2.5 Flash Preview (Non-reasoning) | 59.4% | — | Imported | 2026-05-11 |
| 284 | Ling 2.6 Flash | 59.3% | Ling-2.6-flash inclusionai-ling-2.6-flash | Imported | 2026-05-11 |
| 285 | QwQ 32B | 59.3% | — | Imported | 2026-05-11 |
| 286 | Olmo 3.1 32B Think | 59.1% | — | Imported | 2026-05-11 |
| 287 | Gemini 1.5 Pro (Sep '24) | 58.9% | — | Imported | 2026-05-11 |
| 288 | Qwen3 8B (Reasoning) | 58.9% | Qwen3 8B qwen-qwen3-8b | Imported | 2026-05-11 |
| 289 | Mistral Medium 3.1 | 58.8% | Mistral: Mistral Medium 3.1 mistralai-mistral-medium-3.1 | Imported | 2026-05-11 |
| 290 | Llama 4 Scout | 58.7% | Llama 4 Scout meta-llama-llama-4-scout | Imported | 2026-05-11 |
| 291 | Qwen2.5 Max | 58.7% | — | Imported | 2026-05-11 |
| 292 | GLM-4.7-Flash (Reasoning) | 58.1% | GLM 4.7 Flash z-ai-glm-4.7-flash | Imported | 2026-05-11 |
| 293 | Qwen3 VL 8B (Reasoning) | 57.9% | — | Imported | 2026-05-11 |
| 294 | Mistral Medium 3 | 57.8% | Mistral: Mistral Medium 3 mistralai-mistral-medium-3 | Imported | 2026-05-11 |
| 295 | Solar Pro 2 (Preview) (Reasoning) | 57.8% | — | Imported | 2026-05-11 |
| 296 | Sonar Pro | 57.8% | Sonar Pro perplexity-sonar-pro | Imported | 2026-05-11 |
| 297 | Gemma 4 E4B (Reasoning) | 57.6% | — | Imported | 2026-05-11 |
| 298 | Phi-4 | 57.5% | Phi 4 microsoft-phi-4 | Imported | 2026-05-11 |
| 299 | GLM-4.5V (Non-reasoning) | 57.3% | GLM 4.5V z-ai-glm-4.5v | Imported | 2026-05-11 |
| 300 | Ministral 3 14B | 57.2% | — | Imported | 2026-05-11 |
| 301 | NVIDIA Nemotron Nano 12B v2 VL (Reasoning) | 57.2% | Nemotron Nano 12B 2 VL nvidia-nemotron-nano-12b-v2-vl | Imported | 2026-05-11 |
| 302 | Mistral Small 4 (Non-reasoning) | 57.1% | Mistral: Mistral Small 4 mistralai-mistral-small-2603 | Imported | 2026-05-11 |
| 303 | NVIDIA Nemotron Nano 9B V2 (Reasoning) | 57.0% | Nemotron Nano 9B V2 nvidia-nemotron-nano-9b-v2 | Imported | 2026-05-11 |
| 304 | Nova Premier | 56.9% | — | Imported | 2026-05-11 |
| 305 | GLM-4.6V (Non-reasoning) | 56.6% | GLM 4.6V z-ai-glm-4.6v | Imported | 2026-05-11 |
| 306 | Ling-mini-2.0 | 56.2% | — | Imported | 2026-05-11 |
| 307 | Solar Pro 2 (Non-reasoning) | 56.1% | — | Imported | 2026-05-11 |
| 308 | Claude 3.5 Sonnet (June '24) | 56.0% | Claude 3.5 Sonnet anthropic-claude-3.5-sonnet | Imported | 2026-05-11 |
| 309 | GPT-5.4 nano (Non-Reasoning) | 55.8% | GPT-5.4 Nano openai-gpt-5.4-nano | Imported | 2026-05-11 |
| 310 | DeepSeek V3 (Dec '24) | 55.7% | DeepSeek V3 deepseek-deepseek-chat | Imported | 2026-05-11 |
| 311 | NVIDIA Nemotron Nano 9B V2 (Non-reasoning) | 55.7% | Nemotron Nano 9B V2 nvidia-nemotron-nano-9b-v2 | Imported | 2026-05-11 |
| 312 | QwQ 32B-Preview | 55.7% | — | Imported | 2026-05-11 |
| 313 | Nova 2.0 Omni (Non-reasoning) | 55.5% | — | Imported | 2026-05-11 |
| 314 | Gemma 4 E4B (Non-reasoning) | 54.9% | — | Imported | 2026-05-11 |
| 315 | Solar Pro 2 (Preview) (Non-reasoning) | 54.4% | — | Imported | 2026-05-11 |
| 316 | GPT-4o (Nov '24) | 54.3% | GPT-4o openai-gpt-4o | Imported | 2026-05-11 |
| 317 | Gemini 2.0 Flash-Lite (Preview) | 54.2% | Gemini 2.0 Flash Lite google-gemini-2.0-flash-lite-001 | Imported | 2026-05-11 |
| 318 | K2-V2 (low) | 54.1% | — | Imported | 2026-05-11 |
| 319 | Olmo 3.1 32B Instruct | 53.9% | Olmo 3.1 32B Instruct allenai-olmo-3.1-32b-instruct | Imported | 2026-05-11 |
| 320 | Tri-21B-think Preview | 53.8% | — | Imported | 2026-05-11 |
| 321 | Hermes 4 - Llama-3.1 405B (Non-reasoning) | 53.6% | — | Imported | 2026-05-11 |
| 322 | Gemini 2.0 Flash-Lite (Feb '25) | 53.5% | Gemini 2.0 Flash Lite google-gemini-2.0-flash-lite-001 | Imported | 2026-05-11 |
| 323 | Qwen3 32B (Non-reasoning) | 53.5% | Qwen3 32B qwen-qwen3-32b | Imported | 2026-05-11 |
| 324 | Devstral Small 2 | 53.2% | — | Imported | 2026-05-11 |
| 325 | Reka Flash 3 | 52.9% | Reka Flash 3 rekaai-reka-flash-3 | Imported | 2026-05-11 |
| 326 | Command A | 52.7% | Command A cohere-command-a | Imported | 2026-05-11 |
| 327 | GPT-4o (May '24) | 52.6% | GPT-4o (2024-05-13) openai-gpt-4o-2024-05-13 | Imported | 2026-05-11 |
| 328 | Qwen3 4B (Reasoning) | 52.2% | — | Imported | 2026-05-11 |
| 329 | GPT-4o (Aug '24) | 52.1% | GPT-4o (2024-08-06) openai-gpt-4o-2024-08-06 | Imported | 2026-05-11 |
| 330 | Llama 3.3 Nemotron Super 49B v1 (Non-reasoning) | 51.7% | — | Imported | 2026-05-11 |
| 331 | Qwen3 4B 2507 Instruct | 51.7% | — | Imported | 2026-05-11 |
| 332 | Llama 3.1 Tulu3 405B | 51.6% | — | Imported | 2026-05-11 |
| 333 | Olmo 3 7B Think | 51.6% | — | Imported | 2026-05-11 |
| 334 | Qwen3 Coder 30B A3B Instruct | 51.6% | Qwen3 Coder 30B A3B Instruct qwen-qwen3-coder-30b-a3b-instruct | Imported | 2026-05-11 |
| 335 | Exaone 4.0 1.2B (Reasoning) | 51.5% | — | Imported | 2026-05-11 |
| 336 | Llama 3.1 Instruct 405B | 51.5% | — | Imported | 2026-05-11 |
| 337 | Qwen3 30B A3B (Non-reasoning) | 51.5% | Qwen3 30B A3B qwen-qwen3-30b-a3b | Imported | 2026-05-11 |
| 338 | NVIDIA Nemotron 3 Nano 4B | 51.3% | — | Imported | 2026-05-11 |
| 339 | GPT-4.1 nano | 51.2% | GPT-4.1 Nano openai-gpt-4.1-nano | Imported | 2026-05-11 |
| 340 | GPT-4o (ChatGPT) | 51.1% | GPT-4o openai-gpt-4o | Imported | 2026-05-11 |
| 341 | Grok 2 (Dec '24) | 51% | — | Imported | 2026-05-11 |
| 342 | Mistral Small 3.2 | 50.5% | — | Imported | 2026-05-11 |
| 343 | Pixtral Large | 50.5% | Mistral: Pixtral Large 2411 mistralai-pixtral-large-2411 | Imported | 2026-05-11 |
| 344 | Nova Pro | 49.9% | Nova Pro 1.0 amazon-nova-pro-v1 | Imported | 2026-05-11 |
| 345 | Llama 3.3 Instruct 70B | 49.8% | — | Imported | 2026-05-11 |
| 346 | Qwen3 VL 4B (Reasoning) | 49.4% | — | Imported | 2026-05-11 |
| 347 | Devstral Medium | 49.2% | Mistral: Devstral Medium mistralai-devstral-medium | Imported | 2026-05-11 |
| 348 | Hermes 4 - Llama-3.1 70B (Non-reasoning) | 49.1% | — | Imported | 2026-05-11 |
| 349 | Qwen2.5 Instruct 72B | 49.1% | Qwen2.5 72B Instruct qwen-qwen-2.5-72b-instruct | Imported | 2026-05-11 |
| 350 | Claude 3 Opus | 48.9% | — | Imported | 2026-05-11 |
| 351 | Mistral Large 2 (Nov '24) | 48.6% | — | Imported | 2026-05-11 |
| 352 | DeepSeek R1 Distill Qwen 14B | 48.4% | — | Imported | 2026-05-11 |
| 353 | Granite 4.1 30B | 48.1% | — | Imported | 2026-05-11 |
| 354 | Llama Nemotron Super 49B v1.5 (Non-reasoning) | 48.1% | — | Imported | 2026-05-11 |
| 355 | Gemini 2.5 Flash-Lite (Non-reasoning) | 47.4% | Gemini 2.5 Flash Lite google-gemini-2.5-flash-lite | Imported | 2026-05-11 |
| 356 | LFM2 24B A2B | 47.4% | LFM2-24B-A2B liquid-lfm-2-24b-a2b | Imported | 2026-05-11 |
| 357 | Mistral Large 2 (Jul '24) | 47.2% | Mistral Large 2407 mistralai-mistral-large-2407 | Imported | 2026-05-11 |
| 358 | Grok Beta | 47.1% | — | Imported | 2026-05-11 |
| 359 | Ministral 3 8B | 47.1% | — | Imported | 2026-05-11 |
| 360 | Sonar | 47.1% | Sonar perplexity-sonar | Imported | 2026-05-11 |
| 361 | Qwen3 14B (Non-reasoning) | 47% | Qwen3 14B qwen-qwen3-14b | Imported | 2026-05-11 |
| 362 | Nemotron 3 Nano Omni 30B A3B Reasoning | 46.9% | — | Imported | 2026-05-11 |
| 363 | Qwen2.5 Instruct 32B | 46.6% | — | Imported | 2026-05-11 |
| 364 | Llama 3.1 Nemotron Instruct 70B | 46.5% | — | Imported | 2026-05-11 |
| 365 | Gemini 1.5 Flash (Sep '24) | 46.3% | — | Imported | 2026-05-11 |
| 366 | Mistral Small 3 | 46.2% | — | Imported | 2026-05-11 |
| 367 | Qwen3.5 2B (Reasoning) | 45.6% | — | Imported | 2026-05-11 |
| 368 | Mistral Small 3.1 | 45.4% | — | Imported | 2026-05-11 |
| 369 | GLM-4.7-Flash (Non-reasoning) | 45.2% | GLM 4.7 Flash z-ai-glm-4.7-flash | Imported | 2026-05-11 |
| 370 | Qwen3 8B (Non-reasoning) | 45.2% | Qwen3 8B qwen-qwen3-8b | Imported | 2026-05-11 |
| 371 | NVIDIA Nemotron Nano 12B v2 VL (Non-reasoning) | 43.9% | Nemotron Nano 12B 2 VL nvidia-nemotron-nano-12b-v2-vl | Imported | 2026-05-11 |
| 372 | Qwen3.5 2B (Non-reasoning) | 43.8% | — | Imported | 2026-05-11 |
| 373 | Devstral Small (May '25) | 43.4% | Mistral: Devstral Small 1.1 mistralai-devstral-small | Imported | 2026-05-11 |
| 374 | Gemma 4 E2B (Reasoning) | 43.3% | — | Imported | 2026-05-11 |
| 375 | Granite 4.1 8B | 43.3% | Granite 4.1 8B ibm-granite-granite-4.1-8b | Imported | 2026-05-11 |
| 376 | Nova Lite | 43.3% | Nova Lite 1.0 amazon-nova-lite-v1 | Imported | 2026-05-11 |
| 377 | Llama 3.2 Instruct 90B (Vision) | 43.2% | — | Imported | 2026-05-11 |
| 378 | Gemma 3 27B Instruct | 42.8% | Gemma 3 27B google-gemma-3-27b-it | Imported | 2026-05-11 |
| 379 | GPT-5 nano (minimal) | 42.8% | GPT-5 Nano openai-gpt-5-nano | Imported | 2026-05-11 |
| 380 | Jamba 1.5 Large | 42.7% | — | Imported | 2026-05-11 |
| 381 | Qwen3 VL 8B Instruct | 42.7% | Qwen3 VL 8B Instruct qwen-qwen3-vl-8b-instruct | Imported | 2026-05-11 |
| 382 | GPT-4o mini | 42.6% | GPT-4o-mini openai-gpt-4o-mini | Imported | 2026-05-11 |
| 383 | Molmo2-8B | 42.5% | — | Imported | 2026-05-11 |
| 384 | Exaone 4.0 1.2B (Non-reasoning) | 42.4% | — | Imported | 2026-05-11 |
| 385 | Mistral Saba | 42.4% | Mistral: Saba mistralai-mistral-saba | Imported | 2026-05-11 |
| 386 | Qwen2.5 Coder Instruct 32B | 41.7% | Qwen2.5 Coder 32B Instruct qwen-qwen-2.5-coder-32b-instruct | Imported | 2026-05-11 |
| 387 | Granite 4.0 H Small | 41.6% | — | Imported | 2026-05-11 |
| 388 | Sarvam M (Reasoning) | 41.6% | — | Imported | 2026-05-11 |
| 389 | Devstral Small (Jul '25) | 41.4% | Mistral: Devstral Small 1.1 mistralai-devstral-small | Imported | 2026-05-11 |
| 390 | Kimi Linear 48B A3B Instruct | 41.2% | — | Imported | 2026-05-11 |
| 391 | Qwen2.5 Turbo | 41% | Qwen-Turbo qwen-qwen-turbo | Imported | 2026-05-11 |
| 392 | Llama 3.1 Instruct 70B | 40.9% | — | Imported | 2026-05-11 |
| 393 | Claude 3.5 Haiku | 40.8% | Claude 3.5 Haiku anthropic-claude-3.5-haiku | Imported | 2026-05-11 |
| 394 | Llama 3.1 Nemotron Nano 4B v1.1 (Reasoning) | 40.8% | — | Imported | 2026-05-11 |
| 395 | Gemma 4 E2B (Non-reasoning) | 40.5% | — | Imported | 2026-05-11 |
| 396 | DeepSeek R1 Distill Llama 70B | 40.2% | R1 Distill Llama 70B deepseek-deepseek-r1-distill-llama-70b | Imported | 2026-05-11 |
| 397 | Hermes 3 - Llama-3.1 70B | 40.1% | Hermes 3 70B Instruct nousresearch-hermes-3-llama-3.1-70b | Imported | 2026-05-11 |
| 398 | Claude 3 Sonnet | 40% | — | Imported | 2026-05-11 |
| 399 | Olmo 3 7B Instruct | 40% | — | Imported | 2026-05-11 |
| 400 | NVIDIA Nemotron 3 Nano 30B A3B (Non-reasoning) | 39.9% | Nemotron 3 Nano 30B A3B nvidia-nemotron-3-nano-30b-a3b | Imported | 2026-05-11 |
| 401 | Qwen3 4B (Non-reasoning) | 39.8% | — | Imported | 2026-05-11 |
| 402 | Jamba 1.7 Large | 39% | — | Imported | 2026-05-11 |
| 403 | Jamba 1.6 Large | 38.7% | — | Imported | 2026-05-11 |
| 404 | DeepHermes 3 - Mistral 24B Preview (Non-reasoning) | 38.2% | — | Imported | 2026-05-11 |
| 405 | Mistral Small (Sep '24) | 38.1% | — | Imported | 2026-05-11 |
| 406 | Llama 3 Instruct 70B | 37.9% | — | Imported | 2026-05-11 |
| 407 | Claude 3 Haiku | 37.4% | Claude 3 Haiku anthropic-claude-3-haiku | Imported | 2026-05-11 |
| 408 | Gemini 1.5 Pro (May '24) | 37.1% | — | Imported | 2026-05-11 |
| 409 | Qwen2 Instruct 72B | 37.1% | — | Imported | 2026-05-11 |
| 410 | Qwen3 VL 4B Instruct | 37.1% | — | Imported | 2026-05-11 |
| 411 | Gemini 1.5 Flash-8B | 35.9% | — | Imported | 2026-05-11 |
| 412 | Ministral 3 3B | 35.8% | — | Imported | 2026-05-11 |
| 413 | Nova Micro | 35.8% | Nova Micro 1.0 amazon-nova-micro-v1 | Imported | 2026-05-11 |
| 414 | Qwen3 1.7B (Reasoning) | 35.6% | — | Imported | 2026-05-11 |
| 415 | Mistral Large (Feb '24) | 35.1% | Mistral Large mistralai-mistral-large | Imported | 2026-05-11 |
| 416 | Gemma 3 12B Instruct | 34.9% | Gemma 3 12B google-gemma-3-12b-it | Imported | 2026-05-11 |
| 417 | Mistral Medium | 34.9% | — | Imported | 2026-05-11 |
| 418 | Claude 2.0 | 34.4% | — | Imported | 2026-05-11 |
| 419 | LFM2 8B A1B | 34.4% | — | Imported | 2026-05-11 |
| 420 | LFM2.5-1.2B-Thinking | 33.9% | LFM2.5-1.2B-Thinking liquid-lfm-2.5-1.2b-thinking | Imported | 2026-05-11 |
| 421 | Qwen2.5 Coder Instruct 7B | 33.9% | — | Imported | 2026-05-11 |
| 422 | Granite 3.3 8B (Non-reasoning) | 33.8% | — | Imported | 2026-05-11 |
| 423 | Granite 4.0 Micro | 33.6% | Granite 4.0 Micro ibm-granite-granite-4.0-h-micro | Imported | 2026-05-11 |
| 424 | Jamba Reasoning 3B | 33.3% | — | Imported | 2026-05-11 |
| 425 | Mixtral 8x22B Instruct | 33.2% | Mistral: Mixtral 8x22B Instruct mistralai-mixtral-8x22b-instruct | Imported | 2026-05-11 |
| 426 | DBRX Instruct | 33.1% | — | Imported | 2026-05-11 |
| 427 | Phi-4 Mini Instruct | 33.1% | — | Imported | 2026-05-11 |
| 428 | Claude Instant | 33% | — | Imported | 2026-05-11 |
| 429 | OLMo 2 32B | 32.8% | — | Imported | 2026-05-11 |
| 430 | LFM 40B | 32.7% | — | Imported | 2026-05-11 |
| 431 | Llama 2 Chat 70B | 32.7% | — | Imported | 2026-05-11 |
| 432 | LFM2.5-1.2B-Instruct | 32.6% | LFM2.5-1.2B-Instruct liquid-lfm-2.5-1.2b-instruct | Imported | 2026-05-11 |
| 433 | Gemini 1.5 Flash (May '24) | 32.4% | — | Imported | 2026-05-11 |
| 434 | Command-R+ (Apr '24) | 32.3% | Command R (08-2024) cohere-command-r-08-2024 | Imported | 2026-05-11 |
| 435 | Jamba 1.7 Mini | 32.2% | — | Imported | 2026-05-11 |
| 436 | Llama 2 Chat 13B | 32.1% | — | Imported | 2026-05-11 |
| 437 | Claude 2.1 | 31.9% | — | Imported | 2026-05-11 |
| 438 | DeepSeek Coder V2 Lite Instruct | 31.9% | — | Imported | 2026-05-11 |
| 439 | Phi-3 Mini Instruct 3.8B | 31.9% | — | Imported | 2026-05-11 |
| 440 | Phi-4 Multimodal Instruct | 31.5% | — | Imported | 2026-05-11 |
| 441 | Granite 4.1 3B | 31.4% | — | Imported | 2026-05-11 |
| 442 | LFM2 2.6B | 30.6% | — | Imported | 2026-05-11 |
| 443 | MiniCPM-V 4.6 1.3B | 30.5% | — | Imported | 2026-05-11 |
| 444 | Tiny Aya Global | 30.5% | — | Imported | 2026-05-11 |
| 445 | DeepSeek R1 Distill Llama 8B | 30.2% | — | Imported | 2026-05-11 |
| 446 | Jamba 1.5 Mini | 30.2% | — | Imported | 2026-05-11 |
| 447 | Mistral Small (Feb '24) | 30.2% | — | Imported | 2026-05-11 |
| 448 | Jamba 1.6 Mini | 30% | — | Imported | 2026-05-11 |
| 449 | GPT-3.5 Turbo | 29.7% | GPT-3.5 Turbo openai-gpt-3.5-turbo | Imported | 2026-05-11 |
| 450 | Gemma 3n E4B Instruct | 29.6% | — | Imported | 2026-05-11 |
| 451 | Llama 3 Instruct 8B | 29.6% | — | Imported | 2026-05-11 |
| 452 | Mixtral 8x7B Instruct | 29.2% | Mistral: Mixtral 8x7B Instruct mistralai-mixtral-8x7b-instruct | Imported | 2026-05-11 |
| 453 | Gemma 3 4B Instruct | 29.1% | Gemma 3 4B google-gemma-3-4b-it | Imported | 2026-05-11 |
| 454 | LFM2.5-VL-1.6B | 28.9% | — | Imported | 2026-05-11 |
| 455 | Qwen1.5 Chat 110B | 28.9% | — | Imported | 2026-05-11 |
| 456 | OLMo 2 7B | 28.8% | — | Imported | 2026-05-11 |
| 457 | Command-R (Mar '24) | 28.4% | Command R (08-2024) cohere-command-r-08-2024 | Imported | 2026-05-11 |
| 458 | Qwen3 1.7B (Non-reasoning) | 28.3% | — | Imported | 2026-05-11 |
| 459 | Granite 4.0 1B | 28.1% | — | Imported | 2026-05-11 |
| 460 | Gemma 3n E4B Instruct Preview (May '25) | 27.8% | — | Imported | 2026-05-11 |
| 461 | Gemini 1.0 Pro | 27.7% | — | Imported | 2026-05-11 |
| 462 | Apertus 70B Instruct | 27.2% | — | Imported | 2026-05-11 |
| 463 | DeepHermes 3 - Llama-3.1 8B Preview (Non-reasoning) | 27% | — | Imported | 2026-05-11 |
| 464 | Granite 4.0 H 1B | 26.3% | — | Imported | 2026-05-11 |
| 465 | Granite 4.0 350M | 26.1% | — | Imported | 2026-05-11 |
| 466 | Llama 3.1 Instruct 8B | 25.9% | — | Imported | 2026-05-11 |
| 467 | Granite 4.0 H 350M | 25.7% | — | Imported | 2026-05-11 |
| 468 | Apertus 8B Instruct | 25.6% | — | Imported | 2026-05-11 |
| 469 | Llama 3.2 Instruct 3B | 25.5% | — | Imported | 2026-05-11 |
| 470 | Molmo 7B-D | 24% | — | Imported | 2026-05-11 |
| 471 | Qwen3 0.6B (Reasoning) | 23.9% | — | Imported | 2026-05-11 |
| 472 | Gemma 3 1B Instruct | 23.7% | — | Imported | 2026-05-11 |
| 473 | Qwen3.5 0.8B (Non-reasoning) | 23.6% | — | Imported | 2026-05-11 |
| 474 | Qwen3 0.6B (Non-reasoning) | 23.1% | — | Imported | 2026-05-11 |
| 475 | OpenChat 3.5 (1210) | 23% | — | Imported | 2026-05-11 |
| 476 | Gemma 3n E2B Instruct | 22.9% | — | Imported | 2026-05-11 |
| 477 | LFM2 1.2B | 22.8% | — | Imported | 2026-05-11 |
| 478 | Llama 2 Chat 7B | 22.7% | — | Imported | 2026-05-11 |
| 479 | Gemma 3 270M | 22.4% | — | Imported | 2026-05-11 |
| 480 | Llama 3.2 Instruct 11B (Vision) | 22.1% | — | Imported | 2026-05-11 |
| 481 | Llama 3.2 Instruct 1B | 19.6% | — | Imported | 2026-05-11 |
| 482 | Mistral 7B Instruct | 17.7% | — | Imported | 2026-05-11 |
| 483 | Qwen3.5 0.8B (Reasoning) | 11.1% | — | Imported | 2026-05-11 |
| 484 | DeepSeek R1 Distill Qwen 1.5B | 9.8% | — | Imported | 2026-05-11 |
| 1 | GPT-5.4 Pro | 94.4% | GPT-5.4 Pro openai-gpt-5.4-pro | Launch post | 2026-04-23 |
| 2 | Gemini 3.1 Pro Preview | 94.3% | Gemini 3.1 Pro Preview google-gemini-3.1-pro-preview | Launch post | 2026-04-23 |
| 3 | Claude Opus 4.7 | 94.2% | Claude Opus 4.7 anthropic-claude-opus-4.7 | Launch post | 2026-04-23 |
| 4 | GPT-5.5 | 93.6% | GPT-5.5 openai-gpt-5.5 | Launch post | 2026-04-23 |
| 5 | GPT-5.4 | 92.8% | GPT-5.4 openai-gpt-5.4 | Launch post | 2026-04-23 |
| 1 | Claude Mythos Preview | 94.6% | Claude Mythos Preview anthropic-claude-mythos-preview | Launch post | 2026-04-16 |
| 2 | GPT-5.4 Pro | 94.4% | GPT-5.4 Pro openai-gpt-5.4-pro | Launch post | 2026-04-16 |
| 3 | Gemini 3.1 Pro Preview | 94.3% | Gemini 3.1 Pro Preview google-gemini-3.1-pro-preview | Launch post | 2026-04-16 |
| 4 | Claude Opus 4.7 | 94.2% | Claude Opus 4.7 anthropic-claude-opus-4.7 | Launch post | 2026-04-16 |
| 5 | Claude Opus 4.6 | 91.3% | Claude Opus 4.6 anthropic-claude-opus-4.6 | Launch post | 2026-04-16 |