AIME 2025
All 30 problems from the 2025 American Invitational Mathematics Examination, testing olympiad-level mathematical reasoning.
269 rows
Primary metric: score
Sampled: 2026-05-11
Accuracy
| Rank | Subject | Accuracy | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | GPT-5.2 (xhigh) | 99% | GPT-5.2 openai-gpt-5.2 | Imported | 2026-05-11 |
| 2 | GPT-5 Codex (high) | 98.7% | GPT-5 Codex openai-gpt-5-codex | Imported | 2026-05-11 |
| 3 | Gemini 3 Flash Preview (Reasoning) | 97% | Gemini 3 Flash Preview google-gemini-3-flash-preview | Imported | 2026-05-11 |
| 4 | DeepSeek V3.2 Speciale | 96.7% | DeepSeek V3.2 Speciale deepseek-deepseek-v3.2-speciale | Imported | 2026-05-11 |
| 5 | GPT-5.2 (medium) | 96.7% | GPT-5.2 openai-gpt-5.2 | Imported | 2026-05-11 |
| 6 | MiMo-V2-Flash (Reasoning) | 96.3% | MiMo-V2-Flash xiaomi-mimo-v2-flash | Imported | 2026-05-11 |
| 7 | Gemini 3 Pro Preview (high) | 95.7% | Gemini 3 google-gemini-3 | Imported | 2026-05-11 |
| 8 | GPT-5.1 Codex (high) | 95.7% | GPT-5.1-Codex openai-gpt-5.1-codex | Imported | 2026-05-11 |
| 9 | GLM-4.7 (Reasoning) | 95% | GLM 4.7 z-ai-glm-4.7 | Imported | 2026-05-11 |
| 10 | KAT-Coder-Pro V1 | 94.7% | — | Imported | 2026-05-11 |
| 11 | Kimi K2 Thinking | 94.7% | MoonshotAI: Kimi K2 Thinking moonshotai-kimi-k2-thinking | Imported | 2026-05-11 |
| 12 | GPT-5 (high) | 94.3% | GPT-5 openai-gpt-5 | Imported | 2026-05-11 |
| 13 | Nova 2.0 Lite (high) | 94.3% | — | Imported | 2026-05-11 |
| 14 | GPT-5.1 (high) | 94% | GPT-5.1 openai-gpt-5.1 | Imported | 2026-05-11 |
| 15 | gpt-oss-120B (high) | 93.4% | gpt-oss-120b openai-gpt-oss-120b | Imported | 2026-05-11 |
| 16 | Grok 4 | 92.7% | Grok 4 x-ai-grok-4 | Imported | 2026-05-11 |
| 17 | DeepSeek V3.2 (Reasoning) | 92% | DeepSeek V3.2 deepseek-deepseek-v3.2 | Imported | 2026-05-11 |
| 18 | GPT-5 (medium) | 91.7% | GPT-5 openai-gpt-5 | Imported | 2026-05-11 |
| 19 | GPT-5.1 Codex mini (high) | 91.7% | GPT-5.1-Codex-Mini openai-gpt-5.1-codex-mini | Imported | 2026-05-11 |
| 20 | Claude Opus 4.5 (Reasoning) | 91.3% | Claude Opus 4.5 anthropic-claude-opus-4.5 | Imported | 2026-05-11 |
| 21 | NVIDIA Nemotron 3 Nano 30B A3B (Reasoning) | 91% | Nemotron 3 Nano 30B A3B nvidia-nemotron-3-nano-30b-a3b | Imported | 2026-05-11 |
| 22 | Qwen3 235B A22B 2507 (Reasoning) | 91% | Qwen3 235B A22B Instruct 2507 qwen-qwen3-235b-a22b-2507 | Imported | 2026-05-11 |
| 23 | GPT-5 mini (high) | 90.7% | GPT-5 Mini openai-gpt-5-mini | Imported | 2026-05-11 |
| 24 | o4-mini (high) | 90.7% | o4 Mini openai-o4-mini | Imported | 2026-05-11 |
| 25 | K-EXAONE (Reasoning) | 90.3% | — | Imported | 2026-05-11 |
| 26 | DeepSeek V3.1 (Reasoning) | 89.7% | DeepSeek V3.1 deepseek-deepseek-chat-v3.1 | Imported | 2026-05-11 |
| 27 | DeepSeek V3.1 Terminus (Reasoning) | 89.7% | DeepSeek V3.1 Terminus deepseek-deepseek-v3.1-terminus | Imported | 2026-05-11 |
| 28 | Grok 4 Fast (Reasoning) | 89.7% | Grok 4 Fast x-ai-grok-4-fast | Imported | 2026-05-11 |
| 29 | Nova 2.0 Omni (medium) | 89.7% | — | Imported | 2026-05-11 |
| 30 | gpt-oss-20B (high) | 89.3% | gpt-oss-20b openai-gpt-oss-20b | Imported | 2026-05-11 |
| 31 | Grok 4.1 Fast (Reasoning) | 89.3% | Grok 4.1 Fast x-ai-grok-4.1-fast | Imported | 2026-05-11 |
| 32 | Ring-1T | 89.3% | — | Imported | 2026-05-11 |
| 33 | Nova 2.0 Pro Preview (medium) | 89% | — | Imported | 2026-05-11 |
| 34 | Nova 2.0 Lite (medium) | 88.7% | — | Imported | 2026-05-11 |
| 35 | o3 | 88.3% | o3 openai-o3 | Imported | 2026-05-11 |
| 36 | Qwen3 VL 235B A22B (Reasoning) | 88.3% | — | Imported | 2026-05-11 |
| 37 | Apriel-v1.6-15B-Thinker | 88% | — | Imported | 2026-05-11 |
| 38 | Claude 4.5 Sonnet (Reasoning) | 88% | — | Imported | 2026-05-11 |
| 39 | INTELLECT-3 | 88% | INTELLECT-3 prime-intellect-intellect-3 | Imported | 2026-05-11 |
| 40 | DeepSeek V3.2 Exp (Reasoning) | 87.7% | DeepSeek V3.2 Exp deepseek-deepseek-v3.2-exp | Imported | 2026-05-11 |
| 41 | Gemini 2.5 Pro | 87.7% | Gemini 2.5 Pro google-gemini-2.5-pro | Imported | 2026-05-11 |
| 42 | Apriel-v1.5-15B-Thinker | 87.5% | — | Imported | 2026-05-11 |
| 43 | Gemini 3 Pro Preview (low) | 86.7% | Gemini 3 google-gemini-3 | Imported | 2026-05-11 |
| 44 | GLM-4.6 (Reasoning) | 86% | GLM 4.6 z-ai-glm-4.6 | Imported | 2026-05-11 |
| 45 | GLM-4.6V (Reasoning) | 85.3% | GLM 4.6V z-ai-glm-4.6v | Imported | 2026-05-11 |
| 46 | ERNIE 5.0 Thinking Preview | 85% | — | Imported | 2026-05-11 |
| 47 | GPT-5 mini (medium) | 85% | GPT-5 Mini openai-gpt-5-mini | Imported | 2026-05-11 |
| 48 | Grok 3 mini Reasoning (high) | 84.7% | — | Imported | 2026-05-11 |
| 49 | Qwen3 VL 32B (Reasoning) | 84.7% | — | Imported | 2026-05-11 |
| 50 | Seed-OSS-36B-Instruct | 84.7% | — | Imported | 2026-05-11 |
| 51 | Qwen3 Next 80B A3B (Reasoning) | 84.3% | — | Imported | 2026-05-11 |
| 52 | Claude 4.5 Haiku (Reasoning) | 83.7% | — | Imported | 2026-05-11 |
| 53 | GPT-5 nano (high) | 83.7% | GPT-5 Nano openai-gpt-5-nano | Imported | 2026-05-11 |
| 54 | Ring-flash-2.0 | 83.7% | — | Imported | 2026-05-11 |
| 55 | GPT-5 (low) | 83% | GPT-5 openai-gpt-5 | Imported | 2026-05-11 |
| 56 | MiniMax-M2.1 | 82.7% | MiniMax M2.1 minimax-minimax-m2.1 | Imported | 2026-05-11 |
| 57 | Qwen3 4B 2507 (Reasoning) | 82.7% | — | Imported | 2026-05-11 |
| 58 | Qwen3 Max Thinking (Preview) | 82.3% | Qwen3 Max Thinking qwen-qwen3-max-thinking | Imported | 2026-05-11 |
| 59 | Qwen3 VL 30B A3B (Reasoning) | 82.3% | — | Imported | 2026-05-11 |
| 60 | Magistral Medium 1.2 | 82% | — | Imported | 2026-05-11 |
| 61 | Qwen3 235B A22B (Reasoning) | 82% | Qwen3 235B A22B qwen-qwen3-235b-a22b | Imported | 2026-05-11 |
| 62 | GLM-4.5-Air | 80.7% | GLM 4.5 Air z-ai-glm-4.5-air | Imported | 2026-05-11 |
| 63 | Qwen3 Max | 80.7% | Qwen3 Max qwen-qwen3-max | Imported | 2026-05-11 |
| 64 | Claude 4.1 Opus (Reasoning) | 80.3% | — | Imported | 2026-05-11 |
| 65 | Magistral Small 1.2 | 80.3% | — | Imported | 2026-05-11 |
| 66 | Motif-2-12.7B-Reasoning | 80.3% | — | Imported | 2026-05-11 |
| 67 | EXAONE 4.0 32B (Reasoning) | 80% | — | Imported | 2026-05-11 |
| 68 | Falcon-H1R-7B | 80% | — | Imported | 2026-05-11 |
| 69 | Doubao Seed Code | 79.3% | — | Imported | 2026-05-11 |
| 70 | Mi:dm K 2.5 Pro Preview | 78.7% | — | Imported | 2026-05-11 |
| 71 | Gemini 2.5 Flash Preview (Sep '25) (Reasoning) | 78.3% | — | Imported | 2026-05-11 |
| 72 | GPT-5 nano (medium) | 78.3% | GPT-5 Nano openai-gpt-5-nano | Imported | 2026-05-11 |
| 73 | K2-V2 (high) | 78.3% | — | Imported | 2026-05-11 |
| 74 | MiniMax-M2 | 78.3% | MiniMax M2 minimax-minimax-m2 | Imported | 2026-05-11 |
| 75 | Olmo 3.1 32B Think | 77.3% | — | Imported | 2026-05-11 |
| 76 | Llama Nemotron Super 49B v1.5 (Reasoning) | 76.7% | — | Imported | 2026-05-11 |
| 77 | Mi:dm K 2.5 Pro | 76.7% | — | Imported | 2026-05-11 |
| 78 | DeepSeek R1 0528 (May '25) | 76% | R1 deepseek-r1 | Imported | 2026-05-11 |
| 79 | NVIDIA Nemotron Nano 12B v2 VL (Reasoning) | 75% | Nemotron Nano 12B 2 VL nvidia-nemotron-nano-12b-v2-vl | Imported | 2026-05-11 |
| 80 | Qwen3 Max (Preview) | 75% | Qwen3 Max qwen-qwen3-max | Imported | 2026-05-11 |
| 81 | Claude 4 Sonnet (Reasoning) | 74.3% | — | Imported | 2026-05-11 |
| 82 | Qwen3 Omni 30B A3B (Reasoning) | 74% | — | Imported | 2026-05-11 |
| 83 | GLM-4.5 (Reasoning) | 73.7% | GLM 4.5 z-ai-glm-4.5 | Imported | 2026-05-11 |
| 84 | Olmo 3 32B Think | 73.7% | Olmo 3 32B Think allenai-olmo-3-32b-think | Imported | 2026-05-11 |
| 85 | Claude 4 Opus (Reasoning) | 73.3% | — | Imported | 2026-05-11 |
| 86 | Gemini 2.5 Flash (Reasoning) | 73.3% | Gemini 2.5 Flash google-gemini-2.5-flash | Imported | 2026-05-11 |
| 87 | GLM-4.5V (Reasoning) | 73% | GLM 4.5V z-ai-glm-4.5v | Imported | 2026-05-11 |
| 88 | Qwen3 32B (Reasoning) | 73% | Qwen3 32B qwen-qwen3-32b | Imported | 2026-05-11 |
| 89 | Cogito v2.1 (Reasoning) | 72.7% | — | Imported | 2026-05-11 |
| 90 | Qwen3 30B A3B (Reasoning) | 72.3% | Qwen3 30B A3B qwen-qwen3-30b-a3b | Imported | 2026-05-11 |
| 91 | Qwen3 VL 30B A3B Instruct | 72.3% | Qwen3 VL 30B A3B Instruct qwen-qwen3-vl-30b-a3b-instruct | Imported | 2026-05-11 |
| 92 | Qwen3 235B A22B 2507 Instruct | 71.7% | Qwen3 235B A22B Instruct 2507 qwen-qwen3-235b-a22b-2507 | Imported | 2026-05-11 |
| 93 | Ling-1T | 71.3% | — | Imported | 2026-05-11 |
| 94 | Olmo 3 7B Think | 70.7% | — | Imported | 2026-05-11 |
| 95 | Qwen3 VL 235B A22B Instruct | 70.7% | Qwen3 VL 235B A22B Instruct qwen-qwen3-vl-235b-a22b-instruct | Imported | 2026-05-11 |
| 96 | Hermes 4 - Llama-3.1 405B (Reasoning) | 69.7% | — | Imported | 2026-05-11 |
| 97 | NVIDIA Nemotron Nano 9B V2 (Reasoning) | 69.7% | Nemotron Nano 9B V2 nvidia-nemotron-nano-9b-v2 | Imported | 2026-05-11 |
| 98 | Gemini 2.5 Flash-Lite Preview (Sep '25) (Reasoning) | 68.7% | — | Imported | 2026-05-11 |
| 99 | Hermes 4 - Llama-3.1 70B (Reasoning) | 68.7% | — | Imported | 2026-05-11 |
| 100 | Qwen3 VL 32B Instruct | 68.3% | Qwen3 VL 32B Instruct qwen-qwen3-vl-32b-instruct | Imported | 2026-05-11 |
| 101 | DeepSeek R1 (Jan '25) | 68% | R1 deepseek-r1 | Imported | 2026-05-11 |
| 102 | MiMo-V2-Flash (Non-reasoning) | 67.7% | MiMo-V2-Flash xiaomi-mimo-v2-flash | Imported | 2026-05-11 |
| 103 | gpt-oss-120B (low) | 66.7% | gpt-oss-120b openai-gpt-oss-120b | Imported | 2026-05-11 |
| 104 | Qwen3 30B A3B 2507 Instruct | 66.3% | — | Imported | 2026-05-11 |
| 105 | Qwen3 Next 80B A3B Instruct | 66.3% | Qwen3 Next 80B A3B Instruct qwen-qwen3-next-80b-a3b-instruct | Imported | 2026-05-11 |
| 106 | Ling-flash-2.0 | 65.3% | — | Imported | 2026-05-11 |
| 107 | K2-V2 (medium) | 64.7% | — | Imported | 2026-05-11 |
| 108 | DeepSeek R1 0528 Qwen3 8B | 63.7% | — | Imported | 2026-05-11 |
| 109 | Llama 3.1 Nemotron Ultra 253B v1 (Reasoning) | 63.7% | — | Imported | 2026-05-11 |
| 110 | Nova 2.0 Pro Preview (low) | 63.3% | — | Imported | 2026-05-11 |
| 111 | DeepSeek R1 Distill Qwen 32B | 63% | R1 Distill Qwen 32B deepseek-deepseek-r1-distill-qwen-32b | Imported | 2026-05-11 |
| 112 | Claude Opus 4.5 (Non-reasoning) | 62.7% | Claude Opus 4.5 anthropic-claude-opus-4.5 | Imported | 2026-05-11 |
| 113 | gpt-oss-20B (low) | 62.3% | gpt-oss-20b openai-gpt-oss-20b | Imported | 2026-05-11 |
| 114 | NVIDIA Nemotron Nano 9B V2 (Non-reasoning) | 62.3% | Nemotron Nano 9B V2 nvidia-nemotron-nano-9b-v2 | Imported | 2026-05-11 |
| 115 | Solar Pro 2 (Reasoning) | 61.3% | — | Imported | 2026-05-11 |
| 116 | MiniMax M1 80k | 61% | — | Imported | 2026-05-11 |
| 117 | Gemini 2.5 Flash (Non-reasoning) | 60.3% | Gemini 2.5 Flash google-gemini-2.5-flash | Imported | 2026-05-11 |
| 118 | DeepSeek V3.2 (Non-reasoning) | 59% | DeepSeek V3.2 deepseek-deepseek-v3.2 | Imported | 2026-05-11 |
| 119 | HyperCLOVA X SEED Think (32B) | 59% | — | Imported | 2026-05-11 |
| 120 | Grok 3 | 58% | Grok 3 xaigrok-3 | Imported | 2026-05-11 |
| 121 | Qwen3 14B (Non-reasoning) | 58% | Qwen3 14B qwen-qwen3-14b | Imported | 2026-05-11 |
| 122 | DeepSeek V3.2 Exp (Non-reasoning) | 57.7% | DeepSeek V3.2 deepseek-deepseek-v3.2 | Imported | 2026-05-11 |
| 123 | Kimi K2 0905 | 57.3% | MoonshotAI: Kimi K2 0905 moonshotai-kimi-k2-0905 | Imported | 2026-05-11 |
| 124 | Kimi K2 | 57% | MoonshotAI: Kimi K2 0711 moonshotai-kimi-k2 | Imported | 2026-05-11 |
| 125 | Gemini 2.5 Flash Preview (Sep '25) (Non-reasoning) | 56.7% | — | Imported | 2026-05-11 |
| 126 | Claude 3.7 Sonnet (Reasoning) | 56.3% | Claude 3.7 Sonnet (thinking) anthropic-claude-3.7-sonnet-thinking | Imported | 2026-05-11 |
| 127 | Qwen3 30B A3B 2507 (Reasoning) | 56.3% | — | Imported | 2026-05-11 |
| 128 | Nova 2.0 Omni (low) | 56% | — | Imported | 2026-05-11 |
| 129 | DeepSeek R1 Distill Qwen 14B | 55.7% | — | Imported | 2026-05-11 |
| 130 | Gemini 3 Flash Preview (Non-reasoning) | 55.7% | Gemini 3 Flash Preview google-gemini-3-flash-preview | Imported | 2026-05-11 |
| 131 | Qwen3 14B (Reasoning) | 55.7% | Qwen3 14B qwen-qwen3-14b | Imported | 2026-05-11 |
| 132 | Llama 3.3 Nemotron Super 49B v1 (Reasoning) | 54.7% | — | Imported | 2026-05-11 |
| 133 | DeepSeek R1 Distill Llama 70B | 53.7% | R1 Distill Llama 70B deepseek-deepseek-r1-distill-llama-70b | Imported | 2026-05-11 |
| 134 | DeepSeek V3.1 Terminus (Non-reasoning) | 53.7% | DeepSeek V3.1 Terminus deepseek-deepseek-v3.1-terminus | Imported | 2026-05-11 |
| 135 | Gemini 2.5 Flash-Lite (Reasoning) | 53.3% | Gemini 2.5 Flash Lite google-gemini-2.5-flash-lite | Imported | 2026-05-11 |
| 136 | Qwen3 4B 2507 Instruct | 52.3% | — | Imported | 2026-05-11 |
| 137 | Qwen3 Omni 30B A3B Instruct | 52.3% | — | Imported | 2026-05-11 |
| 138 | GPT-5.2 (Non-reasoning) | 51% | GPT-5.2 openai-gpt-5.2 | Imported | 2026-05-11 |
| 139 | Exaone 4.0 1.2B (Reasoning) | 50.3% | — | Imported | 2026-05-11 |
| 140 | Llama 3.1 Nemotron Nano 4B v1.1 (Reasoning) | 50% | — | Imported | 2026-05-11 |
| 141 | DeepSeek V3.1 (Non-reasoning) | 49.7% | DeepSeek V3.1 deepseek-deepseek-chat-v3.1 | Imported | 2026-05-11 |
| 142 | Ling-mini-2.0 | 49.3% | — | Imported | 2026-05-11 |
| 143 | GPT-5 (ChatGPT) | 48.3% | GPT-5 openai-gpt-5 | Imported | 2026-05-11 |
| 144 | GLM-4.7 (Non-reasoning) | 48% | GLM 4.7 z-ai-glm-4.7 | Imported | 2026-05-11 |
| 145 | Gemini 2.5 Flash-Lite Preview (Sep '25) (Non-reasoning) | 46.7% | Gemini 2.5 Flash Lite Preview 09-2025 google-gemini-2.5-flash-lite-preview-09-2025 | Imported | 2026-05-11 |
| 146 | GPT-5 mini (minimal) | 46.7% | GPT-5 Mini openai-gpt-5-mini | Imported | 2026-05-11 |
| 147 | Nova 2.0 Lite (low) | 46.7% | — | Imported | 2026-05-11 |
| 148 | GPT-4.1 mini | 46.3% | GPT-4.1 Mini openai-gpt-4.1-mini | Imported | 2026-05-11 |
| 149 | GLM-4.6 (Non-reasoning) | 44.3% | GLM 4.6 z-ai-glm-4.6 | Imported | 2026-05-11 |
| 150 | K-EXAONE (Non-reasoning) | 44% | — | Imported | 2026-05-11 |
| 151 | Grok Code Fast 1 | 43.3% | Grok Code Fast 1 x-ai-grok-code-fast-1 | Imported | 2026-05-11 |
| 152 | DeepSeek R1 Distill Llama 8B | 41.3% | — | Imported | 2026-05-11 |
| 153 | ERNIE 4.5 300B A47B | 41.3% | ERNIE 4.5 300B A47B baidu-ernie-4.5-300b-a47b | Imported | 2026-05-11 |
| 154 | Grok 4 Fast (Non-reasoning) | 41.3% | Grok 4 Fast x-ai-grok-4-fast | Imported | 2026-05-11 |
| 155 | Magistral Small 1 | 41.3% | — | Imported | 2026-05-11 |
| 156 | Olmo 3 7B Instruct | 41.3% | — | Imported | 2026-05-11 |
| 157 | DeepSeek V3 0324 | 41% | DeepSeek V3 0324 deepseek-deepseek-chat-v3-0324 | Imported | 2026-05-11 |
| 158 | Magistral Medium 1 | 40.3% | — | Imported | 2026-05-11 |
| 159 | EXAONE 4.0 32B (Non-reasoning) | 39.3% | — | Imported | 2026-05-11 |
| 160 | Qwen3 Coder 480B A35B Instruct | 39.3% | Qwen3 Coder 480B A35B qwen-qwen3-coder | Imported | 2026-05-11 |
| 161 | Claude 4.5 Haiku (Non-reasoning) | 39% | — | Imported | 2026-05-11 |
| 162 | Qwen3 1.7B (Reasoning) | 38.7% | — | Imported | 2026-05-11 |
| 163 | Mistral Medium 3.1 | 38.3% | Mistral: Mistral Medium 3.1 mistralai-mistral-medium-3.1 | Imported | 2026-05-11 |
| 164 | Claude 4 Sonnet (Non-reasoning) | 38% | — | Imported | 2026-05-11 |
| 165 | GPT-5.1 (Non-reasoning) | 38% | GPT-5.1 openai-gpt-5.1 | Imported | 2026-05-11 |
| 166 | Mistral Large 3 | 38% | — | Imported | 2026-05-11 |
| 167 | Claude 4.5 Sonnet (Non-reasoning) | 37% | — | Imported | 2026-05-11 |
| 168 | Nova 2.0 Omni (Non-reasoning) | 37% | — | Imported | 2026-05-11 |
| 169 | Qwen3 VL 4B Instruct | 37% | — | Imported | 2026-05-11 |
| 170 | Devstral 2 | 36.7% | — | Imported | 2026-05-11 |
| 171 | Claude 4 Opus (Non-reasoning) | 36.3% | — | Imported | 2026-05-11 |
| 172 | Kimi Linear 48B A3B Instruct | 36.3% | — | Imported | 2026-05-11 |
| 173 | Gemini 2.5 Flash-Lite (Non-reasoning) | 35.3% | Gemini 2.5 Flash Lite google-gemini-2.5-flash-lite | Imported | 2026-05-11 |
| 174 | K2-V2 (low) | 35.3% | — | Imported | 2026-05-11 |
| 175 | GPT-4.1 | 34.7% | GPT-4.1 openai-gpt-4.1 | Imported | 2026-05-11 |
| 176 | Devstral Small 2 | 34.3% | — | Imported | 2026-05-11 |
| 177 | Grok 4.1 Fast (Non-reasoning) | 34.3% | Grok 4.1 Fast x-ai-grok-4.1-fast | Imported | 2026-05-11 |
| 178 | Nova 2.0 Lite (Non-reasoning) | 33.7% | — | Imported | 2026-05-11 |
| 179 | Reka Flash 3 | 33.7% | Reka Flash 3 rekaai-reka-flash-3 | Imported | 2026-05-11 |
| 180 | GPT-5 (minimal) | 31.7% | GPT-5 openai-gpt-5 | Imported | 2026-05-11 |
| 181 | Ministral 3 8B | 31.7% | — | Imported | 2026-05-11 |
| 182 | Nova 2.0 Pro Preview (Non-reasoning) | 30.7% | — | Imported | 2026-05-11 |
| 183 | Qwen3 VL 8B (Reasoning) | 30.7% | — | Imported | 2026-05-11 |
| 184 | Mistral Medium 3 | 30.3% | Mistral: Mistral Medium 3 mistralai-mistral-medium-3 | Imported | 2026-05-11 |
| 185 | Ministral 3 14B | 30% | — | Imported | 2026-05-11 |
| 186 | Solar Pro 2 (Non-reasoning) | 30% | — | Imported | 2026-05-11 |
| 187 | Devstral Small (Jul '25) | 29.3% | Mistral: Devstral Small 1.1 mistralai-devstral-small | Imported | 2026-05-11 |
| 188 | Qwen3 Coder 30B A3B Instruct | 29% | Qwen3 Coder 30B A3B Instruct qwen-qwen3-coder-30b-a3b-instruct | Imported | 2026-05-11 |
| 189 | QwQ 32B | 29% | — | Imported | 2026-05-11 |
| 190 | GPT-5 nano (minimal) | 27.3% | GPT-5 Nano openai-gpt-5-nano | Imported | 2026-05-11 |
| 191 | Qwen3 VL 8B Instruct | 27.3% | Qwen3 VL 8B Instruct qwen-qwen3-vl-8b-instruct | Imported | 2026-05-11 |
| 192 | Mistral Small 3.2 | 27% | — | Imported | 2026-05-11 |
| 193 | NVIDIA Nemotron Nano 12B v2 VL (Non-reasoning) | 26.7% | Nemotron Nano 12B 2 VL nvidia-nemotron-nano-12b-v2-vl | Imported | 2026-05-11 |
| 194 | GLM-4.6V (Non-reasoning) | 26.3% | GLM 4.6V z-ai-glm-4.6v | Imported | 2026-05-11 |
| 195 | DeepSeek V3 (Dec '24) | 26% | DeepSeek V3 deepseek-deepseek-chat | Imported | 2026-05-11 |
| 196 | GPT-4o (March 2025, chatgpt-4o-latest) | 25.7% | GPT-4o openai-gpt-4o | Imported | 2026-05-11 |
| 197 | Qwen3 VL 4B (Reasoning) | 25.7% | — | Imported | 2026-05-11 |
| 198 | LFM2 8B A1B | 25.3% | — | Imported | 2026-05-11 |
| 199 | Qwen3 8B (Non-reasoning) | 24.3% | Qwen3 8B qwen-qwen3-8b | Imported | 2026-05-11 |
| 200 | Exaone 4.0 1.2B (Non-reasoning) | 24% | — | Imported | 2026-05-11 |
| 201 | GPT-4.1 nano | 24% | GPT-4.1 Nano openai-gpt-4.1-nano | Imported | 2026-05-11 |
| 202 | Qwen3 235B A22B (Non-reasoning) | 23.7% | Qwen3 235B A22B qwen-qwen3-235b-a22b | Imported | 2026-05-11 |
| 203 | Qwen3 4B (Reasoning) | 22.3% | — | Imported | 2026-05-11 |
| 204 | DeepSeek R1 Distill Qwen 1.5B | 22% | — | Imported | 2026-05-11 |
| 205 | Ministral 3 3B | 22% | — | Imported | 2026-05-11 |
| 206 | Gemini 2.0 Flash (Feb '25) | 21.7% | Gemini 2.0 Flash google-gemini-2.0-flash | Imported | 2026-05-11 |
| 207 | Qwen3 30B A3B (Non-reasoning) | 21.7% | Qwen3 30B A3B qwen-qwen3-30b-a3b | Imported | 2026-05-11 |
| 208 | Claude 3.7 Sonnet (Non-reasoning) | 21% | Claude 3.7 Sonnet anthropic-claude-3.7-sonnet | Imported | 2026-05-11 |
| 209 | Gemma 3 27B Instruct | 20.7% | Gemma 3 27B google-gemma-3-27b-it | Imported | 2026-05-11 |
| 210 | Qwen3 32B (Non-reasoning) | 19.7% | Qwen3 32B qwen-qwen3-32b | Imported | 2026-05-11 |
| 211 | Llama 4 Maverick | 19.3% | Llama 4 Maverick meta-llama-4-maverick | Imported | 2026-05-11 |
| 212 | Qwen3 8B (Reasoning) | 19% | Qwen3 8B qwen-qwen3-8b | Imported | 2026-05-11 |
| 213 | Gemma 3 12B Instruct | 18.3% | Gemma 3 12B google-gemma-3-12b-it | Imported | 2026-05-11 |
| 214 | Phi-4 | 18% | Phi 4 microsoft-phi-4 | Imported | 2026-05-11 |
| 215 | Qwen3 0.6B (Reasoning) | 18% | — | Imported | 2026-05-11 |
| 216 | Nova Premier | 17.3% | — | Imported | 2026-05-11 |
| 217 | GLM-4.5V (Non-reasoning) | 15.3% | GLM 4.5V z-ai-glm-4.5v | Imported | 2026-05-11 |
| 218 | Hermes 4 - Llama-3.1 405B (Non-reasoning) | 15.3% | — | Imported | 2026-05-11 |
| 219 | GPT-4o mini | 14.7% | GPT-4o-mini openai-gpt-4o-mini | Imported | 2026-05-11 |
| 220 | Gemma 3n E4B Instruct | 14.3% | — | Imported | 2026-05-11 |
| 221 | Llama 4 Scout | 14% | Llama 4 Scout meta-llama-llama-4-scout | Imported | 2026-05-11 |
| 222 | Mistral Large 2 (Nov '24) | 14% | — | Imported | 2026-05-11 |
| 223 | Qwen2.5 Instruct 72B | 14% | Qwen2.5 72B Instruct qwen-qwen-2.5-72b-instruct | Imported | 2026-05-11 |
| 224 | Granite 4.0 H Small | 13.7% | — | Imported | 2026-05-11 |
| 225 | MiniMax M1 40k | 13.7% | — | Imported | 2026-05-11 |
| 226 | NVIDIA Nemotron 3 Nano 30B A3B (Non-reasoning) | 13.3% | Nemotron 3 Nano 30B A3B nvidia-nemotron-3-nano-30b-a3b | Imported | 2026-05-11 |
| 227 | Command A | 13% | Command A cohere-command-a | Imported | 2026-05-11 |
| 228 | Gemma 3 4B Instruct | 12.7% | Gemma 3 4B google-gemma-3-4b-it | Imported | 2026-05-11 |
| 229 | Hermes 4 - Llama-3.1 70B (Non-reasoning) | 11.3% | — | Imported | 2026-05-11 |
| 230 | Llama 3.1 Nemotron Instruct 70B | 11% | — | Imported | 2026-05-11 |
| 231 | Jamba Reasoning 3B | 10.7% | — | Imported | 2026-05-11 |
| 232 | Gemma 3n E2B Instruct | 10.3% | — | Imported | 2026-05-11 |
| 233 | Qwen3 0.6B (Non-reasoning) | 10.3% | — | Imported | 2026-05-11 |
| 234 | LFM2 2.6B | 8.3% | — | Imported | 2026-05-11 |
| 235 | Llama Nemotron Super 49B v1.5 (Non-reasoning) | 8% | — | Imported | 2026-05-11 |
| 236 | Llama 3.3 Instruct 70B | 7.7% | — | Imported | 2026-05-11 |
| 237 | Llama 3.3 Nemotron Super 49B v1 (Non-reasoning) | 7.7% | — | Imported | 2026-05-11 |
| 238 | Qwen3 1.7B (Non-reasoning) | 7.3% | — | Imported | 2026-05-11 |
| 239 | Nova Lite | 7% | Nova Lite 1.0 amazon-nova-lite-v1 | Imported | 2026-05-11 |
| 240 | Nova Pro | 7% | Nova Pro 1.0 amazon-nova-pro-v1 | Imported | 2026-05-11 |
| 241 | Granite 3.3 8B (Non-reasoning) | 6.7% | — | Imported | 2026-05-11 |
| 242 | Phi-4 Mini Instruct | 6.7% | — | Imported | 2026-05-11 |
| 243 | Granite 4.0 1B | 6.3% | — | Imported | 2026-05-11 |
| 244 | Granite 4.0 H 1B | 6.3% | — | Imported | 2026-05-11 |
| 245 | GPT-4o (Nov '24) | 6% | GPT-4o openai-gpt-4o | Imported | 2026-05-11 |
| 246 | Granite 4.0 Micro | 6% | Granite 4.0 Micro ibm-granite-granite-4.0-h-micro | Imported | 2026-05-11 |
| 247 | Nova Micro | 6% | Nova Micro 1.0 amazon-nova-micro-v1 | Imported | 2026-05-11 |
| 248 | Devstral Medium | 4.7% | Mistral: Devstral Medium mistralai-devstral-medium | Imported | 2026-05-11 |
| 249 | Llama 3.1 Instruct 8B | 4.3% | — | Imported | 2026-05-11 |
| 250 | Mistral Small 3 | 4.3% | — | Imported | 2026-05-11 |
| 251 | Llama 3.1 Instruct 70B | 4% | — | Imported | 2026-05-11 |
| 252 | Mistral Small 3.1 | 3.7% | — | Imported | 2026-05-11 |
| 253 | Gemma 3 1B Instruct | 3.3% | — | Imported | 2026-05-11 |
| 254 | LFM2 1.2B | 3.3% | — | Imported | 2026-05-11 |
| 255 | Llama 3.2 Instruct 3B | 3.3% | — | Imported | 2026-05-11 |
| 256 | OLMo 2 32B | 3.3% | — | Imported | 2026-05-11 |
| 257 | Llama 3.1 Instruct 405B | 3% | — | Imported | 2026-05-11 |
| 258 | Gemma 3 270M | 2.3% | — | Imported | 2026-05-11 |
| 259 | Jamba 1.7 Large | 2.3% | — | Imported | 2026-05-11 |
| 260 | Pixtral Large | 2.3% | Mistral: Pixtral Large 2411 mistralai-pixtral-large-2411 | Imported | 2026-05-11 |
| 261 | Llama 3.2 Instruct 11B (Vision) | 1.7% | — | Imported | 2026-05-11 |
| 262 | Granite 4.0 H 350M | 1.3% | — | Imported | 2026-05-11 |
| 263 | OLMo 2 7B | 0.7% | — | Imported | 2026-05-11 |
| 264 | Jamba 1.7 Mini | 0.3% | — | Imported | 2026-05-11 |
| 265 | Phi-3 Mini Instruct 3.8B | 0.3% | — | Imported | 2026-05-11 |
| 266 | Granite 4.0 350M | 0% | — | Imported | 2026-05-11 |
| 267 | Llama 3.2 Instruct 1B | 0% | — | Imported | 2026-05-11 |
| 268 | Mistral Large 2 (Jul '24) | 0% | Mistral Large 2407 mistralai-mistral-large-2407 | Imported | 2026-05-11 |
| 269 | Molmo 7B-D | 0% | — | Imported | 2026-05-11 |
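
The Accuracy column reports the share of problems answered correctly. The page does not describe its scoring procedure, so the following is only a minimal sketch of one plausible way to compute such a figure, assuming exact-match grading on integer answers (AIME answers are integers from 0 to 999). The function name `aime_accuracy` and the placeholder answers are hypothetical and are not taken from the leaderboard.

```python
from typing import Optional, Sequence


def aime_accuracy(predicted: Sequence[Optional[int]], reference: Sequence[int]) -> float:
    """Return the percentage of problems whose predicted answer matches the reference.

    A prediction of None (no parsable answer) counts as incorrect.
    This is an illustrative exact-match scorer, not the leaderboard's own code.
    """
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not reference:
        raise ValueError("no problems to score")
    correct = sum(
        1 for p, r in zip(predicted, reference) if p is not None and p == r
    )
    return 100.0 * correct / len(reference)


# Placeholder data: 30 made-up reference answers, with one prediction missing.
reference_answers = list(range(100, 130))
model_answers = list(reference_answers)
model_answers[7] = None  # hypothetical unanswered problem

score = aime_accuracy(model_answers, reference_answers)
print(f"{score:.1f}%")
```

With 30 problems, each problem is worth about 3.3 percentage points in a single run, so finer-grained values would have to come from averaging over repeated runs or a similar aggregation; the page does not say which applies.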