BenchLM
BenchLM is a public aggregate LLM leaderboard that reports overall and category scores for frontier and open-weight models across agentic, coding, reasoning, multimodal-grounded, knowledge, multilingual, instruction-following, and math capabilities.
Rows: 115
Primary metric: overall_score
Sampled: 2026-05-06
Metadata
Metrics: Overall Score, Agentic, Coding, Reasoning, Multimodal Grounded, Knowledge, Multilingual, Instruction Following, Math
| Rank | Subject | Overall Score | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | Claude Mythos Preview | 99 | Claude Mythos Preview anthropic-claude-mythos-preview | Imported | 2026-05-06 |
| 2 | Gemini 3.1 Pro | 92 | Gemini 3.1 Pro Preview google-gemini-3.1-pro-preview | Imported | 2026-05-06 |
| 3 | GPT-5.5 | 91 | GPT-5.5 openai-gpt-5.5 | Imported | 2026-05-06 |
| 4 | GPT-5.4 Pro | 91 | GPT-5.4 Pro openai-gpt-5.4-pro | Imported | 2026-05-06 |
| 5 | Claude Opus 4.7 (Adaptive) | 90 | Claude Opus 4.7 anthropic-claude-opus-4.7 | Imported | 2026-05-06 |
| 6 | Gemini 3 Pro Deep Think | 90 | — | Imported | 2026-05-06 |
| 7 | Grok 4.1 | 90 | — | Imported | 2026-05-06 |
| 8 | GPT-5.4 | 89 | GPT-5.4 openai-gpt-5.4 | Imported | 2026-05-06 |
| 9 | DeepSeek V4 Pro (Max) | 88 | DeepSeek V4 Pro deepseek-deepseek-v4-pro | Imported | 2026-05-06 |
| 10 | Claude Opus 4.6 | 87 | Claude Opus 4.6 anthropic-claude-opus-4.6 | Imported | 2026-05-06 |
| 11 | GPT-5.3 Codex | 87 | GPT-5.3-Codex openai-gpt-5.3-codex | Imported | 2026-05-06 |
| 12 | Kimi K2.6 | 85 | MoonshotAI: Kimi K2.6 moonshotai-kimi-k2.6 | Imported | 2026-05-06 |
| 13 | DeepSeek V4 Pro (High) | 84 | DeepSeek V4 Pro deepseek-deepseek-v4-pro | Imported | 2026-05-06 |
| 14 | GLM-5.1 | 83 | GLM 5.1 z-ai-glm-5.1 | Imported | 2026-05-06 |
| 15 | Claude Sonnet 4.6 | 83 | Claude Sonnet 4.6 anthropic-claude-sonnet-4.6 | Imported | 2026-05-06 |
| 16 | o1-preview | 83 | o1-preview openai-o1-preview | Imported | 2026-05-06 |
| 17 | GLM-5 (Reasoning) | 82 | GLM 5 z-ai-glm-5 | Imported | 2026-05-06 |
| 18 | Gemini 3 Pro | 81 | Gemini 3 google-gemini-3 | Imported | 2026-05-06 |
| 19 | GPT-5.2 | 81 | GPT-5.2 openai-gpt-5.2 | Imported | 2026-05-06 |
| 20 | Qwen3.5 397B (Reasoning) | 79 | — | Imported | 2026-05-06 |
| 21 | GPT-5.1 | 79 | GPT-5.1 openai-gpt-5.1 | Imported | 2026-05-06 |
| 22 | GPT-5 (high) | 78 | GPT-5 openai-gpt-5 | Imported | 2026-05-06 |
| 23 | Claude Opus 4.5 | 77 | Claude Opus 4.5 anthropic-claude-opus-4.5 | Imported | 2026-05-06 |
| 24 | GPT-5.2-Codex | 77 | GPT-5.2-Codex openai-gpt-5.2-codex | Imported | 2026-05-06 |
| 25 | Kimi K2.5 (Reasoning) | 76 | MoonshotAI: Kimi K2.5 moonshotai-kimi-k2.5 | Imported | 2026-05-06 |
| 26 | DeepSeek V4 Flash (Max) | 76 | DeepSeek V4 Flash deepseek-deepseek-v4-flash | Imported | 2026-05-06 |
| 27 | GPT-5.1-Codex-Max | 76 | GPT-5.1-Codex-Max openai-gpt-5.1-codex-max | Imported | 2026-05-06 |
| 28 | Qwen3.6-27B | 74 | Qwen3.6 27B qwen-qwen3.6-27b | Imported | 2026-05-06 |
| 29 | Qwen3.6 Plus | 73 | Qwen3.6 Plus qwen-qwen3.6-plus | Imported | 2026-05-06 |
| 30 | GPT-5 (medium) | 72 | GPT-5 openai-gpt-5 | Imported | 2026-05-06 |
| 31 | DeepSeek V4 Flash (High) | 71 | DeepSeek V4 Flash deepseek-deepseek-v4-flash | Imported | 2026-05-06 |
| 32 | DeepSeek V4 Pro | 70 | DeepSeek V4 Pro deepseek-deepseek-v4-pro | Imported | 2026-05-06 |
| 33 | Grok 4.1 Fast | 70 | Grok 4.1 Fast x-ai-grok-4.1-fast | Imported | 2026-05-06 |
| 34 | GLM-4.7 | 69 | GLM 4.7 z-ai-glm-4.7 | Imported | 2026-05-06 |
| 35 | GLM-5 | 67 | GLM 5 z-ai-glm-5 | Imported | 2026-05-06 |
| 36 | Qwen3.6-35B-A3B | 67 | Qwen3.6 35B A3B qwen-qwen3.6-35b-a3b | Imported | 2026-05-06 |
| 37 | Claude Sonnet 4.5 | 66 | Claude Sonnet 4.5 anthropic-claude-sonnet-4.5 | Imported | 2026-05-06 |
| 38 | Grok 4.20 | 65 | Grok 4.20 x-ai-grok-4.20 | Imported | 2026-05-06 |
| 39 | Qwen3.5-122B-A10B | 65 | Qwen3.5-122B-A10B qwen-qwen3.5-122b-a10b | Imported | 2026-05-06 |
| 40 | Gemini 3 Flash | 65 | Gemini 3 Flash Preview google-gemini-3-flash-preview | Imported | 2026-05-06 |
| 41 | Gemini 2.5 Pro | 65 | Gemini 2.5 Pro google-gemini-2.5-pro | Imported | 2026-05-06 |
| 42 | Grok 4 | 65 | Grok 4 x-ai-grok-4 | Imported | 2026-05-06 |
| 43 | Kimi K2.5 | 64 | MoonshotAI: Kimi K2.5 moonshotai-kimi-k2.5 | Imported | 2026-05-06 |
| 44 | Qwen3.5 397B | 64 | — | Imported | 2026-05-06 |
| 45 | Qwen3.5-27B | 63 | Qwen3.5-27B qwen-qwen3.5-27b | Imported | 2026-05-06 |
| 46 | MiniMax M2.7 | 62 | MiniMax M2.7 minimax-minimax-m2.7 | Imported | 2026-05-06 |
| 47 | DeepSeek V3.2 (Thinking) | 62 | DeepSeek V3.2 deepseek-deepseek-v3.2 | Imported | 2026-05-06 |
| 48 | MiMo-V2-Flash | 61 | MiMo-V2-Flash xiaomi-mimo-v2-flash | Imported | 2026-05-06 |
| 49 | DeepSeek V4 Flash | 59 | DeepSeek V4 Flash deepseek-deepseek-v4-flash | Imported | 2026-05-06 |
| 50 | DeepSeek V3.2 | 58 | DeepSeek V3.2 deepseek-deepseek-v3.2 | Imported | 2026-05-06 |
| 51 | GPT-4.1 | 58 | GPT-4.1 openai-gpt-4.1 | Imported | 2026-05-06 |
| 52 | Claude Haiku 4.5 | 58 | Claude Haiku 4.5 anthropic-claude-haiku-4.5 | Imported | 2026-05-06 |
| 53 | o3 | 58 | o3 openai-o3 | Imported | 2026-05-06 |
| 54 | o3-pro | 58 | o3 Pro openai-o3-pro | Imported | 2026-05-06 |
| 55 | o1 | 58 | o1 openai-o1 | Imported | 2026-05-06 |
| 56 | Qwen3.5-35B-A3B | 56 | Qwen3.5-35B-A3B qwen-qwen3.5-35b-a3b | Imported | 2026-05-06 |
| 57 | o3-mini | 56 | o3-mini openai-o3-mini | Imported | 2026-05-06 |
| 58 | DeepSeek LLM 2.0 | 52 | — | Imported | 2026-05-06 |
| 59 | DeepSeek Coder 2.0 | 52 | — | Imported | 2026-05-06 |
| 60 | Claude 4.1 Opus | 52 | — | Imported | 2026-05-06 |
| 61 | Qwen2.5-1M | 51 | — | Imported | 2026-05-06 |
| 62 | Claude 4 Sonnet | 51 | — | Imported | 2026-05-06 |
| 63 | GPT-4o mini | 50 | GPT-4o-mini openai-gpt-4o-mini | Imported | 2026-05-06 |
| 64 | Qwen2.5-72B | 50 | Qwen2.5 72B Instruct qwen-qwen-2.5-72b-instruct | Imported | 2026-05-06 |
| 65 | DeepSeekMath V2 | 50 | — | Imported | 2026-05-06 |
| 66 | Mistral Large 3 | 49 | — | Imported | 2026-05-06 |
| 67 | Gemini 3.1 Flash-Lite | 48 | — | Imported | 2026-05-06 |
| 68 | Qwen3 235B 2507 (Reasoning) | 47 | Qwen3 235B A22B qwen-qwen3-235b-a22b | Imported | 2026-05-06 |
| 69 | Nemotron 3 Ultra 500B | 47 | — | Imported | 2026-05-06 |
| 70 | GPT-4.1 mini | 46 | GPT-4.1 Mini openai-gpt-4.1-mini | Imported | 2026-05-06 |
| 71 | Nemotron 3 Super 100B | 44 | — | Imported | 2026-05-06 |
| 72 | o4-mini (high) | 44 | o4 Mini High openai-o4-mini-high | Imported | 2026-05-06 |
| 73 | Claude 4.1 Opus Thinking | 44 | — | Imported | 2026-05-06 |
| 74 | GPT-4o | 43 | GPT-4o openai-gpt-4o | Imported | 2026-05-06 |
| 75 | Kimi K2 | 42 | MoonshotAI: Kimi K2 0711 moonshotai-kimi-k2 | Imported | 2026-05-06 |
| 76 | Llama 3.1 405B | 41 | — | Imported | 2026-05-06 |
| 77 | Claude 3.5 Sonnet | 41 | Claude 3.5 Sonnet anthropic-claude-3.5-sonnet | Imported | 2026-05-06 |
| 78 | Grok Code Fast 1 | 40 | Grok Code Fast 1 x-ai-grok-code-fast-1 | Imported | 2026-05-06 |
| 79 | Sarvam 105B | 39 | — | Imported | 2026-05-06 |
| 80 | Gemini 2.5 Flash | 38 | Gemini 2.5 Flash google-gemini-2.5-flash | Imported | 2026-05-06 |
| 81 | Mistral Large 2 | 38 | — | Imported | 2026-05-06 |
| 82 | DeepSeek V3 | 36 | DeepSeek V3 deepseek-deepseek-chat | Imported | 2026-05-06 |
| 83 | Gemini 1.5 Pro | 36 | — | Imported | 2026-05-06 |
| 84 | GPT-OSS 120B | 35 | gpt-oss-120b openai-gpt-oss-120b | Imported | 2026-05-06 |
| 85 | Claude 3 Opus | 35 | — | Imported | 2026-05-06 |
| 86 | DeepSeek-R1 | 33 | R1 deepseek-r1 | Imported | 2026-05-06 |
| 87 | Qwen3 235B 2507 | 33 | Qwen3 235B A22B qwen-qwen3-235b-a22b | Imported | 2026-05-06 |
| 88 | DBRX Instruct | 33 | — | Imported | 2026-05-06 |
| 89 | Grok 3 [Beta] | 32 | Grok 3 Beta x-ai-grok-3-beta | Imported | 2026-05-06 |
| 90 | DeepSeek V3.1 (Reasoning) | 30 | DeepSeek V3.1 deepseek-deepseek-chat-v3.1 | Imported | 2026-05-06 |
| 91 | o1-pro | 29 | o1-pro openai-o1-pro | Imported | 2026-05-06 |
| 92 | Phi-4 | 28 | Phi 4 microsoft-phi-4 | Imported | 2026-05-06 |
| 93 | GLM-4.5 | 27 | GLM 4.5 z-ai-glm-4.5 | Imported | 2026-05-06 |
| 94 | Llama 3 70B | 27 | — | Imported | 2026-05-06 |
| 95 | GPT-4.1 nano | 27 | GPT-4.1 Nano openai-gpt-4.1-nano | Imported | 2026-05-06 |
| 96 | DeepSeek V3.1 | 26 | DeepSeek V3.1 Terminus deepseek-deepseek-v3.1-terminus | Imported | 2026-05-06 |
| 97 | Nemotron 3 Nano 30B | 26 | Nemotron 3 Nano 30B A3B nvidia-nemotron-3-nano-30b-a3b | Imported | 2026-05-06 |
| 98 | GPT-4 Turbo | 26 | GPT-4 Turbo openai-gpt-4-turbo | Imported | 2026-05-06 |
| 99 | Gemini 1.0 Pro | 25 | — | Imported | 2026-05-06 |
| 100 | Z-1 | 24 | — | Imported | 2026-05-06 |
| 101 | Mistral 8x7B | 24 | — | Imported | 2026-05-06 |
| 102 | Claude 3 Haiku | 24 | Claude 3 Haiku anthropic-claude-3-haiku | Imported | 2026-05-06 |
| 103 | Mixtral 8x22B Instruct v0.1 | 23 | — | Imported | 2026-05-06 |
| 104 | Nemotron-4 15B | 23 | — | Imported | 2026-05-06 |
| 105 | Moonshot v1 | 23 | — | Imported | 2026-05-06 |
| 106 | Llama 4 Scout | 22 | Llama 4 Scout meta-llama-llama-4-scout | Imported | 2026-05-06 |
| 107 | Nemotron Ultra 253B | 22 | — | Imported | 2026-05-06 |
| 108 | GLM-4.5-Air | 19 | GLM 4.5 Air z-ai-glm-4.5-air | Imported | 2026-05-06 |
| 109 | GPT-OSS 20B | 18 | gpt-oss-20b openai-gpt-oss-20b | Imported | 2026-05-06 |
| 110 | Gemma 3 27B | 17 | Gemma 3 27B google-gemma-3-27b-it | Imported | 2026-05-06 |
| 111 | Llama 4 Maverick | 17 | Llama 4 Maverick meta-llama-4-maverick | Imported | 2026-05-06 |
| 112 | Llama 4 Behemoth | 12 | — | Imported | 2026-05-06 |
| 113 | Nova Pro | 10 | Nova Pro 1.0 amazon-nova-pro-v1 | Imported | 2026-05-06 |
| 114 | Mistral 7B v0.3 | 5 | — | Imported | 2026-05-06 |
| 115 | Mistral 8x7B v0.2 | 2 | — | Imported | 2026-05-06 |