BenchLM

BenchLM is a public aggregate LLM leaderboard that reports overall and category scores for frontier and open-weight models across agentic, coding, reasoning, multimodal-grounded, knowledge, multilingual, instruction-following, and math capabilities.

Rows: 115
Primary metric: overall_score
Sampled: 2026-05-06

Metadata

Metrics

Overall Score, Agentic, Coding, Reasoning, Multimodal Grounded, Knowledge, Multilingual, Instruction Following, Math

Latest Results

Rank | Subject | Overall Score | Model Match | Provenance | Sampled
1 | Claude Mythos Preview | 99 | Claude Mythos Preview (anthropic-claude-mythos-preview) | Imported | 2026-05-06
2 | Gemini 3.1 Pro | 92 | Gemini 3.1 Pro Preview (google-gemini-3.1-pro-preview) | Imported | 2026-05-06
3 | GPT-5.5 | 91 | GPT-5.5 (openai-gpt-5.5) | Imported | 2026-05-06
4 | GPT-5.4 Pro | 91 | GPT-5.4 Pro (openai-gpt-5.4-pro) | Imported | 2026-05-06
5 | Claude Opus 4.7 (Adaptive) | 90 | Claude Opus 4.7 (anthropic-claude-opus-4.7) | Imported | 2026-05-06
6 | Gemini 3 Pro Deep Think | 90 | - | Imported | 2026-05-06
7 | Grok 4.1 | 90 | - | Imported | 2026-05-06
8 | GPT-5.4 | 89 | GPT-5.4 (openai-gpt-5.4) | Imported | 2026-05-06
9 | DeepSeek V4 Pro (Max) | 88 | DeepSeek V4 Pro (deepseek-deepseek-v4-pro) | Imported | 2026-05-06
10 | Claude Opus 4.6 | 87 | Claude Opus 4.6 (anthropic-claude-opus-4.6) | Imported | 2026-05-06
11 | GPT-5.3 Codex | 87 | GPT-5.3-Codex (openai-gpt-5.3-codex) | Imported | 2026-05-06
12 | Kimi K2.6 | 85 | KIMI MoonshotAI: Kimi K2.6 (moonshotai-kimi-k2.6) | Imported | 2026-05-06
13 | DeepSeek V4 Pro (High) | 84 | DeepSeek V4 Pro (deepseek-deepseek-v4-pro) | Imported | 2026-05-06
14 | GLM-5.1 | 83 | GLM GLM 5.1 (z-ai-glm-5.1) | Imported | 2026-05-06
15 | Claude Sonnet 4.6 | 83 | Claude Sonnet 4.6 (anthropic-claude-sonnet-4.6) | Imported | 2026-05-06
16 | o1-preview | 83 | o1-preview (openai-o1-preview) | Imported | 2026-05-06
17 | GLM-5 (Reasoning) | 82 | GLM GLM 5 (z-ai-glm-5) | Imported | 2026-05-06
18 | Gemini 3 Pro | 81 | Gemini 3 (google-gemini-3) | Imported | 2026-05-06
19 | GPT-5.2 | 81 | GPT-5.2 (openai-gpt-5.2) | Imported | 2026-05-06
20 | Qwen3.5 397B (Reasoning) | 79 | - | Imported | 2026-05-06
21 | GPT-5.1 | 79 | GPT-5.1 (openai-gpt-5.1) | Imported | 2026-05-06
22 | GPT-5 (high) | 78 | GPT-5 (openai-gpt-5) | Imported | 2026-05-06
23 | Claude Opus 4.5 | 77 | Claude Opus 4.5 (anthropic-claude-opus-4.5) | Imported | 2026-05-06
24 | GPT-5.2-Codex | 77 | GPT-5.2-Codex (openai-gpt-5.2-codex) | Imported | 2026-05-06
25 | Kimi K2.5 (Reasoning) | 76 | KIMI MoonshotAI: Kimi K2.5 (moonshotai-kimi-k2.5) | Imported | 2026-05-06
26 | DeepSeek V4 Flash (Max) | 76 | DeepSeek V4 Flash (deepseek-deepseek-v4-flash) | Imported | 2026-05-06
27 | GPT-5.1-Codex-Max | 76 | GPT-5.1-Codex-Max (openai-gpt-5.1-codex-max) | Imported | 2026-05-06
28 | Qwen3.6-27B | 74 | Qwen3.6 27B (qwen-qwen3.6-27b) | Imported | 2026-05-06
29 | Qwen3.6 Plus | 73 | Qwen3.6 Plus (qwen-qwen3.6-plus) | Imported | 2026-05-06
30 | GPT-5 (medium) | 72 | GPT-5 (openai-gpt-5) | Imported | 2026-05-06
31 | DeepSeek V4 Flash (High) | 71 | DeepSeek V4 Flash (deepseek-deepseek-v4-flash) | Imported | 2026-05-06
32 | DeepSeek V4 Pro | 70 | DeepSeek V4 Pro (deepseek-deepseek-v4-pro) | Imported | 2026-05-06
33 | Grok 4.1 Fast | 70 | GROK Grok 4.1 Fast (x-ai-grok-4.1-fast) | Imported | 2026-05-06
34 | GLM-4.7 | 69 | GLM GLM 4.7 (z-ai-glm-4.7) | Imported | 2026-05-06
35 | GLM-5 | 67 | GLM GLM 5 (z-ai-glm-5) | Imported | 2026-05-06
36 | Qwen3.6-35B-A3B | 67 | Qwen3.6 35B A3B (qwen-qwen3.6-35b-a3b) | Imported | 2026-05-06
37 | Claude Sonnet 4.5 | 66 | Claude Sonnet 4.5 (anthropic-claude-sonnet-4.5) | Imported | 2026-05-06
38 | Grok 4.20 | 65 | GROK Grok 4.20 (x-ai-grok-4.20) | Imported | 2026-05-06
39 | Qwen3.5-122B-A10B | 65 | Qwen3.5-122B-A10B (qwen-qwen3.5-122b-a10b) | Imported | 2026-05-06
40 | Gemini 3 Flash | 65 | Gemini 3 Flash Preview (google-gemini-3-flash-preview) | Imported | 2026-05-06
41 | Gemini 2.5 Pro | 65 | Gemini 2.5 Pro (google-gemini-2.5-pro) | Imported | 2026-05-06
42 | Grok 4 | 65 | GROK Grok 4 (x-ai-grok-4) | Imported | 2026-05-06
43 | Kimi K2.5 | 64 | KIMI MoonshotAI: Kimi K2.5 (moonshotai-kimi-k2.5) | Imported | 2026-05-06
44 | Qwen3.5 397B | 64 | - | Imported | 2026-05-06
45 | Qwen3.5-27B | 63 | Qwen3.5-27B (qwen-qwen3.5-27b) | Imported | 2026-05-06
46 | MiniMax M2.7 | 62 | MiniMax M2.7 (minimax-minimax-m2.7) | Imported | 2026-05-06
47 | DeepSeek V3.2 (Thinking) | 62 | DeepSeek V3.2 (deepseek-deepseek-v3.2) | Imported | 2026-05-06
48 | MiMo-V2-Flash | 61 | MiMo-V2-Flash (xiaomi-mimo-v2-flash) | Imported | 2026-05-06
49 | DeepSeek V4 Flash | 59 | DeepSeek V4 Flash (deepseek-deepseek-v4-flash) | Imported | 2026-05-06
50 | DeepSeek V3.2 | 58 | DeepSeek V3.2 (deepseek-deepseek-v3.2) | Imported | 2026-05-06
51 | GPT-4.1 | 58 | GPT-4.1 (openai-gpt-4.1) | Imported | 2026-05-06
52 | Claude Haiku 4.5 | 58 | Claude Haiku 4.5 (anthropic-claude-haiku-4.5) | Imported | 2026-05-06
53 | o3 | 58 | o3 (openai-o3) | Imported | 2026-05-06
54 | o3-pro | 58 | o3 Pro (openai-o3-pro) | Imported | 2026-05-06
55 | o1 | 58 | o1 (openai-o1) | Imported | 2026-05-06
56 | Qwen3.5-35B-A3B | 56 | Qwen3.5-35B-A3B (qwen-qwen3.5-35b-a3b) | Imported | 2026-05-06
57 | o3-mini | 56 | o3-mini (openai-o3-mini) | Imported | 2026-05-06
58 | DeepSeek LLM 2.0 | 52 | - | Imported | 2026-05-06
59 | DeepSeek Coder 2.0 | 52 | - | Imported | 2026-05-06
60 | Claude 4.1 Opus | 52 | - | Imported | 2026-05-06
61 | Qwen2.5-1M | 51 | - | Imported | 2026-05-06
62 | Claude 4 Sonnet | 51 | - | Imported | 2026-05-06
63 | GPT-4o mini | 50 | GPT-4o-mini (openai-gpt-4o-mini) | Imported | 2026-05-06
64 | Qwen2.5-72B | 50 | Qwen2.5 72B Instruct (qwen-qwen-2.5-72b-instruct) | Imported | 2026-05-06
65 | DeepSeekMath V2 | 50 | - | Imported | 2026-05-06
66 | Mistral Large 3 | 49 | - | Imported | 2026-05-06
67 | Gemini 3.1 Flash-Lite | 48 | - | Imported | 2026-05-06
68 | Qwen3 235B 2507 (Reasoning) | 47 | Qwen3 235B A22B (qwen-qwen3-235b-a22b) | Imported | 2026-05-06
69 | Nemotron 3 Ultra 500B | 47 | - | Imported | 2026-05-06
70 | GPT-4.1 mini | 46 | GPT-4.1 Mini (openai-gpt-4.1-mini) | Imported | 2026-05-06
71 | Nemotron 3 Super 100B | 44 | - | Imported | 2026-05-06
72 | o4-mini (high) | 44 | o4 Mini High (openai-o4-mini-high) | Imported | 2026-05-06
73 | Claude 4.1 Opus Thinking | 44 | - | Imported | 2026-05-06
74 | GPT-4o | 43 | GPT-4o (openai-gpt-4o) | Imported | 2026-05-06
75 | Kimi K2 | 42 | KIMI MoonshotAI: Kimi K2 0711 (moonshotai-kimi-k2) | Imported | 2026-05-06
76 | Llama 3.1 405B | 41 | - | Imported | 2026-05-06
77 | Claude 3.5 Sonnet | 41 | Claude 3.5 Sonnet (anthropic-claude-3.5-sonnet) | Imported | 2026-05-06
78 | Grok Code Fast 1 | 40 | GROK Grok Code Fast 1 (x-ai-grok-code-fast-1) | Imported | 2026-05-06
79 | Sarvam 105B | 39 | - | Imported | 2026-05-06
80 | Gemini 2.5 Flash | 38 | Gemini 2.5 Flash (google-gemini-2.5-flash) | Imported | 2026-05-06
81 | Mistral Large 2 | 38 | - | Imported | 2026-05-06
82 | DeepSeek V3 | 36 | DeepSeek V3 (deepseek-deepseek-chat) | Imported | 2026-05-06
83 | Gemini 1.5 Pro | 36 | - | Imported | 2026-05-06
84 | GPT-OSS 120B | 35 | gpt-oss-120b (openai-gpt-oss-120b) | Imported | 2026-05-06
85 | Claude 3 Opus | 35 | - | Imported | 2026-05-06
86 | DeepSeek-R1 | 33 | R1 (deepseek-r1) | Imported | 2026-05-06
87 | Qwen3 235B 2507 | 33 | Qwen3 235B A22B (qwen-qwen3-235b-a22b) | Imported | 2026-05-06
88 | DBRX Instruct | 33 | - | Imported | 2026-05-06
89 | Grok 3 [Beta] | 32 | GROK Grok 3 Beta (x-ai-grok-3-beta) | Imported | 2026-05-06
90 | DeepSeek V3.1 (Reasoning) | 30 | DeepSeek V3.1 (deepseek-deepseek-chat-v3.1) | Imported | 2026-05-06
91 | o1-pro | 29 | o1-pro (openai-o1-pro) | Imported | 2026-05-06
92 | Phi-4 | 28 | Phi 4 (microsoft-phi-4) | Imported | 2026-05-06
93 | GLM-4.5 | 27 | GLM GLM 4.5 (z-ai-glm-4.5) | Imported | 2026-05-06
94 | Llama 3 70B | 27 | - | Imported | 2026-05-06
95 | GPT-4.1 nano | 27 | GPT-4.1 Nano (openai-gpt-4.1-nano) | Imported | 2026-05-06
96 | DeepSeek V3.1 | 26 | DeepSeek V3.1 Terminus (deepseek-deepseek-v3.1-terminus) | Imported | 2026-05-06
97 | Nemotron 3 Nano 30B | 26 | Nemotron 3 Nano 30B A3B (nvidia-nemotron-3-nano-30b-a3b) | Imported | 2026-05-06
98 | GPT-4 Turbo | 26 | GPT-4 Turbo (openai-gpt-4-turbo) | Imported | 2026-05-06
99 | Gemini 1.0 Pro | 25 | - | Imported | 2026-05-06
100 | Z-1 | 24 | - | Imported | 2026-05-06
101 | Mistral 8x7B | 24 | - | Imported | 2026-05-06
102 | Claude 3 Haiku | 24 | Claude 3 Haiku (anthropic-claude-3-haiku) | Imported | 2026-05-06
103 | Mixtral 8x22B Instruct v0.1 | 23 | - | Imported | 2026-05-06
104 | Nemotron-4 15B | 23 | - | Imported | 2026-05-06
105 | Moonshot v1 | 23 | - | Imported | 2026-05-06
106 | Llama 4 Scout | 22 | Llama 4 Scout (meta-llama-llama-4-scout) | Imported | 2026-05-06
107 | Nemotron Ultra 253B | 22 | - | Imported | 2026-05-06
108 | GLM-4.5-Air | 19 | GLM GLM 4.5 Air (z-ai-glm-4.5-air) | Imported | 2026-05-06
109 | GPT-OSS 20B | 18 | gpt-oss-20b (openai-gpt-oss-20b) | Imported | 2026-05-06
110 | Gemma 3 27B | 17 | Gemma 3 27B (google-gemma-3-27b-it) | Imported | 2026-05-06
111 | Llama 4 Maverick | 17 | Llama 4 Maverick (meta-llama-4-maverick) | Imported | 2026-05-06
112 | Llama 4 Behemoth | 12 | - | Imported | 2026-05-06
113 | Nova Pro | 10 | Nova Pro 1.0 (amazon-nova-pro-v1) | Imported | 2026-05-06
114 | Mistral 7B v0.3 | 5 | - | Imported | 2026-05-06
115 | Mistral 8x7B v0.2 | 2 | - | Imported | 2026-05-06
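
The Rank column above gives consecutive positions even where the Overall Score ties (ranks 3 and 4 both score 91; ranks 5 to 7 all score 90). For readers who want tied subjects to share a position, the following is a minimal sketch of competition ranking over a few rows copied from the table. The `rows` list and the `competition_rank` helper are illustrative assumptions, not part of BenchLM, and the sketch says nothing about how BenchLM itself orders ties.

```python
from itertools import groupby

# A few (subject, overall_score) pairs copied from the results table.
rows = [
    ("Claude Mythos Preview", 99),
    ("Gemini 3.1 Pro", 92),
    ("GPT-5.5", 91),
    ("GPT-5.4 Pro", 91),
    ("Claude Opus 4.7 (Adaptive)", 90),
    ("Gemini 3 Pro Deep Think", 90),
    ("Grok 4.1", 90),
    ("GPT-5.4", 89),
]


def competition_rank(rows):
    """Hypothetical helper: tied scores share a rank (1, 2, 3, 3, 5, ...)."""
    # sorted() is stable, so tied subjects keep their input order.
    ordered = sorted(rows, key=lambda r: -r[1])
    ranked = []
    position = 1
    for score, group in groupby(ordered, key=lambda r: r[1]):
        members = list(group)
        for subject, _ in members:
            ranked.append((position, subject, score))
        # Skip as many positions as there were tied subjects.
        position += len(members)
    return ranked


for rank, subject, score in competition_rank(rows):
    print(f"{rank:>2}  {subject:<28} {score}")
```

With this scheme, score differences of zero no longer read as a difference in standing, which matters when comparing neighbouring entries.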