ALL Bench LLM
ALL Bench LLM is a composite model leaderboard that aggregates cross-verified LLM scores across reasoning, knowledge, coding, instruction-following, and agentic benchmarks.
Rows: 42
Primary metric: all_bench_score
Sampled: 2026-05-06
Metadata
Metrics: ALL Bench Score, MMLU-Pro, GPQA, AIME, HLE, ARC-AGI-2, FINAL Bench Metacog, SWE-Pro, BFCL, IFEval, LiveCodeBench, SWE-bench Verified, MMLU Multilingual, Terminal-Bench, SciCode, Elo
| Rank | Subject | ALL Bench Score | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | Claude Opus 4.6 | 64.87 | Claude Opus 4.6 anthropic-claude-opus-4.6 | Imported | 2026-05-06 |
| 2 | Kimi K2.5 | 60.81 | MoonshotAI: Kimi K2.5 moonshotai-kimi-k2.5 | Imported | 2026-05-06 |
| 3 | GPT-5.2 | 59.48 | GPT-5.2 openai-gpt-5.2 | Imported | 2026-05-06 |
| 4 | Gemini 3.1 Pro | 58.96 | Gemini 3.1 Pro Preview google-gemini-3.1-pro-preview | Imported | 2026-05-06 |
| 5 | MiniMax-M2.5 | 50.28 | MiniMax M2.5 minimax-minimax-m2.5 | Imported | 2026-05-06 |
| 6 | Gemini 3 Flash | 50.11 | Gemini 3 Flash Preview google-gemini-3-flash-preview | Imported | 2026-05-06 |
| 7 | GLM-5 | 48.85 | GLM 5 z-ai-glm-5 | Imported | 2026-05-06 |
| 8 | Qwen3.5-397B | 43.01 | — | Imported | 2026-05-06 |
| 9 | Grok 4 Heavy | 40.85 | — | Imported | 2026-05-06 |
| 10 | Qwen3.5-9B | 39.74 | Qwen3.5-9B qwen-qwen3.5-9b | Imported | 2026-05-06 |
| 11 | Qwen3.5-122B | 39.02 | — | Imported | 2026-05-06 |
| 12 | DeepSeek R1 | 36.98 | R1 deepseek-r1 | Imported | 2026-05-06 |
| 13 | Qwen3-Next-80B | 36.72 | — | Imported | 2026-05-06 |
| 14 | GPT-5.3 Codex | 36.24 | GPT-5.3-Codex openai-gpt-5.3-codex | Imported | 2026-05-06 |
| 15 | GPT-OSS-120B | 35.74 | gpt-oss-120b openai-gpt-oss-120b | Imported | 2026-05-06 |
| 16 | DeepSeek V3.2 | 35.43 | DeepSeek V3.2 deepseek-deepseek-v3.2 | Imported | 2026-05-06 |
| 17 | DeepSeek R2 | 35.28 | — | Imported | 2026-05-06 |
| 18 | Qwen3.5-4B | 35.12 | — | Imported | 2026-05-06 |
| 19 | Llama 4 Maverick | 34.56 | Llama 4 Maverick meta-llama-4-maverick | Imported | 2026-05-06 |
| 20 | Claude Sonnet 4.6 | 32.28 | Claude Sonnet 4.6 anthropic-claude-sonnet-4.6 | Imported | 2026-05-06 |
| 21 | Solar Open 100B | 30.70 | — | Imported | 2026-05-06 |
| 22 | Claude Sonnet 4.5 | 30.42 | Claude Sonnet 4.5 anthropic-claude-sonnet-4.5 | Imported | 2026-05-06 |
| 23 | GPT-5.4 | 27.59 | GPT-5.4 openai-gpt-5.4 | Imported | 2026-05-06 |
| 24 | Mistral Large 3 | 26.90 | — | Imported | 2026-05-06 |
| 25 | GPT-OSS-20B | 26.25 | gpt-oss-20b openai-gpt-oss-20b | Imported | 2026-05-06 |
| 26 | Llama 4 Scout | 26.02 | Llama 4 Scout meta-llama-llama-4-scout | Imported | 2026-05-06 |
| 27 | Gemini 3 Pro | 25.55 | Gemini 3 google-gemini-3 | Imported | 2026-05-06 |
| 28 | Claude Haiku 4.5 | 24.75 | Claude Haiku 4.5 anthropic-claude-haiku-4.5 | Imported | 2026-05-06 |
| 29 | K-EXAONE | 24.25 | — | Imported | 2026-05-06 |
| 30 | Nanbeige4.1-3B | 22.98 | — | Imported | 2026-05-06 |
| 31 | GPT-5.1 | 22.51 | GPT-5.1 openai-gpt-5.1 | Imported | 2026-05-06 |
| 32 | Qwen3.5-27B | 20.96 | Qwen3.5-27B qwen-qwen3.5-27b | Imported | 2026-05-06 |
| 33 | Step-3.5-Flash | 18.37 | Step 3.5 Flash stepfun-step-3.5-flash | Imported | 2026-05-06 |
| 34 | Grok 4.1 Fast | 17.33 | Grok 4.1 Fast x-ai-grok-4.1-fast | Imported | 2026-05-06 |
| 35 | Qwen3.5-35B | 11.60 | — | Imported | 2026-05-06 |
| 36 | Phi-4 | 7.30 | Phi 4 microsoft-phi-4 | Imported | 2026-05-06 |
| 37 | A.X K1 | 0 | — | Imported | 2026-05-06 |
| 38 | Gemini 2.5 FL-Lite | 0 | — | Imported | 2026-05-06 |
| 39 | GPT-5-Nano | 0 | GPT-5 Nano openai-gpt-5-nano | Imported | 2026-05-06 |
| 40 | Mi:dm 2.0 Base | 0 | — | Imported | 2026-05-06 |
| 41 | Motif AI | 0 | — | Imported | 2026-05-06 |
| 42 | Qwen3.5-Flash | 0 | Qwen3.5-Flash qwen-qwen3.5-flash-02-23 | Imported | 2026-05-06 |
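
To work with the table programmatically, its rows can be parsed into typed records. The following is a minimal sketch rather than an official tool: the names `Row` and `parse_row` are hypothetical, the sample contains only three rows copied from the table above, and treating the dash placeholder in the Model Match column as "no match" is an assumption about what that cell means.

```python
from typing import NamedTuple, Optional


class Row(NamedTuple):
    """One leaderboard row (hypothetical record type for illustration)."""
    rank: int
    subject: str
    score: float
    model_match: Optional[str]
    provenance: str
    sampled: str


def parse_row(line: str) -> Row:
    """Split one pipe-delimited table row into a typed Row."""
    cells = [c.strip() for c in line.strip().strip("|").split("|")]
    rank, subject, score, match, provenance, sampled = cells
    # Assumption: a lone dash (U+2014) in Model Match means no linked model.
    model_match = None if match == "\u2014" else match
    return Row(int(rank), subject, float(score), model_match, provenance, sampled)


# Three rows copied verbatim from the table above.
SAMPLE = """\
| 1 | Claude Opus 4.6 | 64.87 | Claude Opus 4.6 anthropic-claude-opus-4.6 | Imported | 2026-05-06 |
| 8 | Qwen3.5-397B | 43.01 | — | Imported | 2026-05-06 |
| 37 | A.X K1 | 0 | — | Imported | 2026-05-06 |
"""

rows = [parse_row(line) for line in SAMPLE.splitlines()]

# Subjects without a linked model entry.
unmatched = [r.subject for r in rows if r.model_match is None]

# Subjects whose ALL Bench Score is listed as 0.
zero_scored = [r.subject for r in rows if r.score == 0]

print(unmatched)
print(zero_scored)
```

Separating the zero-score rows can be useful before computing averages or rank gaps, since the page does not say whether a score of 0 is a measured result or a missing value.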