BLXBench
Community benchmark runner and public leaderboard for AI model performance across coding, debugging, reasoning, hallucination, refactoring, security, and speed slices.
25 rows
Primary metric: score
Sampled: 2026-05-06
Metadata
Metrics
Score, Passed Tests, Total Tests, Pass Rate, Average Latency (lower is better), Decode Throughput, Slice Cost (lower is better)
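As a rough illustration of how some of the metrics above could relate to one another, here is a minimal sketch. It assumes Pass Rate is simply Passed Tests divided by Total Tests and that Average Latency is an arithmetic mean over per-test latencies. The `SliceResult` type and its field names are hypothetical and not taken from BLXBench, and the sketch does not attempt to reproduce how the leaderboard Score is computed.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SliceResult:
    """Hypothetical per-slice record; field names are assumptions."""
    passed_tests: int
    total_tests: int
    latencies_s: List[float]  # assumed: one latency per test, in seconds


def pass_rate(result: SliceResult) -> float:
    """Fraction of tests passed (assumed definition); 0.0 if no tests ran."""
    if result.total_tests == 0:
        return 0.0
    return result.passed_tests / result.total_tests


def average_latency(result: SliceResult) -> float:
    """Mean latency in seconds (lower is better); 0.0 if nothing was timed."""
    if not result.latencies_s:
        return 0.0
    return sum(result.latencies_s) / len(result.latencies_s)


example = SliceResult(passed_tests=17, total_tests=20, latencies_s=[0.5, 1.5])
print(f"pass rate: {pass_rate(example):.2%}")
print(f"average latency: {average_latency(example):.2f} s")
```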
| Rank | Subject | Score | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | Grok 4.3 | 85.50 | Grok 4.3 x-ai-grok-4.3 | Imported | 2026-05-06 |
| 2 | Claude Opus 4.7 | 84.80 | Claude Opus 4.7 anthropic-claude-opus-4.7 | Imported | 2026-05-06 |
| 3 | Gpt Chat Latest | 83.80 | — | Imported | 2026-05-06 |
| 4 | Owl Alpha | 83.60 | Owl Alpha openrouter-owl-alpha | Imported | 2026-05-06 |
| 5 | Qwen3.6 Flash | 82.80 | Qwen3.6 Flash qwen-qwen3.6-flash | Imported | 2026-05-06 |
| 6 | Grok 4.20 | 79.10 | Grok 4.20 x-ai-grok-4.20 | Imported | 2026-05-06 |
| 7 | Gpt Mini Latest | 78.30 | OpenAI GPT Mini Latest openai-gpt-mini-latest | Imported | 2026-05-06 |
| 8 | Ling 2.6 1t | 75.30 | Ling-2.6-1T inclusionai-ling-2.6-1t | Imported | 2026-05-06 |
| 9 | Mistral Small 2603 | 75.20 | Mistral: Mistral Small 4 mistralai-mistral-small-2603 | Imported | 2026-05-06 |
| 10 | Claude Opus 4.6 | 71.10 | Claude Opus 4.6 anthropic-claude-opus-4.6 | Imported | 2026-05-06 |
| 11 | Gpt 5.5 | 65.90 | GPT-5.5 openai-gpt-5.5 | Imported | 2026-05-06 |
| 12 | Granite 4.1 8b | 64.60 | Granite 4.1 8B ibm-granite-granite-4.1-8b | Imported | 2026-05-06 |
| 13 | Deepseek V4 Flash | 48.30 | DeepSeek V4 Flash deepseek-deepseek-v4-flash | Imported | 2026-05-06 |
| 14 | Gpt 5.5 Pro | 44.50 | GPT-5.5 Pro openai-gpt-5.5-pro | Imported | 2026-05-06 |
| 15 | Mimo V2.5 Pro | 44.00 | MiMo-V2.5-Pro xiaomi-mimo-v2.5-pro | Imported | 2026-05-06 |
| 16 | Qwen3.6 35b A3b | 33.90 | Qwen3.6 35B A3B qwen-qwen3.6-35b-a3b | Imported | 2026-05-06 |
| 17 | Nemotron 3 Nano Omni 30b A3b Reasoning | 29.90 | Nemotron 3 Nano Omni nvidia-nemotron-3-nano-omni-30b-a3b-reasoning | Imported | 2026-05-06 |
| 18 | Qwen3.6 27b | 29.00 | Qwen3.6 27B qwen-qwen3.6-27b | Imported | 2026-05-06 |
| 19 | Mimo V2.5 | 15.70 | MiMo-V2.5 xiaomi-mimo-v2.5 | Imported | 2026-05-06 |
| 20 | Kimi K2.6 | 15.40 | MoonshotAI: Kimi K2.6 moonshotai-kimi-k2.6 | Imported | 2026-05-06 |
| 21 | Deepseek V4 Pro | 15.20 | DeepSeek V4 Pro deepseek-deepseek-v4-pro | Imported | 2026-05-06 |
| 22 | Glm 5.1 | 13.90 | GLM 5.1 z-ai-glm-5.1 | Imported | 2026-05-06 |
| 23 | Minimax M2.7 | 4.60 | MiniMax M2.7 minimax-minimax-m2.7 | Imported | 2026-05-06 |
| 24 | Gemini 3.1 Pro Preview | 3.70 | Gemini 3.1 Pro Preview google-gemini-3.1-pro-preview | Imported | 2026-05-06 |
| 25 | Gemini Pro Latest | 2.90 | Google Gemini Pro Latest google-gemini-pro-latest | Imported | 2026-05-06 |