MLX Benchmark V2
A benchmark that evaluates LLM proficiency with Apple's MLX machine learning framework. It comprises 520 questions spanning 11 categories, 6 question types, and 4 difficulty levels.
Rows: 18
Primary metric: accuracy
Sampled: 2026-05-06
Metadata
Metrics
Overall: Accuracy, Correct, Total
By question type: qa Accuracy, fill_blank Accuracy, mcq Accuracy, true_false Accuracy, coding Accuracy, debug Accuracy
By difficulty: easy Accuracy, medium Accuracy, hard Accuracy, very-hard Accuracy
By category: mlx_core Accuracy, mlx_nn Accuracy, mlx_optimizers Accuracy, mlx_lm_lora Accuracy, mlx_embeddings_lora Accuracy, mlx_lm Accuracy, mlx_vlm Accuracy, mlx_embeddings Accuracy, coding Accuracy, debugging Accuracy, conceptual Accuracy
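The metrics listed above are presumably the share of correct answers, overall and within each question type, difficulty level, and category. The following is a minimal sketch of that computation, assuming accuracy is 100 × Correct / Total rounded to two decimals. The record fields `question_type`, `difficulty`, and `correct` are hypothetical names chosen for illustration, and the sample records are made up; none of them come from the benchmark itself.

```python
from collections import defaultdict


def accuracy_pct(correct: int, total: int) -> float:
    """Return accuracy as a percentage, rounded to two decimals (assumed convention)."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100.0 * correct / total, 2)


def accuracy_by(results, key):
    """Group graded answers by `key` and return the accuracy of each group.

    `results` is an iterable of dicts with a boolean "correct" field
    (hypothetical schema, not the benchmark's actual output format).
    """
    counts = defaultdict(lambda: [0, 0])  # group -> [correct, total]
    for record in results:
        bucket = counts[record[key]]
        bucket[0] += int(record["correct"])
        bucket[1] += 1
    return {group: accuracy_pct(c, t) for group, (c, t) in counts.items()}


# Illustrative records only; not taken from the leaderboard.
results = [
    {"question_type": "mcq", "difficulty": "easy", "correct": True},
    {"question_type": "mcq", "difficulty": "hard", "correct": False},
    {"question_type": "coding", "difficulty": "hard", "correct": True},
]

overall = accuracy_pct(sum(r["correct"] for r in results), len(results))
by_type = accuracy_by(results, "question_type")
by_difficulty = accuracy_by(results, "difficulty")
```

Calling `accuracy_by` once per grouping key yields the per-slice columns, while `accuracy_pct` over all records yields the headline figure used for ranking.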
| Rank | Subject | Accuracy (%) | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | openrouter/anthropic/claude-sonnet-4.6 | 89.62 | — | Imported | 2026-05-06 |
| 2 | openrouter/google/gemini-3-flash-preview | 82.39 | — | Imported | 2026-05-06 |
| 3 | openrouter/qwen/qwen3.6-max-preview | 80.13 | — | Imported | 2026-05-06 |
| 4 | openrouter/google/gemma-4-26b-a4b-it | 75.19 | — | Imported | 2026-05-06 |
| 5 | openrouter/openai/gpt-5.4-nano | 75.19 | GPT-5.4 Nano (openai-gpt-5.4-nano) | Imported | 2026-05-06 |
| 6 | ollama/gemma4:31b-cloud | 74.42 | — | Imported | 2026-05-06 |
| 7 | openrouter/x-ai/grok-4.1-fast | 72.69 | — | Imported | 2026-05-06 |
| 8 | ollama/nemotron-3-super:cloud | 71.15 | — | Imported | 2026-05-06 |
| 9 | openrouter/google/gemini-2.5-flash-lite-preview-09-2025 | 67.31 | — | Imported | 2026-05-06 |
| 10 | openrouter/qwen/qwen3.6-35b-a3b | 52.50 | — | Imported | 2026-05-06 |
| 11 | openrouter/openai/gpt-5-nano | 41.92 | GPT-5 Nano (openai-gpt-5-nano) | Imported | 2026-05-06 |
| 12 | ollama/deepseek-v4-flash:cloud | 24.81 | — | Imported | 2026-05-06 |
| 13 | ollama/glm-5.1:cloud | 19.23 | — | Imported | 2026-05-06 |
| 14 | ollama/minimax-m2.7:cloud | 10.77 | — | Imported | 2026-05-06 |
| 15 | ollama/kimi-k2.5:cloud | 4.81 | — | Imported | 2026-05-06 |
| 16 | ollama/qwen3.5:cloud | 4.62 | — | Imported | 2026-05-06 |
| 17 | ollama/ministral-3:14b-cloud | 4.42 | — | Imported | 2026-05-06 |
| 18 | ollama/kimi-k2.6:cloud | 3.10 | — | Imported | 2026-05-06 |