MLX Benchmark V2 | BenchmarkList

Metadata

ID: mlx_benchmark_v2
Category: Coding
Release: 2026-04-18

Metrics

Accuracy, Correct, Total, qa Accuracy, fill_blank Accuracy, mcq Accuracy, true_false Accuracy, coding Accuracy, debug Accuracy, easy Accuracy, medium Accuracy, hard Accuracy, very-hard Accuracy, mlx_core Accuracy, mlx_nn Accuracy, mlx_optimizers Accuracy, mlx_lm_lora Accuracy, mlx_embeddings_lora Accuracy, mlx_lm Accuracy, mlx_vlm Accuracy, mlx_embeddings Accuracy, coding Accuracy, debugging Accuracy, conceptual Accuracy

Rank	Subject	Accuracy (%)	Model Match	Provenance	Sampled
1	openrouter/anthropic/claude-sonnet-4.6	89.62	—	Imported	2026-05-06
2	openrouter/google/gemini-3-flash-preview	82.39	—	Imported	2026-05-06
3	openrouter/qwen/qwen3.6-max-preview	80.13	—	Imported	2026-05-06
4	openrouter/google/gemma-4-26b-a4b-it	75.19	—	Imported	2026-05-06
5	openrouter/openai/gpt-5.4-nano	75.19	GPT-5.4 Nano openai-gpt-5.4-nano	Imported	2026-05-06
6	ollama/gemma4:31b-cloud	74.42	—	Imported	2026-05-06
7	openrouter/x-ai/grok-4.1-fast	72.69	—	Imported	2026-05-06
8	ollama/nemotron-3-super:cloud	71.15	—	Imported	2026-05-06
9	openrouter/google/gemini-2.5-flash-lite-preview-09-2025	67.31	—	Imported	2026-05-06
10	openrouter/qwen/qwen3.6-35b-a3b	52.50	—	Imported	2026-05-06
11	openrouter/openai/gpt-5-nano	41.92	GPT-5 Nano openai-gpt-5-nano	Imported	2026-05-06
12	ollama/deepseek-v4-flash:cloud	24.81	—	Imported	2026-05-06
13	ollama/glm-5.1:cloud	19.23	—	Imported	2026-05-06
14	ollama/minimax-m2.7:cloud	10.77	—	Imported	2026-05-06
15	ollama/kimi-k2.5:cloud	4.81	—	Imported	2026-05-06
16	ollama/qwen3.5:cloud	4.62	—	Imported	2026-05-06
17	ollama/ministral-3:14b-cloud	4.42	—	Imported	2026-05-06
18	ollama/kimi-k2.6:cloud	3.10	—	Imported	2026-05-06
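
For readers who want to work with the table programmatically, the following is a minimal sketch of parsing its tab-separated rows. The names `Row`, `parse_row`, `gateway`, and `SAMPLE` are hypothetical and not part of the benchmark. Reading the text before the first `/` in Subject as the serving route is an assumption, as is treating `—` in Model Match as "no match". The three sample rows are copied from the table above.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Row:
    rank: int
    subject: str
    accuracy: float
    model_match: Optional[str]
    provenance: str
    sampled: date

    @property
    def gateway(self) -> str:
        # Text before the first "/" (assumed to name the serving route).
        return self.subject.split("/", 1)[0]


def parse_row(line: str) -> Row:
    # Columns: Rank, Subject, Accuracy, Model Match, Provenance, Sampled.
    rank, subject, accuracy, match, provenance, sampled = line.split("\t")
    return Row(
        rank=int(rank),
        subject=subject,
        accuracy=float(accuracy),
        # Assumption: the dash placeholder means no matched model record.
        model_match=None if match == "—" else match,
        provenance=provenance,
        sampled=date.fromisoformat(sampled),
    )


# Three rows copied verbatim from the leaderboard.
SAMPLE = (
    "1\topenrouter/anthropic/claude-sonnet-4.6\t89.62\t—\tImported\t2026-05-06\n"
    "5\topenrouter/openai/gpt-5.4-nano\t75.19\t"
    "GPT-5.4 Nano openai-gpt-5.4-nano\tImported\t2026-05-06\n"
    "18\tollama/kimi-k2.6:cloud\t3.10\t—\tImported\t2026-05-06\n"
)

rows = [parse_row(line) for line in SAMPLE.splitlines()]
for row in rows:
    print(row.rank, row.gateway, row.accuracy, row.model_match)
```

Keeping Sampled as a `date` and Model Match as an optional string makes it straightforward to filter or group rows later, for example by `gateway`.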
