MLPerf Endpoints

MLCommons benchmark for API-hosted GenAI endpoints, measuring serving latency, throughput, concurrency, and endpoint efficiency.

Rows: 10
Primary metric: tokens_per_second
Sampled: 2026-05-27

Metadata

Metrics

System Tokens/Second, Tokens/Second per User, QPS, TTFT P99 (lower is better), TTFT Average (lower is better), TPOT Average (lower is better), Request Latency Average (lower is better), Concurrency, Utilization
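As a rough illustration of how the per-request latency metrics in this list are commonly derived, the sketch below computes TTFT (time to first token), TPOT (time per output token), and a P99 from token arrival timestamps. The formulas follow widely used conventions and are an assumption here; this page does not state the exact definitions MLPerf Endpoints applies. All function names are hypothetical.

```python
import math


def ttft(request_start, token_times):
    """Time to first token: delay between sending the request and the first output token."""
    return token_times[0] - request_start


def tpot(token_times):
    """Time per output token: mean gap between consecutive output tokens.

    Assumed convention: the first token is excluded, since TTFT already accounts for it.
    """
    if len(token_times) < 2:
        return 0.0
    return (token_times[-1] - token_times[0]) / (len(token_times) - 1)


def percentile_nearest_rank(values, pct):
    """Nearest-rank percentile, e.g. pct=99 for a P99 (one of several common methods)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical timestamps in seconds, not taken from any MLPerf report.
start = 0.0
arrivals = [0.5, 0.75, 1.0, 1.25]
first_token_delay = ttft(start, arrivals)
per_token_time = tpot(arrivals)
p99_example = percentile_nearest_rank([10, 20, 30, 40], 99)
```

Lower values are better for all three, matching the "lower is better" annotations in the metric list.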

Latest Results

Rows are parsed from the public MLPerf Endpoints server-rendered report index and detail pages. Each row uses the run with the highest System Tokens/Second for that submitted system.

| Rank | Subject | System Tokens/Second | Model Match | Provenance | Sampled |
|------|---------|----------------------|-------------|------------|---------|
| 1 | System 9 / Deepseek-R1 | 3908.3 tokens/s | | Imported | 2026-05-27 |
| 2 | System 1 / GPT-OSS 120B | 421 tokens/s | | Imported | 2026-05-27 |
| 3 | System 2 / GPT-OSS 120B | 376.2 tokens/s | | Imported | 2026-05-27 |
| 4 | System 7 / Llama-3.1-70B | 241.9 tokens/s | | Imported | 2026-05-27 |
| 5 | System 8 / Llama-3.1-70B | 215.9 tokens/s | | Imported | 2026-05-27 |
| 6 | System 4 / QWEN3 CODER 480B | 199.8 tokens/s | | Imported | 2026-05-27 |
| 7 | System 3 / QWEN3 CODER 480B | 192.4 tokens/s | | Imported | 2026-05-27 |
| 8 | System 10 / Deepseek-R1 | 68.2 tokens/s | | Imported | 2026-05-27 |
| 9 | System 5 / Llama-3.1-8B | 67.2 tokens/s | | Imported | 2026-05-27 |
| 10 | System 6 / Llama-3.1-8B | 34.3 tokens/s | | Imported | 2026-05-27 |
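
The row selection described above (one row per submitted system, using its run with the highest System Tokens/Second, then ranked in descending order) can be sketched as follows. This is a minimal illustration, not the site's actual import code: the `Run` type, the function names, and the sample runs are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    system: str               # submitted system label (hypothetical values below)
    model: str                # model served by that system
    tokens_per_second: float  # System Tokens/Second reported for this run


def best_run_per_system(runs):
    """Keep each system's highest-throughput run, sorted best first."""
    best = {}
    for run in runs:
        current = best.get(run.system)
        if current is None or run.tokens_per_second > current.tokens_per_second:
            best[run.system] = run
    return sorted(best.values(), key=lambda r: r.tokens_per_second, reverse=True)


def leaderboard(runs):
    """Return (rank, system, model, tokens_per_second) tuples, with rank starting at 1."""
    return [
        (rank, r.system, r.model, r.tokens_per_second)
        for rank, r in enumerate(best_run_per_system(runs), start=1)
    ]


# Hypothetical runs for illustration only; these are not values from the report.
RUNS = [
    Run("System A", "model-x", 120.0),
    Run("System A", "model-x", 150.5),
    Run("System B", "model-y", 300.0),
    Run("System B", "model-y", 280.0),
    Run("System C", "model-x", 90.0),
]

for row in leaderboard(RUNS):
    print(row)
```

Because ranking uses raw System Tokens/Second across all systems, rows for different models share one ordering, which is why a single model can appear at both ends of the table.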