MLPerf Endpoints
MLCommons benchmark for API-hosted GenAI endpoints, measuring serving latency, throughput, concurrency, and endpoint efficiency.
10 rows
Primary metric: tokens_per_second
Sampled: 2026-05-27
Metadata
Metrics
System Tokens/Second, Tokens/Second per User, QPS, TTFT P99 (lower is better), TTFT Average (lower is better), TPOT Average (lower is better), Request Latency Average (lower is better), Concurrency, Utilization
| Rank | Subject | System Tokens/Second | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | System 9 / Deepseek-R1 | 3908.3 tokens/s | — | Imported | 2026-05-27 |
| 2 | System 1 / GPT-OSS 120B | 421 tokens/s | — | Imported | 2026-05-27 |
| 3 | System 2 / GPT-OSS 120B | 376.2 tokens/s | — | Imported | 2026-05-27 |
| 4 | System 7 / Llama-3.1-70B | 241.9 tokens/s | — | Imported | 2026-05-27 |
| 5 | System 8 / Llama-3.1-70B | 215.9 tokens/s | — | Imported | 2026-05-27 |
| 6 | System 4 / QWEN3 CODER 480B | 199.8 tokens/s | — | Imported | 2026-05-27 |
| 7 | System 3 / QWEN3 CODER 480B | 192.4 tokens/s | — | Imported | 2026-05-27 |
| 8 | System 10 / Deepseek-R1 | 68.2 tokens/s | — | Imported | 2026-05-27 |
| 9 | System 5 / Llama-3.1-8B | 67.2 tokens/s | — | Imported | 2026-05-27 |
| 10 | System 6 / Llama-3.1-8B | 34.3 tokens/s | — | Imported | 2026-05-27 |
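The latency and throughput metrics listed above (TTFT, TPOT, request latency, system tokens/second, QPS) are typically derived from per-request timestamps. The following is a minimal sketch under the commonly used definitions, not necessarily the exact methodology MLPerf Endpoints applies. The `RequestTiming` type, the `summarize` function, and the sample numbers are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class RequestTiming:
    # Hypothetical per-request record; all times are in seconds.
    start: float         # when the request was sent
    first_token: float   # when the first output token arrived
    end: float           # when the last output token arrived
    output_tokens: int   # number of output tokens generated


def summarize(timings):
    """Aggregate per-request timings into leaderboard-style metrics.

    Assumed definitions (not confirmed by the source page):
      TTFT    = first_token - start
      TPOT    = (end - first_token) / (output_tokens - 1)
      latency = end - start
    Requires at least one request with more than one output token.
    """
    ttft = [t.first_token - t.start for t in timings]
    latency = [t.end - t.start for t in timings]
    # TPOT is undefined for single-token responses, so skip them.
    tpot = [
        (t.end - t.first_token) / (t.output_tokens - 1)
        for t in timings
        if t.output_tokens > 1
    ]
    # Wall-clock span of the whole run, from first send to last completion.
    wall = max(t.end for t in timings) - min(t.start for t in timings)
    total_tokens = sum(t.output_tokens for t in timings)
    return {
        "ttft_avg": mean(ttft),
        "tpot_avg": mean(tpot),
        "latency_avg": mean(latency),
        "system_tokens_per_second": total_tokens / wall,
        "qps": len(timings) / wall,
    }


# Made-up sample data: two concurrent requests.
sample = [
    RequestTiming(start=0.0, first_token=0.5, end=2.5, output_tokens=101),
    RequestTiming(start=0.0, first_token=1.5, end=4.0, output_tokens=51),
]
print(summarize(sample))
```

Under these assumptions, system tokens/second is an aggregate over the whole run, so it grows with concurrency. This is one plausible reason two systems serving the same model can differ widely on the primary metric.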