VLMbench Performance

VLMbench is a vision-language model throughput benchmark. It reports peak tokens/sec, time to first token (TTFT), time per output token (TPOT), and worker count, measured on an RTX PRO 6000 Blackwell GPU running vLLM.

Rows: 5
Primary metric: best_toks_per_s
Sampled: 2026-05-28

Metadata

Metrics

Best Tok/s, Workers, TTFT (lower is better), TPOT (lower is better)
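
For readers unfamiliar with the latency metrics, the sketch below shows how TTFT, TPOT, and tokens/sec are commonly derived from the token arrival times of a single request. It is illustrative only: the function name latency_metrics and its inputs are hypothetical, the TPOT convention used here (decode time divided by output tokens minus one) is one of several in use, and the exact formulas VLMbench applies are not stated on this page. The leaderboard's Best Tok/s is a peak figure for the whole setup, so it is not the same as the per-request rate computed here.

```python
from typing import NamedTuple, Sequence


class LatencyMetrics(NamedTuple):
    ttft_s: float        # time to first token, in seconds
    tpot_s: float        # mean time per output token after the first, in seconds
    toks_per_s: float    # output tokens divided by total request time


def latency_metrics(request_start: float, token_times: Sequence[float]) -> LatencyMetrics:
    """Derive per-request metrics from a start time and token arrival times.

    Hypothetical helper: assumes timestamps are in seconds on one clock
    and that token_times is sorted in arrival order.
    """
    if not token_times:
        raise ValueError("at least one output token is required")

    n_tokens = len(token_times)
    ttft = token_times[0] - request_start
    total = token_times[-1] - request_start

    # Assumed convention: exclude the first token, which TTFT already covers.
    tpot = (total - ttft) / (n_tokens - 1) if n_tokens > 1 else 0.0
    toks_per_s = n_tokens / total if total > 0 else float("inf")
    return LatencyMetrics(ttft, tpot, toks_per_s)


if __name__ == "__main__":
    # Four tokens arriving every 0.5 s after a request sent at t = 0.
    print(latency_metrics(0.0, [0.5, 1.0, 1.5, 2.0]))
```

Lower TTFT and TPOT are better, while higher tokens/sec is better, which matches the direction notes in the metric list above.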

Latest Results

Rows are imported from the VLMbench README throughput leaderboard.

Rank  Subject                      Best Tok/s     Model Match                                        Provenance  Sampled
1     lightonai/LightOnOCR-2-1B    2439.8 tok/s   -                                                  Imported    2026-05-28
2     Qwen/Qwen3-VL-2B-Instruct    2409.3 tok/s   -                                                  Imported    2026-05-28
3     PaddlePaddle/PaddleOCR-VL    2341.9 tok/s   -                                                  Imported    2026-05-28
4     deepseek-ai/DeepSeek-OCR     1195.8 tok/s   -                                                  Imported    2026-05-28
5     Qwen/Qwen3-VL-8B-Instruct    953.8 tok/s    Qwen3 VL 8B Instruct (qwen-qwen3-vl-8b-instruct)   Imported    2026-05-28