LatencyGrid

Voice AI infrastructure latency benchmark covering LLM time-to-first-token, speech-to-text latency, text-to-speech time-to-first-byte, and pipeline combinations across providers.

28 rows
Primary metric: ttft_p50
Sampled: 2026-05-06

Metadata

Metrics

TTFT P50 (lower is better), TTFT P95 (lower is better), Throughput, Latency P50 (lower is better), TTFB P50 (lower is better), Realtime Factor
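As a rough illustration of what the P50 and P95 metrics summarize, the sketch below computes both from a list of raw latency samples using the nearest-rank method. The `percentile` helper and the sample values are hypothetical; the benchmark does not state which percentile method or sample size it uses.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    # Rank is 1-based; clamp to 1 so very small pct values stay in range.
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical time-to-first-token measurements (NOT taken from the benchmark).
ttft_samples = [102, 98, 110, 131, 95, 240, 108, 115, 99, 105]

ttft_p50 = percentile(ttft_samples, 50)  # typical request
ttft_p95 = percentile(ttft_samples, 95)  # tail latency
print(ttft_p50, ttft_p95)
```

With only ten samples the P95 lands on the single slowest request, which is why tail metrics generally need many more measurements than the median to be stable.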

Showing 3 latest source slices.

Latest Results

Rows are ranked by lowest TTFT P50; the rank restarts at 1 for each of the three groups.

LLM

| Rank | Subject | TTFT P50 | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | groq/kimi-k2 | 109 | | Imported | 2026-05-06 |
| 2 | groq/llama-3.1-8b-instant | 114 | | Imported | 2026-05-06 |
| 3 | groq/llama-4-scout-17b | 166 | | Imported | 2026-05-06 |
| 4 | groq/llama-3.3-70b-versatile | 327 | | Imported | 2026-05-06 |
| 5 | openai/gpt-4o | 380 | | Imported | 2026-05-06 |
| 6 | anthropic/claude-haiku-4-5 | 463 | | Imported | 2026-05-06 |
| 7 | openai/gpt-4o-mini | 508 | | Imported | 2026-05-06 |
| 8 | google/gemini-2.5-flash | 936 | | Imported | 2026-05-06 |
| 9 | openai/o3-mini | 956 | | Imported | 2026-05-06 |
| 10 | anthropic/claude-sonnet-4-5 | 1361 | | Imported | 2026-05-06 |

Speech-to-text

| Rank | Subject | TTFT P50 | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | groq/whisper-large-v3 | 360 | | Imported | 2026-05-06 |
| 2 | groq/whisper-large-v3-turbo | 527 | | Imported | 2026-05-06 |
| 3 | deepgram/nova-3 | 583 | | Imported | 2026-05-06 |
| 4 | deepgram/nova-2 | 622 | | Imported | 2026-05-06 |
| 5 | openai/gpt-4o-transcribe | 799 | | Imported | 2026-05-06 |
| 6 | assemblyai/universal-2 | 1725 | | Imported | 2026-05-06 |
| 7 | assemblyai/universal-3-pro | 2855 | | Imported | 2026-05-06 |
| 8 | gladia/default | 3452 | | Imported | 2026-05-06 |

Text-to-speech

| Rank | Subject | TTFT P50 | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | deepgram/aura-luna | 156 | | Imported | 2026-05-06 |
| 2 | cartesia/sonic-2 | 200 | | Imported | 2026-05-06 |
| 3 | cartesia/sonic-english | 210 | | Imported | 2026-05-06 |
| 4 | lmnt/aurora | 298 | | Imported | 2026-05-06 |
| 5 | lmnt/blizzard | 310 | | Imported | 2026-05-06 |
| 6 | elevenlabs/flash-v2.5 | 374 | | Imported | 2026-05-06 |
| 7 | openai/tts-1 | 1050 | | Imported | 2026-05-06 |
| 8 | elevenlabs/multilingual-v2 | 1277 | | Imported | 2026-05-06 |
| 9 | openai/tts-1-hd | 2010 | | Imported | 2026-05-06 |
| 10 | fish-audio/default | 2511 | | Imported | 2026-05-06 |
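
The rows above can be combined into a back-of-the-envelope estimate for a full voice pipeline (speech-to-text, then LLM, then text-to-speech). The sketch below simply adds the listed P50 figure for one subject from each group. This is an assumption for illustration only: the benchmark does not describe how its own pipeline combinations are computed, a sum of medians is not the median of the end-to-end latency, and network hops between providers are ignored. The dictionary names and the `estimate_pipeline` helper are hypothetical, and only a subset of rows is copied in.

```python
# P50 values copied from the results listed above (units as reported there).
STT = {
    "groq/whisper-large-v3": 360,
    "deepgram/nova-3": 583,
    "openai/gpt-4o-transcribe": 799,
}
LLM = {
    "groq/kimi-k2": 109,
    "openai/gpt-4o": 380,
    "anthropic/claude-haiku-4-5": 463,
}
TTS = {
    "deepgram/aura-luna": 156,
    "cartesia/sonic-2": 200,
    "elevenlabs/flash-v2.5": 374,
}


def estimate_pipeline(stt, llm, tts):
    """Naive estimate: add the per-stage P50 figures for the chosen subjects."""
    return STT[stt] + LLM[llm] + TTS[tts]


def fastest(stage):
    """Return the subject with the lowest listed P50 in one stage."""
    return min(stage, key=stage.get)


best = (fastest(STT), fastest(LLM), fastest(TTS))
best_total = estimate_pipeline(*best)
mixed_total = estimate_pipeline("deepgram/nova-3", "openai/gpt-4o", "cartesia/sonic-2")
print(best, best_total)
print(mixed_total)
```

Treat the totals as a lower-bound style comparison between stacks rather than a prediction of what a user would experience.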