AudioMC - Audio Output

The AudioMultiChallenge Audio Output track benchmarks spoken dialogue systems that produce audio responses in multi-turn conversations.

14 rows
Primary metric: score
Sampled: 2026-05-07

Metadata

Metrics

Score, Confidence Interval Upper, Max Score

Latest Results

Scale Labs announced GPT-Realtime-2 results for the Audio MultiChallenge S2S leaderboard: https://x.com/ScaleAILabs/status/2052451341071683732?s=20

Rank | Subject | Score | Model Match | Provenance | Sampled
1 | gpt-realtime-2 (xHigh) | 48.45 | - | Imported | 2026-05-07
2 | gpt-realtime-2 | 37.61 | - | Imported | 2026-05-07
2 | gemini-3.1-flash-live-preview (Thinking) | 36.06 | - | Imported | 2026-05-07
2 | gpt-realtime-1.5 | 34.73 | - | Imported | 2026-05-07
3 | gemini-3.1-flash-live-preview | 26.77 | - | Imported | 2026-05-07
3 | Qwen3-Omni-30B-A3B-Instruct | 24.34 | - | Imported | 2026-05-07
3 | gpt-4o-audio-preview-2025-06-03 | 23.23 | GPT-4o Audio (openai-gpt-4o-audio-preview) | Imported | 2026-05-07
3 | gemini-2.5-flash-native-audio-preview-12-2025 (thinking) | 21.46 | - | Imported | 2026-05-07
3 | gpt-realtime-2025-08-28 | 20.35 | - | Imported | 2026-05-07
4 | gpt-realtime-mini-2025-12-15 | 13.94 | - | Imported | 2026-05-07
4 | gemini-2.5-flash-native-audio-preview-12-2025 (non-thinking) | 13.90 | - | Imported | 2026-05-07
5 | gpt-4o-mini-audio-preview-2024-12-17 | 13.05 | GPT-4 (openai-gpt-4) | Imported | 2026-05-07
5 | Kimi-Audio-7B-Instruct | 10.40 | - | Imported | 2026-05-07
5 | LFM2-Audio-1.5B | 9.29 | - | Imported | 2026-05-07