AudioMC - Audio Output
The AudioMultiChallenge Audio Output track benchmarks spoken dialogue systems that produce audio responses in multi-turn conversations.
Metadata
Rows: 14
Primary metric: Score
Sampled: 2026-05-07
Metrics: Score, Confidence Interval Upper, Max Score
| Rank | Subject | Score | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | gpt-realtime-2 (xHigh) | 48.45 | — | Imported | 2026-05-07 |
| 2 | gpt-realtime-2 | 37.61 | — | Imported | 2026-05-07 |
| 2 | gemini-3.1-flash-live-preview (Thinking) | 36.06 | — | Imported | 2026-05-07 |
| 2 | gpt-realtime-1.5 | 34.73 | — | Imported | 2026-05-07 |
| 3 | gemini-3.1-flash-live-preview | 26.77 | — | Imported | 2026-05-07 |
| 3 | Qwen3-Omni-30B-A3B-Instruct | 24.34 | — | Imported | 2026-05-07 |
| 3 | gpt-4o-audio-preview-2025-06-03 | 23.23 | GPT-4o Audio (openai-gpt-4o-audio-preview) | Imported | 2026-05-07 |
| 3 | gemini-2.5-flash-native-audio-preview-12-2025 (thinking) | 21.46 | — | Imported | 2026-05-07 |
| 3 | gpt-realtime-2025-08-28 | 20.35 | — | Imported | 2026-05-07 |
| 4 | gpt-realtime-mini-2025-12-15 | 13.94 | — | Imported | 2026-05-07 |
| 4 | gemini-2.5-flash-native-audio-preview-12-2025 (non-thinking) | 13.90 | — | Imported | 2026-05-07 |
| 5 | gpt-4o-mini-audio-preview-2024-12-17 | 13.05 | GPT-4 (openai-gpt-4) | Imported | 2026-05-07 |
| 5 | Kimi-Audio-7B-Instruct | 10.40 | — | Imported | 2026-05-07 |
| 5 | LFM2-Audio-1.5B | 9.29 | — | Imported | 2026-05-07 |