AudioMC - Text Output
The AudioMultiChallenge Text Output track benchmarks spoken dialogue systems that produce text responses across multi-turn interactions.
Metadata
Rows: 16
Primary metric: Score
Sampled: 2026-05-06
Metrics: Score, Confidence Interval Upper, Max Score
| Rank | Subject | Score | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | gemini-3-pro-preview (Thinking) | 54.65 | Gemini 3 (google-gemini-3) | Imported | 2026-05-06 |
| 1 | gemini-2.5-pro (Thinking) | 46.90 | Gemini 2.5 Pro (google-gemini-2.5-pro) | Imported | 2026-05-06 |
| 2 | gemini-2.5-flash (Thinking) | 40.04 | Gemini 2.5 Flash (google-gemini-2.5-flash) | Imported | 2026-05-06 |
| 4 | gpt-realtime-1.5 | 29.87 | — | Imported | 2026-05-06 |
| 4 | Voxtral-Small-24B-2507 | 26.33 | Mistral: Voxtral Small 24B 2507 (mistralai-voxtral-small-24b-2507) | Imported | 2026-05-06 |
| 4 | gemini-2.5-flash | 26.11 | Gemini 2.5 Flash (google-gemini-2.5-flash) | Imported | 2026-05-06 |
| 4 | gpt-4o-audio-preview-2025-06-03 | 25.44 | GPT-4o Audio (openai-gpt-4o-audio-preview) | Imported | 2026-05-06 |
| 4 | gpt-realtime-2025-08-28 | 23.45 | — | Imported | 2026-05-06 |
| 5 | MiMo-Audio-7B-Instruct (Thinking) | 19.69 | — | Imported | 2026-05-06 |
| 6 | MiMo-Audio-7B-Instruct | 18.58 | — | Imported | 2026-05-06 |
| 8 | gpt-realtime-mini-2025-12-15 | 16.59 | — | Imported | 2026-05-06 |
| 9 | gemma-3n-E4B-it | 15.49 | Gemma 3n 4B (google-gemma-3n-e4b-it) | Imported | 2026-05-06 |
| 9 | Phi-4-multimodal-instruct | 15.49 | — | Imported | 2026-05-06 |
| 9 | gpt-4o-mini-audio-preview-2024-12-17 | 14.82 | GPT-4 (openai-gpt-4) | Imported | 2026-05-06 |
| 9 | Kimi-Audio-7B-Instruct | 13.72 | — | Imported | 2026-05-06 |
| 11 | Qwen2.5-Omni-7B | 11.95 | — | Imported | 2026-05-06 |