AudioMC
AudioMultiChallenge (AudioMC) benchmarks end-to-end (E2E) spoken dialogue systems on multi-turn interaction, voice editing, and instruction retention.
30 rows
Primary metric: score
Sampled: 2026-05-07
Metadata
Metrics: Score, Confidence Interval Upper, Max Score
| Rank | Subject | Score | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | gemini-3-pro-preview (Thinking)* | 54.65 | Gemini 3 google-gemini-3 | Imported | 2026-05-07 |
| 1 | gpt-realtime-2 (xHigh) | 48.45 | — | Imported | 2026-05-07 |
| 1 | gemini-2.5-pro (Thinking)* | 46.90 | Gemini 2.5 Pro google-gemini-2.5-pro | Imported | 2026-05-07 |
| 2 | gemini-2.5-flash (Thinking)* | 40.04 | Gemini 2.5 Flash google-gemini-2.5-flash | Imported | 2026-05-07 |
| 2 | gpt-realtime-2 | 37.61 | — | Imported | 2026-05-07 |
| 3 | gemini-3.1-flash-live-preview (Thinking)† | 36.06 | — | Imported | 2026-05-07 |
| 3 | gpt-realtime-1.5† | 34.73 | — | Imported | 2026-05-07 |
| 4 | gpt-realtime-1.5* | 29.87 | — | Imported | 2026-05-07 |
| 5 | gemini-3.1-flash-live-preview† | 26.77 | — | Imported | 2026-05-07 |
| 5 | Voxtral-Small-24B-2507* | 26.33 | Mistral: Voxtral Small 24B 2507 mistralai-voxtral-small-24b-2507 | Imported | 2026-05-07 |
| 6 | gemini-2.5-flash* | 26.11 | Gemini 2.5 Flash google-gemini-2.5-flash | Imported | 2026-05-07 |
| 6 | gpt-4o-audio-preview-2025-06-03* | 25.44 | GPT-4o Audio openai-gpt-4o-audio-preview | Imported | 2026-05-07 |
| 6 | Qwen3-Omni-30B-A3B-Instruct† | 24.34 | — | Imported | 2026-05-07 |
| 6 | gpt-realtime-2025-08-28* | 23.45 | — | Imported | 2026-05-07 |
| 6 | gpt-4o-audio-preview-2025-06-03† | 23.23 | GPT-4o Audio openai-gpt-4o-audio-preview | Imported | 2026-05-07 |
| 7 | gemini-2.5-flash-native-audio-preview-12-2025 (thinking)† | 21.46 | — | Imported | 2026-05-07 |
| 8 | gpt-realtime-2025-08-28† | 20.35 | — | Imported | 2026-05-07 |
| 8 | MiMo-Audio-7B-Instruct (Thinking)* | 19.69 | — | Imported | 2026-05-07 |
| 9 | MiMo-Audio-7B-Instruct* | 18.58 | — | Imported | 2026-05-07 |
| 10 | gpt-realtime-mini-2025-12-15* | 16.59 | — | Imported | 2026-05-07 |
| 11 | gemma-3n-E4B-it* | 15.49 | Gemma 3n 4B google-gemma-3n-e4b-it | Imported | 2026-05-07 |
| 11 | Phi-4-multimodal-instruct* | 15.49 | — | Imported | 2026-05-07 |
| 11 | gpt-4o-mini-audio-preview-2024-12-17* | 14.82 | GPT-4 openai-gpt-4 | Imported | 2026-05-07 |
| 12 | gpt-realtime-mini-2025-12-15† | 13.94 | — | Imported | 2026-05-07 |
| 13 | gemini-2.5-flash-native-audio-preview-12-2025 (non-thinking)† | 13.90 | — | Imported | 2026-05-07 |
| 13 | Kimi-Audio-7B-Instruct* | 13.72 | — | Imported | 2026-05-07 |
| 14 | gpt-4o-mini-audio-preview-2024-12-17† | 13.05 | GPT-4 openai-gpt-4 | Imported | 2026-05-07 |
| 15 | Qwen2.5-Omni-7B* | 11.95 | — | Imported | 2026-05-07 |
| 15 | Kimi-Audio-7B-Instruct† | 10.40 | — | Imported | 2026-05-07 |
| 16 | LFM2-Audio-1.5B† | 9.29 | — | Imported | 2026-05-07 |