AudioMC

AudioMultiChallenge benchmarks end-to-end (E2E) spoken dialogue systems on multi-turn interaction, voice editing, and instruction retention.

Rows: 30
Primary metric: score
Sampled: 2026-05-07

Metadata

Metrics

Score, Confidence Interval Upper, Max Score

Latest Results

| Rank | Subject | Score | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | gemini-3-pro-preview (Thinking)* | 54.65 | Gemini 3 (google-gemini-3) | Imported | 2026-05-07 |
| 1 | gpt-realtime-2 (xHigh) | 48.45 | | Imported | 2026-05-07 |
| 1 | gemini-2.5-pro (Thinking)* | 46.90 | Gemini 2.5 Pro (google-gemini-2.5-pro) | Imported | 2026-05-07 |
| 2 | gemini-2.5-flash (Thinking)* | 40.04 | Gemini 2.5 Flash (google-gemini-2.5-flash) | Imported | 2026-05-07 |
| 2 | gpt-realtime-2 | 37.61 | | Imported | 2026-05-07 |
| 3 | gemini-3.1-flash-live-preview (Thinking)† | 36.06 | | Imported | 2026-05-07 |
| 3 | gpt-realtime-1.5† | 34.73 | | Imported | 2026-05-07 |
| 4 | gpt-realtime-1.5* | 29.87 | | Imported | 2026-05-07 |
| 5 | gemini-3.1-flash-live-preview† | 26.77 | | Imported | 2026-05-07 |
| 5 | Voxtral-Small-24B-2507* | 26.33 | Mistral: Voxtral Small 24B 2507 (mistralai-voxtral-small-24b-2507) | Imported | 2026-05-07 |
| 6 | gemini-2.5-flash* | 26.11 | Gemini 2.5 Flash (google-gemini-2.5-flash) | Imported | 2026-05-07 |
| 6 | gpt-4o-audio-preview-2025-06-03* | 25.44 | GPT-4o Audio (openai-gpt-4o-audio-preview) | Imported | 2026-05-07 |
| 6 | Qwen3-Omni-30B-A3B-Instruct† | 24.34 | | Imported | 2026-05-07 |
| 6 | gpt-realtime-2025-08-28* | 23.45 | | Imported | 2026-05-07 |
| 6 | gpt-4o-audio-preview-2025-06-03† | 23.23 | GPT-4o Audio (openai-gpt-4o-audio-preview) | Imported | 2026-05-07 |
| 7 | gemini-2.5-flash-native-audio-preview-12-2025 (thinking)† | 21.46 | | Imported | 2026-05-07 |
| 8 | gpt-realtime-2025-08-28† | 20.35 | | Imported | 2026-05-07 |
| 8 | MiMo-Audio-7B-Instruct (Thinking)* | 19.69 | | Imported | 2026-05-07 |
| 9 | MiMo-Audio-7B-Instruct* | 18.58 | | Imported | 2026-05-07 |
| 10 | gpt-realtime-mini-2025-12-15* | 16.59 | | Imported | 2026-05-07 |
| 11 | gemma-3n-E4B-it* | 15.49 | Gemma 3n 4B (google-gemma-3n-e4b-it) | Imported | 2026-05-07 |
| 11 | Phi-4-multimodal-instruct* | 15.49 | | Imported | 2026-05-07 |
| 11 | gpt-4o-mini-audio-preview-2024-12-17* | 14.82 | GPT-4 (openai-gpt-4) | Imported | 2026-05-07 |
| 12 | gpt-realtime-mini-2025-12-15† | 13.94 | | Imported | 2026-05-07 |
| 13 | gemini-2.5-flash-native-audio-preview-12-2025 (non-thinking)† | 13.90 | | Imported | 2026-05-07 |
| 13 | Kimi-Audio-7B-Instruct* | 13.72 | | Imported | 2026-05-07 |
| 14 | gpt-4o-mini-audio-preview-2024-12-17† | 13.05 | GPT-4 (openai-gpt-4) | Imported | 2026-05-07 |
| 15 | Qwen2.5-Omni-7B* | 11.95 | | Imported | 2026-05-07 |
| 15 | Kimi-Audio-7B-Instruct† | 10.40 | | Imported | 2026-05-07 |
| 16 | LFM2-Audio-1.5B† | 9.29 | | Imported | 2026-05-07 |
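
Several subjects in the results above appear both with and without a "(Thinking)" label under the same trailing marker. The following is a minimal sketch of how those paired scores might be compared. It assumes a handful of rows copied by hand from the table; the names `ROWS`, `THINKING_TAG`, and `thinking_gaps` are hypothetical and not part of any AudioMC tooling. The `*` and `†` markers are carried through unchanged because this page does not define them.

```python
# A few (subject, score) rows copied from the results above.
# The trailing * / † markers are kept verbatim; their meaning is not
# stated on this page, so rows are only paired when the markers match.
ROWS = [
    ("gemini-2.5-flash (Thinking)*", 40.04),
    ("gemini-2.5-flash*", 26.11),
    ("MiMo-Audio-7B-Instruct (Thinking)*", 19.69),
    ("MiMo-Audio-7B-Instruct*", 18.58),
    ("gemini-3.1-flash-live-preview (Thinking)†", 36.06),
    ("gemini-3.1-flash-live-preview†", 26.77),
]

# Hypothetical label used to spot the "thinking" variant of a subject.
THINKING_TAG = " (Thinking)"


def thinking_gaps(rows):
    """Return {plain subject: thinking score minus plain score}."""
    plain = {name: score for name, score in rows if THINKING_TAG not in name}
    gaps = {}
    for name, score in rows:
        if THINKING_TAG in name:
            base = name.replace(THINKING_TAG, "")
            if base in plain:
                gaps[base] = round(score - plain[base], 2)
    return gaps


if __name__ == "__main__":
    for subject, gap in thinking_gaps(ROWS).items():
        print(f"{subject}: {gap:+.2f} points")
```

The differences are plain score gaps only. The page lists "Confidence Interval Upper" as a metric but does not show its values here, so this sketch says nothing about whether any gap is statistically meaningful.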