AudioMC - Text Output

The AudioMultiChallenge Text Output track benchmarks spoken dialogue systems that produce text responses across multi-turn interactions.

Rows: 16
Primary metric: score
Sampled: 2026-05-06

Metadata

Metrics

Score, Confidence Interval Upper, Max Score

Latest Results

| Rank | Subject | Score | Model Match | Provenance | Sampled |
| --- | --- | --- | --- | --- | --- |
| 1 | gemini-3-pro-preview (Thinking) | 54.65 | Gemini 3 (google-gemini-3) | Imported | 2026-05-06 |
| 1 | gemini-2.5-pro (Thinking) | 46.90 | Gemini 2.5 Pro (google-gemini-2.5-pro) | Imported | 2026-05-06 |
| 2 | gemini-2.5-flash (Thinking) | 40.04 | Gemini 2.5 Flash (google-gemini-2.5-flash) | Imported | 2026-05-06 |
| 4 | gpt-realtime-1.5 | 29.87 | | Imported | 2026-05-06 |
| 4 | Voxtral-Small-24B-2507 | 26.33 | Mistral: Voxtral Small 24B 2507 (mistralai-voxtral-small-24b-2507) | Imported | 2026-05-06 |
| 4 | gemini-2.5-flash | 26.11 | Gemini 2.5 Flash (google-gemini-2.5-flash) | Imported | 2026-05-06 |
| 4 | gpt-4o-audio-preview-2025-06-03 | 25.44 | GPT-4o Audio (openai-gpt-4o-audio-preview) | Imported | 2026-05-06 |
| 4 | gpt-realtime-2025-08-28 | 23.45 | | Imported | 2026-05-06 |
| 5 | MiMo-Audio-7B-Instruct (Thinking) | 19.69 | | Imported | 2026-05-06 |
| 6 | MiMo-Audio-7B-Instruct | 18.58 | | Imported | 2026-05-06 |
| 8 | gpt-realtime-mini-2025-12-15 | 16.59 | | Imported | 2026-05-06 |
| 9 | gemma-3n-E4B-it | 15.49 | Gemma 3n 4B (google-gemma-3n-e4b-it) | Imported | 2026-05-06 |
| 9 | Phi-4-multimodal-instruct | 15.49 | | Imported | 2026-05-06 |
| 9 | gpt-4o-mini-audio-preview-2024-12-17 | 14.82 | GPT-4 (openai-gpt-4) | Imported | 2026-05-06 |
| 9 | Kimi-Audio-7B-Instruct | 13.72 | | Imported | 2026-05-06 |
| 11 | Qwen2.5-Omni-7B | 11.95 | | Imported | 2026-05-06 |