VoiceBench
Multifaceted benchmark that evaluates LLM-based voice assistants on speech-input knowledge, instruction following, safety, robustness, and handling of accents and noise.
Rows: 40
Primary metric: Overall
Sampled: 2026-05-27
Metadata
Metrics: AlpacaEval, CommonEval, WildVoice, SD-QA, MMSU, OBQA, BBH, IFEval, AdvBench, Overall
| Rank | Subject | Overall | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | NVIDIA Nemotron 3 Nano Omni 30B A3B | 89.39 | Nemotron 3 Nano Omni nvidia-nemotron-3-nano-omni-30b-a3b-reasoning | Imported | 2026-05-27 |
| 2 | Ultravox-GLM-4P7 | 88.86 | — | Imported | 2026-05-27 |
| 3 | Ultravox-GLM-4P7 (thinking) | 88.79 | — | Imported | 2026-05-27 |
| 4 | Whisper-v3-large + GPT-4o | 87.8 | GPT-4o openai-gpt-4o | Imported | 2026-05-27 |
| 5 | Ultravox-GLM-4P6 | 87.05 | — | Imported | 2026-05-27 |
| 6 | GPT-4o-Audio | 86.75 | GPT-4o Audio openai-gpt-4o-audio-preview | Imported | 2026-05-27 |
| 7 | GPT-4o-mini-Audio | 82.84 | GPT-4 openai-gpt-4 | Imported | 2026-05-27 |
| 8 | Ultravox-v0.6-LLaMA-3.3-70B | 81.81 | — | Imported | 2026-05-27 |
| 9 | Parakeet-TDT-0.6b-V2 + Qwen3-8B | 79.23 | — | Imported | 2026-05-27 |
| 10 | Whisper-v3-large + LLaMA-3.1-8B | 77.48 | — | Imported | 2026-05-27 |
| 11 | Kimi-Audio | 76.91 | — | Imported | 2026-05-27 |
| 12 | Whisper-v3-turbo + LLaMA-3.1-8B | 76.09 | — | Imported | 2026-05-27 |
| 13 | Ultravox-v0.5-LLaMA-3.1-8B | 74.86 | — | Imported | 2026-05-27 |
| 14 | Ultravox-v0.4.1-LLaMA-3.1-8B | 72.09 | — | Imported | 2026-05-27 |
| 15 | Baichuan-Omni-1.5 | 71.32 | — | Imported | 2026-05-27 |
| 16 | MiniCPM-o | 71.23 | — | Imported | 2026-05-27 |
| 17 | Whisper-v3-turbo + LLaMA-3.2-3B | 71.02 | — | Imported | 2026-05-27 |
| 18 | Baichuan-Audio | 69.27 | — | Imported | 2026-05-27 |
| 19 | MERaLiON | 65.04 | — | Imported | 2026-05-27 |
| 20 | VITA-1.5 | 64.53 | — | Imported | 2026-05-27 |
| 21 | Phi-4-multimodal | 64.32 | — | Imported | 2026-05-27 |
| 22 | Ola | 59.42 | — | Imported | 2026-05-27 |
| 23 | Lyra-Base | 59 | — | Imported | 2026-05-27 |
| 24 | Nemotron 3 VoiceChat (V1) | 58.1 | — | Imported | 2026-05-27 |
| 25 | Ultravox-v0.5-LLaMA-3.2-1B | 57.46 | — | Imported | 2026-05-27 |
| 26 | DiVA | 57.39 | — | Imported | 2026-05-27 |
| 27 | GLM-4-Voice | 56.48 | — | Imported | 2026-05-27 |
| 28 | Qwen2-Audio | 55.8 | — | Imported | 2026-05-27 |
| 29 | Freeze-Omni | 55.2 | — | Imported | 2026-05-27 |
| 30 | Step-Audio | 50.84 | — | Imported | 2026-05-27 |
| 31 | Megrez-3B-Omni | 46.76 | — | Imported | 2026-05-27 |
| 32 | Ichigo | 45.57 | — | Imported | 2026-05-27 |
| 33 | Lyra-Mini | 45.26 | — | Imported | 2026-05-27 |
| 34 | Mair-hub-0.5B-Omni | 44.59 | — | Imported | 2026-05-27 |
| 35 | LLaMA-Omni | 41.12 | — | Imported | 2026-05-27 |
| 36 | VITA-1.0 | 36.43 | — | Imported | 2026-05-27 |
| 37 | SLAM-Omni | 35.3 | — | Imported | 2026-05-27 |
| 38 | Mini-Omni2 | 33.49 | — | Imported | 2026-05-27 |
| 39 | Mini-Omni | 30.42 | — | Imported | 2026-05-27 |
| 40 | Moshi | 29.51 | — | Imported | 2026-05-27 |