VoiceAgentBench
Spoken tool-use agent benchmark for speech-in agents performing tool selection, parameter filling, orchestration, multi-turn handling, and safety checks.
6 rows
Primary metric: english_parameter_filling_average
Sampled: 2026-05-27
Metrics
English PF average, English TS average, English TCS average, Multilingual PF average, Multilingual TS average, Multilingual TCS average, English refusal rate, Multilingual refusal rate
| Rank | Subject | English PF average | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | Whisperv3-Llama3 70B | 60.64% | — | Imported | 2026-05-27 |
| 2 | Whisperv3-Gemma3 27B | 59.28% | — | Imported | 2026-05-27 |
| 3 | KimiAudio 7B | 57.57% | — | Imported | 2026-05-27 |
| 4 | Whisperv3-Qwen3 8B | 56.26% | — | Imported | 2026-05-27 |
| 5 | AudioFlamingo3 7B | 19.71% | — | Imported | 2026-05-27 |
| 6 | Qwen2.5-Omni 7B | 1.7% | — | Imported | 2026-05-27 |
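The Rank column appears to follow from sorting subjects by the primary metric, English PF average, in descending order. The sketch below shows that ordering in Python. It assumes that ranks are simply 1..n after a descending sort, and that any ties (none occur here) keep their input order, because Python's `sorted` is stable. The rows are copied from the table but listed in arbitrary order. The function name `rank_by_primary_metric` is hypothetical and is not part of VoiceAgentBench.

```python
# Rows from the leaderboard table, deliberately listed out of order:
# (subject, English PF average in percent).
rows = [
    ("Qwen2.5-Omni 7B", 1.7),
    ("KimiAudio 7B", 57.57),
    ("Whisperv3-Qwen3 8B", 56.26),
    ("Whisperv3-Llama3 70B", 60.64),
    ("AudioFlamingo3 7B", 19.71),
    ("Whisperv3-Gemma3 27B", 59.28),
]


def rank_by_primary_metric(rows):
    """Hypothetical helper: return (rank, subject, score), highest score first.

    sorted() is stable, so tied scores would keep their input order
    (an assumption about tie handling, not a documented benchmark rule).
    """
    ordered = sorted(rows, key=lambda row: row[1], reverse=True)
    return [(rank, name, score) for rank, (name, score) in enumerate(ordered, start=1)]


ranking = rank_by_primary_metric(rows)
for rank, name, score in ranking:
    print(f"| {rank} | {name} | {score:.2f}% |")
```

Under this assumption, the sorted output reproduces the Rank column above.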