AudioAgentBench

Multi-turn voice-agent benchmark with appointment, assistant, conversation, event, grocery, and product workflows scored across multiple dimensions.

Rows: 10
Primary metric: average_pass_rate
Sampled: 2026-05-27

Metadata

Metrics

Average Pass Rate, Run Count, Turns Scored, Real Run Count

Latest Results

Each row aggregates run-level results from the public AudioAgentBench JSON API by model, across six static multi-turn voice-agent benchmarks.
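The per-model aggregation described above could be sketched roughly as follows. This is a minimal illustration, not the leaderboard's actual pipeline: the record fields (`model`, `benchmark`, `pass_rate`), the sample values, and the choice of a plain unweighted mean over runs are all assumptions, since the page does not state how the average is computed.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical run-level records. Field names and values are illustrative
# only; they are NOT taken from the real AudioAgentBench API.
runs = [
    {"model": "model-a", "benchmark": "appointment", "pass_rate": 80.0},
    {"model": "model-a", "benchmark": "grocery", "pass_rate": 70.0},
    {"model": "model-b", "benchmark": "appointment", "pass_rate": 60.0},
    {"model": "model-b", "benchmark": "grocery", "pass_rate": 50.0},
    {"model": "model-b", "benchmark": "event", "pass_rate": 40.0},
]


def aggregate_by_model(runs):
    """Group runs by model and rank models by mean pass rate (assumed unweighted)."""
    grouped = defaultdict(list)
    for run in runs:
        grouped[run["model"]].append(run["pass_rate"])

    rows = [
        {
            "subject": model,
            "average_pass_rate": mean(rates),
            "run_count": len(rates),
        }
        for model, rates in grouped.items()
    ]
    # Highest average first, then assign 1-based ranks.
    rows.sort(key=lambda row: row["average_pass_rate"], reverse=True)
    for rank, row in enumerate(rows, start=1):
        row["rank"] = rank
    return rows


leaderboard = aggregate_by_model(runs)
for row in leaderboard:
    print(row["rank"], row["subject"], row["average_pass_rate"], row["run_count"])
```

If the real aggregation weights runs by turns scored, or averages per benchmark first, the `mean(rates)` step would need to change accordingly.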

| Rank | Subject | Average Pass Rate | Model Match | Provenance | Sampled |
|------|---------|-------------------|-------------|------------|---------|
| 1 | Ultravox V0.7 | 77.0199 | | Imported | 2026-05-27 |
| 2 | Grok Voice Think Fast 1.0 | 75.897 | | Imported | 2026-05-27 |
| 3 | Gpt Realtime | 73.4481 | | Imported | 2026-05-27 |
| 4 | Grok Realtime | 73.1542 | | Imported | 2026-05-27 |
| 5 | Gemini 3.1 Flash Live Preview | 73.0553 | | Imported | 2026-05-27 |
| 6 | Gemini 2.5 Flash Native Audio Preview 12 2025 | 71.6957 | | Imported | 2026-05-27 |
| 7 | Gpt Realtime 2 | 69.7261 | | Imported | 2026-05-27 |
| 8 | Amazon.Nova 2 Sonic V1:0 | 55.3599 | | Imported | 2026-05-27 |
| 9 | Glm Realtime Flash | 24.7454 | | Imported | 2026-05-27 |
| 10 | Glm Realtime Air | 17.8349 | | Imported | 2026-05-27 |