AudioAgentBench
Multi-turn voice-agent benchmark with appointment, assistant, conversation, event, grocery, and product workflows scored across multiple dimensions.
Rows: 10
Primary metric: average_pass_rate
Sampled: 2026-05-27
Metadata
Metrics: Average Pass Rate, Run Count, Turns Scored, Real Run Count
| Rank | Subject | Average Pass Rate | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | Ultravox V0.7 | 77.0199 | — | Imported | 2026-05-27 |
| 2 | Grok Voice Think Fast 1.0 | 75.897 | — | Imported | 2026-05-27 |
| 3 | Gpt Realtime | 73.4481 | — | Imported | 2026-05-27 |
| 4 | Grok Realtime | 73.1542 | — | Imported | 2026-05-27 |
| 5 | Gemini 3.1 Flash Live Preview | 73.0553 | — | Imported | 2026-05-27 |
| 6 | Gemini 2.5 Flash Native Audio Preview 12 2025 | 71.6957 | — | Imported | 2026-05-27 |
| 7 | Gpt Realtime 2 | 69.7261 | — | Imported | 2026-05-27 |
| 8 | Amazon.Nova 2 Sonic V1:0 | 55.3599 | — | Imported | 2026-05-27 |
| 9 | Glm Realtime Flash | 24.7454 | — | Imported | 2026-05-27 |
| 10 | Glm Realtime Air | 17.8349 | — | Imported | 2026-05-27 |