AHa-Bench
Audio hallucination benchmark for large audio-language models across semantic, acoustic, and confusion hallucination types.
Rows: 9
Primary metric: hallucination_accuracy
Sampled: 2026-05-28
Metadata
Metrics: Hallucination Accuracy (higher is better), Hallucination Error Rate (lower is better)
| Rank | Subject | Hallucination Accuracy | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | Gemini-2.5-Pro | 60% | Gemini 2.5 Pro (google-gemini-2.5-pro) | Imported | 2026-05-28 |
| 2 | GPT-Audio | 28.75% | GPT Audio (openai-gpt-audio) | Imported | 2026-05-28 |
| 3 | Kimi-Audio | 23.94% | — | Imported | 2026-05-28 |
| 4 | Qwen-Audio | 22.46% | — | Imported | 2026-05-28 |
| 5 | Qwen2-Audio-Inst | 20.73% | — | Imported | 2026-05-28 |
| 6 | FunAudioLLM | 20.54% | — | Imported | 2026-05-28 |
| 7 | GLM4-Voice | 16.42% | — | Imported | 2026-05-28 |
| 8 | Qwen2-Audio | 16.15% | — | Imported | 2026-05-28 |
| 9 | SALMONN | 7.76% | — | Imported | 2026-05-28 |
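The ranking above appears to follow Hallucination Accuracy in descending order. A minimal Python sketch of that ordering, using the values copied from the table, is shown here. The names `ROWS`, `rank`, and `gap_to_leader` are hypothetical helpers introduced for illustration and are not part of the benchmark's tooling.

```python
# Values copied from the leaderboard table: (subject, hallucination accuracy in percent).
ROWS = [
    ("Gemini-2.5-Pro", 60.0),
    ("GPT-Audio", 28.75),
    ("Kimi-Audio", 23.94),
    ("Qwen-Audio", 22.46),
    ("Qwen2-Audio-Inst", 20.73),
    ("FunAudioLLM", 20.54),
    ("GLM4-Voice", 16.42),
    ("Qwen2-Audio", 16.15),
    ("SALMONN", 7.76),
]


def rank(rows):
    """Return (rank, subject, accuracy) tuples, highest accuracy first.

    Assumption: the primary metric is higher-is-better, as the table order suggests.
    """
    ordered = sorted(rows, key=lambda row: row[1], reverse=True)
    return [(i, name, acc) for i, (name, acc) in enumerate(ordered, start=1)]


def gap_to_leader(rows):
    """Map each subject to its accuracy gap (percentage points) behind the top entry."""
    ranked = rank(rows)
    top_accuracy = ranked[0][2]
    return {name: round(top_accuracy - acc, 2) for _, name, acc in ranked}


if __name__ == "__main__":
    for position, name, acc in rank(ROWS):
        print(f"{position}. {name}: {acc}%")
```

Calling `gap_to_leader(ROWS)` gives a quick view of how far each subject trails the top entry in percentage points.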