AHa-Bench | BenchmarkList

Metadata

Metrics: Hallucination Accuracy (higher is better), Hallucination Error Rate (lower is better)

Rank	Subject	Hallucination Accuracy	Model Match	Provenance	Sampled
1	Gemini-2.5-Pro	60%	Gemini 2.5 Pro (google-gemini-2.5-pro)	Imported	2026-05-28
2	GPT-Audio	28.75%	GPT Audio (openai-gpt-audio)	Imported	2026-05-28
3	Kimi-Audio	23.94%	—	Imported	2026-05-28
4	Qwen-Audio	22.46%	—	Imported	2026-05-28
5	Qwen2-Audio-Inst	20.73%	—	Imported	2026-05-28
6	FunAudioLLM	20.54%	—	Imported	2026-05-28
7	GLM4-Voice	16.42%	—	Imported	2026-05-28
8	Qwen2-Audio	16.15%	—	Imported	2026-05-28
9	SALMONN	7.76%	—	Imported	2026-05-28