HallusionBench

HallusionBench evaluates multimodal large language models on visual illusion and hallucination-style image-text reasoning cases.

Rows: 9
Primary metric: overall
Sampled: 2026-05-06

Metadata

Metrics

Overall, aAcc, fAcc, qAcc

Latest Results

Snapshot mirrors the public HallusionBench leaderboard JSON. Source metric names are preserved in metadata; normalized metric keys are used only for registry consistency.
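As an illustration of the split between source metric names and normalized keys, here is a minimal sketch. The snake_case normalization rule, the `normalize_metric_key` and `build_record` helpers, and the record layout are assumptions made for this example; the registry's actual scheme is not documented here.

```python
import re

# Metric names as they appear in the source leaderboard.
SOURCE_METRICS = ["Overall", "aAcc", "fAcc", "qAcc"]


def normalize_metric_key(name: str) -> str:
    """Convert a source metric name to a lowercase snake_case key (assumed rule)."""
    # Insert an underscore at each lowercase-to-uppercase boundary, e.g. aAcc -> a_Acc.
    snake = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name.strip())
    # Lowercase, collapse any other separators to underscores, trim the ends.
    return re.sub(r"[^a-z0-9]+", "_", snake.lower()).strip("_")


def build_record(subject: str, source_scores: dict) -> dict:
    """Store scores under normalized keys and keep the original names in metadata."""
    return {
        "subject": subject,
        "metrics": {normalize_metric_key(k): v for k, v in source_scores.items()},
        "metadata": {
            "source_metric_names": {normalize_metric_key(k): k for k in source_scores}
        },
    }


record = build_record("MiMo-VL-7B", {"Overall": 63.80})
```

Keeping the original name alongside the normalized key means a reader can always trace a registry value back to the column it came from in the source leaderboard.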

Rank  Subject                    Overall  Model Match  Provenance  Sampled
1     MiMo-VL-7B                 63.80                 Imported    2026-05-06
2     BlueLM-2.6-3B              63.10                 Imported    2026-05-06
3     R-4B                       60.00                 Imported    2026-05-06
4     BlueLM-2.5-3B              60.00                 Imported    2026-05-06
5     Kimi-VL-A3B-Thinking-2506  59.80                 Imported    2026-05-06
6     nano V2                    57.20                 Imported    2026-05-06
7     Ovis2-16B                  56.80                 Imported    2026-05-06
8     Ovis2-8B                   56.30                 Imported    2026-05-06
9     InternVL3-14B              55.90                 Imported    2026-05-06
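R-4B and BlueLM-2.5-3B both score 60 yet receive distinct ranks (3 and 4), which suggests ordinal ranking rather than shared ranks for ties. The sketch below shows one way such ranks could be produced, assuming a descending sort on the overall score with ties kept in input order. The leaderboard's real tie-break rule is not stated in the source, and `assign_ranks` is a hypothetical helper.

```python
def assign_ranks(rows, metric="overall"):
    """Return rows sorted by `metric` descending, each with a 1-based ordinal rank.

    Python's sort is stable, so rows with equal scores keep their input order.
    """
    ordered = sorted(rows, key=lambda row: row[metric], reverse=True)
    return [{**row, "rank": position} for position, row in enumerate(ordered, start=1)]


# The top four rows of the snapshot, given in an assumed input order.
rows = [
    {"subject": "R-4B", "overall": 60.00},
    {"subject": "MiMo-VL-7B", "overall": 63.80},
    {"subject": "BlueLM-2.5-3B", "overall": 60.00},
    {"subject": "BlueLM-2.6-3B", "overall": 63.10},
]

ranked = assign_ranks(rows)
for row in ranked:
    print(row["rank"], row["subject"], f"{row['overall']:.2f}")
```

Under this assumption the order of tied rows depends entirely on the input order, so a registry that needs reproducible ranks would want an explicit secondary sort key.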