HallusionBench
HallusionBench evaluates multimodal large language models on visual illusion and hallucination-style image-text reasoning cases.
Rows: 9
Primary metric: Overall
Sampled: 2026-05-06
Metrics: Overall, aAcc (all-question accuracy), fAcc (figure accuracy), qAcc (question-pair accuracy)
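The sub-metrics follow the HallusionBench definitions. aAcc is plain accuracy over all answered questions. fAcc counts a figure as correct only when every question about that figure is answered correctly. qAcc counts a question as correct only when it is answered correctly for every figure it is paired with. The sketch below computes them from per-answer results. It is a minimal illustration, not the official evaluation code. The `Answer` record and its `figure_id`/`question_id` keys are hypothetical simplifications of how the benchmark groups items. Treating Overall as the unweighted mean of the three sub-metrics is also an assumption, because this page does not say how Overall is derived.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Answer:
    figure_id: str    # hypothetical key: the image the question is asked about
    question_id: str  # hypothetical key: the question text shared across figure variants
    correct: bool     # whether the model's answer matched the ground truth


def hallusion_metrics(answers: list[Answer]) -> dict[str, float]:
    # aAcc: fraction of all individual answers that are correct
    a_acc = sum(a.correct for a in answers) / len(answers)

    # Group correctness flags by figure and by question
    by_figure: dict[str, list[bool]] = defaultdict(list)
    by_question: dict[str, list[bool]] = defaultdict(list)
    for a in answers:
        by_figure[a.figure_id].append(a.correct)
        by_question[a.question_id].append(a.correct)

    # fAcc: a figure counts only if every question about it is correct
    f_acc = sum(all(flags) for flags in by_figure.values()) / len(by_figure)
    # qAcc: a question counts only if it is correct for every paired figure
    q_acc = sum(all(flags) for flags in by_question.values()) / len(by_question)

    # Assumption: Overall is the unweighted mean of the three sub-metrics
    overall = (a_acc + f_acc + q_acc) / 3

    # Report as percentages rounded to two decimals, matching the table's style
    scores = {"aAcc": a_acc, "fAcc": f_acc, "qAcc": q_acc, "Overall": overall}
    return {name: round(100 * value, 2) for name, value in scores.items()}


if __name__ == "__main__":
    # Two figures, each asked the same two questions
    demo = [
        Answer("fig1", "q1", True),
        Answer("fig1", "q2", False),
        Answer("fig2", "q1", True),
        Answer("fig2", "q2", True),
    ]
    print(hallusion_metrics(demo))
    # prints {'aAcc': 75.0, 'fAcc': 50.0, 'qAcc': 50.0, 'Overall': 58.33}
```

In the demo, three of four answers are correct (aAcc 75), only fig2 has all its questions right (fAcc 50), and only q1 is right on both figures (qAcc 50). The stricter fAcc and qAcc penalize a model that answers inconsistently across related items, which is why they sit well below aAcc.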
| Rank | Subject | Overall | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | MiMo-VL-7B | 63.80 | — | Imported | 2026-05-06 |
| 2 | BlueLM-2.6-3B | 63.10 | — | Imported | 2026-05-06 |
| 3 | R-4B | 60.00 | — | Imported | 2026-05-06 |
| 4 | BlueLM-2.5-3B | 60.00 | — | Imported | 2026-05-06 |
| 5 | Kimi-VL-A3B-Thinking-2506 | 59.80 | — | Imported | 2026-05-06 |
| 6 | nano V2 | 57.20 | — | Imported | 2026-05-06 |
| 7 | Ovis2-16B | 56.80 | — | Imported | 2026-05-06 |
| 8 | Ovis2-8B | 56.30 | — | Imported | 2026-05-06 |
| 9 | InternVL3-14B | 55.90 | — | Imported | 2026-05-06 |