BLINK

BLINK ("BLINK: Multimodal Large Language Models Can See but Not Perceive") is a benchmark for multimodal language models that focuses on core visual perception abilities. It reformats 14 classic computer vision tasks into 3,807 multiple-choice questions, each paired with one or more images and with visual prompting. The tasks are ones that humans can solve "within a blink" and include relative depth estimation, visual correspondence, forensics detection, multi-view reasoning, counting, object localization, and spatial reasoning.

Rows: 11

Primary metric: score

Sampled: 2026-05-06

Metadata

ID: blink
Category: Multimodal
Release: 2024-04-18
Source: Source page
Snapshot: Snapshot source
Post: Announcement post

Metrics

Score, Normalized Score

Rank	Subject	Score	Model Match (ID)	Provenance	Sampled
1	Qwen3 VL 235B A22B Instruct	0.71	qwen-qwen3-vl-235b-a22b-instruct	Self-reported	2026-05-06
2	Qwen3 VL 8B Instruct	0.69	qwen-qwen3-vl-8b-instruct	Self-reported	2026-05-06
3	Qwen3 VL 8B Thinking	0.69	qwen-qwen3-vl-8b-thinking	Self-reported	2026-05-06
4	Qwen3 VL 32B Thinking	0.69	—	Self-reported	2026-05-06
5	Qwen3 VL 30B A3B Instruct	0.68	qwen-qwen3-vl-30b-a3b-instruct	Self-reported	2026-05-06
6	Qwen3 VL 32B Instruct	0.67	qwen-qwen3-vl-32b-instruct	Self-reported	2026-05-06
7	Qwen3 VL 235B A22B Thinking	0.67	qwen-qwen3-vl-235b-a22b-thinking	Self-reported	2026-05-06
8	Qwen3 VL 4B Instruct	0.66	—	Self-reported	2026-05-06
9	Qwen3 VL 30B A3B Thinking	0.65	qwen-qwen3-vl-30b-a3b-thinking	Self-reported	2026-05-06
10	Qwen3 VL 4B Thinking	0.63	—	Self-reported	2026-05-06
11	Phi-4-multimodal-instruct	0.61	—	Self-reported	2026-05-06
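
Several entries in the table share a score (for example, three models at 0.69) yet receive consecutive ranks. The following is a minimal sketch, not the leaderboard's own method, of how tied scores could instead be given a shared rank when reading the table. The names `ROWS`, `parse_rows`, and `competition_ranks` are hypothetical, and only the first five rows are copied in for brevity.

```python
import csv
import io

# First five rows of the table above, reduced to subject and score
# (tab-separated). Copied by hand for illustration only.
ROWS = """\
Qwen3 VL 235B A22B Instruct\t0.71
Qwen3 VL 8B Instruct\t0.69
Qwen3 VL 8B Thinking\t0.69
Qwen3 VL 32B Thinking\t0.69
Qwen3 VL 30B A3B Instruct\t0.68
"""


def parse_rows(text):
    """Parse tab-separated 'subject<TAB>score' lines into (subject, float) pairs."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    return [(subject, float(score)) for subject, score in reader]


def competition_ranks(rows):
    """Return (rank, subject, score) tuples; equal scores share the same rank."""
    # sorted() is stable, so tied entries keep their original order.
    ordered = sorted(rows, key=lambda r: r[1], reverse=True)
    ranked = []
    for position, (subject, score) in enumerate(ordered, start=1):
        if ranked and ranked[-1][2] == score:
            rank = ranked[-1][0]  # tie: reuse the previous entry's rank
        else:
            rank = position       # no tie: rank equals the 1-based position
        ranked.append((rank, subject, score))
    return ranked


ranked = competition_ranks(parse_rows(ROWS))
for rank, subject, score in ranked:
    print(rank, subject, score)
```

Under this scheme the three models at 0.69 all share rank 2 and the next entry resumes at rank 5. Since all scores are self-reported, small differences between neighbouring rows may not be meaningful either way.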
