VAREX-Bench | BenchmarkList

Metadata

Exact Match, ANLS, Flat EM, Nested EM, Table EM, Perfect Documents

Rank	Subject	Exact Match	Model Match	Provenance	Sampled
1	Gemini 2.5 Pro	98.0% EM	Gemini 2.5 Pro (google-gemini-2.5-pro)	Imported	2026-05-28
2	Gemini 2.5 Flash	97.3% EM	Gemini 2.5 Flash (google-gemini-2.5-flash)	Imported	2026-05-28
3	Qwen3-VL	96.6% EM	—	Imported	2026-05-28
4	Llama 4 Maverick	95.6% EM	Llama 4 Maverick (meta-llama-4-maverick)	Imported	2026-05-28
5	Ministral	94.8% EM	—	Imported	2026-05-28
6	GPT-4o	94.8% EM	GPT-4o (openai-gpt-4o)	Imported	2026-05-28
7	Llama 4 Scout	94.3% EM	Llama 4 Scout (meta-llama-llama-4-scout)	Imported	2026-05-28
8	granite-vision-4.1-4b	94.2% EM	—	Imported	2026-05-28
9	NuExtract 2.0	90.8% EM	—	Imported	2026-05-28
10	InternVL3.5	85.6% EM	—	Imported	2026-05-28
11	granite-4.0-3b-vision	85.5% EM	—	Imported	2026-05-28
12	Qwen 2.5-VL	82.5% EM	—	Imported	2026-05-28
13	Gemma 3n	71.0% EM	—	Imported	2026-05-28
14	MiniCPM-V4	67.9% EM	—	Imported	2026-05-28
15	Gemma 3	65.3% EM	—	Imported	2026-05-28
16	h2oVL Miss.	61.3% EM	—	Imported	2026-05-28
17	Qwen3-VL	34.2% EM	—	Imported	2026-05-28
18	h2oVL Miss.	34.2% EM	—	Imported	2026-05-28
19	InternVL3.5	28.2% EM	—	Imported	2026-05-28
20	Qwen2-VL	9.7% EM	—	Imported	2026-05-28