VAREX-Bench

Multimodal structured-extraction benchmark built on 1,777 government forms, each paired with its own JSON schema, with inputs provided in image, text, and spatial modalities.

20 rows
Primary metric: exact_match_pct
Sampled: 2026-05-28

Metadata

Metrics

Exact Match, ANLS, Flat EM, Nested EM, Table EM, Perfect Documents
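The page lists these metrics without defining them. The following is a minimal sketch of one common way to compute field-level Exact Match, ANLS, and a per-document "perfect" flag for a single extracted JSON record. It assumes that nested JSON is flattened into dotted paths before comparison, that values are normalized by trimming and lowercasing, and that ANLS uses the DocVQA-style threshold tau = 0.5. VAREX's actual normalization, its split into Flat, Nested, and Table EM, and its aggregation across documents may differ. All function names and the sample record are hypothetical.

```python
from typing import Any, Dict


def flatten(obj: Any, prefix: str = "") -> Dict[str, Any]:
    """Flatten nested dicts/lists into {"a.b[0].c": leaf} path/value pairs."""
    out: Dict[str, Any] = {}
    if isinstance(obj, dict):
        for key, value in obj.items():
            path = f"{prefix}.{key}" if prefix else str(key)
            out.update(flatten(value, path))
    elif isinstance(obj, list):
        for i, value in enumerate(obj):
            out.update(flatten(value, f"{prefix}[{i}]"))
    else:
        out[prefix] = obj  # scalar leaf
    return out


def normalize(value: Any) -> str:
    """Assumed normalization: missing -> "", otherwise trimmed, lowercased string."""
    return "" if value is None else str(value).strip().lower()


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance, one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def anls_field(pred: Any, gold: Any, tau: float = 0.5) -> float:
    """Per-field ANLS: 1 - normalized edit distance, zeroed at or above tau."""
    p, g = normalize(pred), normalize(gold)
    if not p and not g:
        return 1.0
    nl = levenshtein(p, g) / max(len(p), len(g))
    return 1.0 - nl if nl < tau else 0.0


def score_document(pred: dict, gold: dict, tau: float = 0.5) -> Dict[str, Any]:
    """Score one prediction against its gold JSON, field by field."""
    gold_flat, pred_flat = flatten(gold), flatten(pred)
    if not gold_flat:
        return {"exact_match": 1.0, "anls": 1.0, "perfect": True}
    em = [normalize(pred_flat.get(k)) == normalize(v) for k, v in gold_flat.items()]
    anls = [anls_field(pred_flat.get(k), v, tau) for k, v in gold_flat.items()]
    return {
        "exact_match": sum(em) / len(em),
        "anls": sum(anls) / len(anls),
        "perfect": all(em),  # a "perfect document" gets every gold field right
    }


# Hypothetical record: the name matches after normalization, the ZIP is off by one digit.
gold = {"name": "Jane Doe", "address": {"zip": "12345", "city": "Austin"}}
pred = {"name": "jane doe ", "address": {"zip": "12346", "city": "Austin"}}
print(score_document(pred, gold))
```

In this example two of the three fields match exactly, so Exact Match is 2/3 and the document is not perfect. The one-digit ZIP error still earns 0.8 under ANLS, which is why ANLS usually reads higher than Exact Match on the same predictions. Note that this sketch scores only gold fields, so extra fields in a prediction are not penalized; a benchmark could reasonably choose otherwise.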

Latest Results

Rows are imported from the JavaScript embedded in the official VAREX homepage and ranked by Exact Match on the Image modality.
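The ranking rule above amounts to a filter-then-sort over the imported rows. The following is a minimal sketch under stated assumptions: the `Row` type, its `modality` field, and `rank_rows` are hypothetical, since the format of the homepage's embedded data is not described here. The page also does not say how ties are broken (Ministral and GPT-4o both show 94.8%). This sketch keeps input order for ties, whereas the source ordering may instead reflect unrounded scores.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Row:
    subject: str
    modality: str          # hypothetical field: "image", "text", or "spatial"
    exact_match_pct: float


def rank_rows(rows: List[Row], modality: str = "image") -> List[Tuple[int, Row]]:
    """Keep one modality, sort by Exact Match (descending), assign 1-based ranks."""
    selected = [r for r in rows if r.modality == modality]
    # sorted() stays stable even with reverse=True, so tied scores keep input order
    ordered = sorted(selected, key=lambda r: r.exact_match_pct, reverse=True)
    return [(rank, row) for rank, row in enumerate(ordered, start=1)]


rows = [
    Row("Ministral", "image", 94.8),
    Row("GPT-4o", "image", 94.8),
    Row("Gemini 2.5 Pro", "image", 98.0),
]
for rank, row in rank_rows(rows):
    print(rank, row.subject, f"{row.exact_match_pct:.1f}%")
```

Because Python's sort is stable, Ministral stays ahead of GPT-4o here simply because it appears first in the input. A leaderboard that wants ties to share a rank would need an explicit rule, such as competition ranking, instead.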

Rank  Subject                Exact Match  Model Match                                 Provenance  Sampled
1     Gemini 2.5 Pro         98.0%        Gemini 2.5 Pro (google-gemini-2.5-pro)      Imported    2026-05-28
2     Gemini 2.5 Flash       97.3%        Gemini 2.5 Flash (google-gemini-2.5-flash)  Imported    2026-05-28
3     Qwen3-VL               96.6%        -                                           Imported    2026-05-28
4     Llama 4 Maverick       95.6%        Llama 4 Maverick (meta-llama-4-maverick)    Imported    2026-05-28
5     Ministral              94.8%        -                                           Imported    2026-05-28
6     GPT-4o                 94.8%        GPT-4o (openai-gpt-4o)                      Imported    2026-05-28
7     Llama 4 Scout          94.3%        Llama 4 Scout (meta-llama-llama-4-scout)    Imported    2026-05-28
8     granite-vision-4.1-4b  94.2%        -                                           Imported    2026-05-28
9     NuExtract 2.0          90.8%        -                                           Imported    2026-05-28
10    InternVL3.5            85.6%        -                                           Imported    2026-05-28
11    granite-4.0-3b-vision  85.5%        -                                           Imported    2026-05-28
12    Qwen 2.5-VL            82.5%        -                                           Imported    2026-05-28
13    Gemma 3n               71.0%        -                                           Imported    2026-05-28
14    MiniCPM-V4             67.9%        -                                           Imported    2026-05-28
15    Gemma 3                65.3%        -                                           Imported    2026-05-28
16    h2oVL Miss.            61.3%        -                                           Imported    2026-05-28
17    Qwen3-VL               34.2%        -                                           Imported    2026-05-28
18    h2oVL Miss.            34.2%        -                                           Imported    2026-05-28
19    InternVL3.5            28.2%        -                                           Imported    2026-05-28
20    Qwen2-VL               9.7%         -                                           Imported    2026-05-28