VAREX-Bench
Multi-modal structured extraction benchmark over 1,777 government forms with per-document JSON schemas and image/text/spatial modalities.
Rows: 20
Primary metric: exact_match_pct
Sampled: 2026-05-28
Metadata
Metrics
Exact Match, ANLS, Flat EM, Nested EM, Table EM, Perfect Documents
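To make the first two metrics above concrete, here is a minimal Python sketch of how field-level Exact Match and ANLS could be computed for one document against its reference JSON. This is not the benchmark's official scorer. The helper names (`flatten`, `levenshtein`, `score`), the path notation, the whitespace stripping, and the treatment of a missing field as an empty string are all assumptions made for illustration. The 0.5 threshold is the value conventionally used for ANLS; VAREX-Bench may use a different setting.

```python
from typing import Any, Dict


def flatten(obj: Any, prefix: str = "") -> Dict[str, str]:
    """Flatten nested dicts/lists into {path: stripped string value} leaves.

    Hypothetical helper: the path notation ("a.b", "c[0]") is an assumption.
    """
    if isinstance(obj, dict):
        items = [(f"{prefix}.{k}" if prefix else str(k), v) for k, v in obj.items()]
    elif isinstance(obj, list):
        items = [(f"{prefix}[{i}]", v) for i, v in enumerate(obj)]
    else:
        # Leaf value: None becomes the empty string, everything else is stringified.
        return {prefix: "" if obj is None else str(obj).strip()}
    flat: Dict[str, str] = {}
    for path, value in items:
        flat.update(flatten(value, path))
    return flat


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insert, delete, substitute) via row-by-row dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def score(pred: Any, gold: Any, tau: float = 0.5) -> Dict[str, float]:
    """Score one prediction against its reference, averaged over reference fields.

    Assumption: a field missing from the prediction is scored as an empty string.
    """
    p, g = flatten(pred), flatten(gold)
    if not g:
        return {"exact_match_pct": 0.0, "anls": 0.0}
    exact = 0
    similarity = 0.0
    for path, gold_value in g.items():
        pred_value = p.get(path, "")
        exact += pred_value == gold_value
        longest = max(len(pred_value), len(gold_value))
        # Normalized edit distance; two empty strings count as identical.
        nl = levenshtein(pred_value, gold_value) / longest if longest else 0.0
        # ANLS gives partial credit only below the threshold tau.
        similarity += (1.0 - nl) if nl < tau else 0.0
    n = len(g)
    return {"exact_match_pct": 100.0 * exact / n, "anls": similarity / n}
```

Calling `score(pred, gold)` with two parsed JSON objects returns both values for a single document. A dataset-level figure would then be an aggregate over documents, and how the benchmark aggregates (per field or per document) is not stated here.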
| Rank | Subject | Exact Match | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | Gemini 2.5 Pro | 98.0% EM | Gemini 2.5 Pro (`google-gemini-2.5-pro`) | Imported | 2026-05-28 |
| 2 | Gemini 2.5 Flash | 97.3% EM | Gemini 2.5 Flash (`google-gemini-2.5-flash`) | Imported | 2026-05-28 |
| 3 | Qwen3-VL | 96.6% EM | — | Imported | 2026-05-28 |
| 4 | Llama 4 Maverick | 95.6% EM | Llama 4 Maverick (`meta-llama-4-maverick`) | Imported | 2026-05-28 |
| 5 | Ministral | 94.8% EM | — | Imported | 2026-05-28 |
| 6 | GPT-4o | 94.8% EM | GPT-4o (`openai-gpt-4o`) | Imported | 2026-05-28 |
| 7 | Llama 4 Scout | 94.3% EM | Llama 4 Scout (`meta-llama-llama-4-scout`) | Imported | 2026-05-28 |
| 8 | granite-vision-4.1-4b | 94.2% EM | — | Imported | 2026-05-28 |
| 9 | NuExtract 2.0 | 90.8% EM | — | Imported | 2026-05-28 |
| 10 | InternVL3.5 | 85.6% EM | — | Imported | 2026-05-28 |
| 11 | granite-4.0-3b-vision | 85.5% EM | — | Imported | 2026-05-28 |
| 12 | Qwen 2.5-VL | 82.5% EM | — | Imported | 2026-05-28 |
| 13 | Gemma 3n | 71.0% EM | — | Imported | 2026-05-28 |
| 14 | MiniCPM-V4 | 67.9% EM | — | Imported | 2026-05-28 |
| 15 | Gemma 3 | 65.3% EM | — | Imported | 2026-05-28 |
| 16 | h2oVL Miss. | 61.3% EM | — | Imported | 2026-05-28 |
| 17 | Qwen3-VL | 34.2% EM | — | Imported | 2026-05-28 |
| 18 | h2oVL Miss. | 34.2% EM | — | Imported | 2026-05-28 |
| 19 | InternVL3.5 | 28.2% EM | — | Imported | 2026-05-28 |
| 20 | Qwen2-VL | 9.7% EM | — | Imported | 2026-05-28 |