Roboflow Vision Evals - Visual Understanding

The Roboflow Vision Evals benchmark tests visual QA tasks such as reading text in photos, counting objects, spotting defects, and understanding documents.

Metadata

Rows: 5
Primary metric: score_pct
Sampled: 2026-05-22

Metrics

Score, Passed, and Avg Eval Time (for Avg Eval Time, lower is better)

Latest Results

These are the top rows visible on the rendered Roboflow Playground Vision Evals page. Roboflow reports 66 models and 67 prompts per model for this category; this snapshot includes only the visible top rows recorded in the scrape notes.

Rank | Subject | Score | Model match (ID) | Provenance | Sampled
1 | Gemini 3.5 Flash | 83.58% | Gemini 3.5 Flash (google-gemini-3.5-flash) | Imported | 2026-05-22
2 | Gemini 3.1 Pro (Tools) | 80.60% | Gemini 3.1 Pro Preview Custom Tools (google-gemini-3.1-pro-preview-customtools) | Imported | 2026-05-22
3 | Gemini 3 Flash (Tools) | 79.10% | Gemini 3 Flash Preview (google-gemini-3-flash-preview) | Imported | 2026-05-22
4 | Gemini 3.1 Pro | 77.61% | Gemini 3.1 Pro Preview (google-gemini-3.1-pro-preview) | Imported | 2026-05-22
5 | GPT-5.4 | 76.12% | GPT-5.4 (openai-gpt-5.4) | Imported | 2026-05-22
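The listed scores appear consistent with Score being the share of the 67 prompts a model passed (for example, 56/67 is about 83.58%). The snapshot does not show Passed counts, so this is an assumption, not something Roboflow states here. The minimal sketch below only checks, under that assumption, which integer pass count reproduces each displayed score; `implied_passed`, `PROMPTS_PER_MODEL`, and `SCORES` are hypothetical names introduced for illustration.

```python
from typing import Optional

# ASSUMPTION: Score = Passed / prompts * 100, rounded to two decimals.
# The 67 prompts per model figure is the one Roboflow reports for this category.
PROMPTS_PER_MODEL = 67

# Scores (percent) copied from the visible top rows of this snapshot.
SCORES = {
    "google-gemini-3.5-flash": 83.58,
    "google-gemini-3.1-pro-preview-customtools": 80.60,
    "google-gemini-3-flash-preview": 79.10,
    "google-gemini-3.1-pro-preview": 77.61,
    "openai-gpt-5.4": 76.12,
}


def implied_passed(score_pct: float, prompts: int = PROMPTS_PER_MODEL) -> Optional[int]:
    """Return the pass count whose rounded percentage matches score_pct, else None."""
    # Nearest integer count to the displayed percentage.
    candidate = round(score_pct / 100 * prompts)
    # Keep it only if converting back reproduces the displayed two-decimal score.
    if round(candidate / prompts * 100, 2) == round(score_pct, 2):
        return candidate
    return None


if __name__ == "__main__":
    for model_id, score in SCORES.items():
        print(f"{model_id}: {score:.2f}% -> {implied_passed(score)}/{PROMPTS_PER_MODEL}")
```

Under this assumption each visible score maps to a whole number of passed prompts (56, 54, 53, 52, and 51), while a score that no count out of 67 can produce returns None.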