Roboflow Vision Evals - Visual Understanding
The Roboflow Vision Evals benchmark covers visual question-answering (QA) tasks such as reading text from photos, counting objects, spotting defects, and understanding documents.
Metadata
Rows: 5
Primary metric: score_pct
Sampled: 2026-05-22
Metrics: Score, Passed, Avg Eval Time (lower is better)
| Rank | Subject | Score | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | Gemini 3.5 Flash | 83.58% | Gemini 3.5 Flash (`google-gemini-3.5-flash`) | Imported | 2026-05-22 |
| 2 | Gemini 3.1 Pro (Tools) | 80.60% | Gemini 3.1 Pro Preview Custom Tools (`google-gemini-3.1-pro-preview-customtools`) | Imported | 2026-05-22 |
| 3 | Gemini 3 Flash (Tools) | 79.10% | Gemini 3 Flash Preview (`google-gemini-3-flash-preview`) | Imported | 2026-05-22 |
| 4 | Gemini 3.1 Pro | 77.61% | Gemini 3.1 Pro Preview (`google-gemini-3.1-pro-preview`) | Imported | 2026-05-22 |
| 5 | GPT-5.4 | 76.12% | GPT-5.4 (`openai-gpt-5.4`) | Imported | 2026-05-22 |
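
The ranking in the table can be reproduced from the scores alone. The following is a minimal sketch, assuming the rows are copied by hand from the table; `ROWS`, `rank_by_score`, and `RANKED` are illustrative names and not part of any Roboflow tooling. It sorts subjects by score and reports each subject's gap to the leader in percentage points.

```python
# Leaderboard rows copied from the table above: (subject, score in percent).
# These names are hypothetical helpers for illustration only.
ROWS = [
    ("Gemini 3.5 Flash", 83.58),
    ("Gemini 3.1 Pro (Tools)", 80.6),
    ("Gemini 3 Flash (Tools)", 79.1),
    ("Gemini 3.1 Pro", 77.61),
    ("GPT-5.4", 76.12),
]


def rank_by_score(rows):
    """Return (rank, subject, score, gap_to_leader) tuples, best score first."""
    ordered = sorted(rows, key=lambda row: row[1], reverse=True)
    leader_score = ordered[0][1]
    return [
        (rank, subject, score, round(leader_score - score, 2))
        for rank, (subject, score) in enumerate(ordered, start=1)
    ]


RANKED = rank_by_score(ROWS)

if __name__ == "__main__":
    for rank, subject, score, gap in RANKED:
        print(f"{rank}. {subject}: {score:.2f}% (gap to leader: {gap:.2f} pts)")
```

Sorting on the primary metric only is a simplification; the page also lists Passed and Avg Eval Time, which this sketch does not use.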