ChartQA
ChartQA is a large-scale benchmark comprising 9.6K human-written questions and 23.1K questions generated from human-written chart summaries, designed to evaluate models' abilities in visual and logical reasoning over charts.
24 rows
Primary metric: score
Sampled: 2026-05-06
Metadata
Metrics: Score, Normalized Score
| Rank | Subject | Score | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | Claude 3.5 Sonnet | 0.91 | Claude 3.5 Sonnet `anthropic-claude-3.5-sonnet` | Self-reported | 2026-05-06 |
| 2 | Llama 4 Maverick | 0.90 | Llama 4 Maverick `meta-llama-4-maverick` | Self-reported | 2026-05-06 |
| 3 | Qwen2.5 VL 72B Instruct | 0.90 | Qwen2.5 VL 72B Instruct `qwen-qwen2.5-vl-72b-instruct` | Self-reported | 2026-05-06 |
| 4 | Nova Pro | 0.89 | Nova Pro 1.0 `amazon-nova-pro-v1` | Self-reported | 2026-05-06 |
| 5 | Llama 4 Scout | 0.89 | Llama 4 Scout `meta-llama-llama-4-scout` | Self-reported | 2026-05-06 |
| 6 | Qwen2-VL-72B-Instruct | 0.88 | — | Self-reported | 2026-05-06 |
| 7 | Pixtral Large | 0.88 | — | Self-reported | 2026-05-06 |
| 8 | Mistral Small 3.2 24B Instruct | 0.87 | Mistral: Mistral Small 3.2 24B `mistralai-mistral-small-3.2-24b-instruct` | Self-reported | 2026-05-06 |
| 9 | Qwen2.5 VL 7B Instruct | 0.87 | — | Self-reported | 2026-05-06 |
| 10 | Nova Lite | 0.87 | Nova Lite 1.0 `amazon-nova-lite-v1` | Self-reported | 2026-05-06 |
| 11 | DeepSeek VL2 | 0.86 | — | Self-reported | 2026-05-06 |
| 12 | GPT-4o | 0.86 | GPT-4o (2024-08-06) `openai-gpt-4o-2024-08-06` | Self-reported | 2026-05-06 |
| 13 | Llama 3.2 90B Instruct | 0.85 | — | Self-reported | 2026-05-06 |
| 14 | Qwen2.5-Omni-7B | 0.85 | — | Self-reported | 2026-05-06 |
| 15 | DeepSeek VL2 Small | 0.84 | — | Self-reported | 2026-05-06 |
| 16 | Llama 3.2 11B Instruct | 0.83 | — | Self-reported | 2026-05-06 |
| 17 | Pixtral-12B | 0.82 | — | Self-reported | 2026-05-06 |
| 17 | Phi-3.5-vision-instruct | 0.82 | — | Self-reported | 2026-05-06 |
| 19 | Phi-4-multimodal-instruct | 0.81 | — | Self-reported | 2026-05-06 |
| 20 | DeepSeek VL2 Tiny | 0.81 | — | Self-reported | 2026-05-06 |
| 21 | Gemma 3 27B | 0.78 | Gemma 3 27B `google-gemma-3-27b-it` | Self-reported | 2026-05-06 |
| 22 | Grok-1.5V | 0.76 | — | Self-reported | 2026-05-06 |
| 23 | Gemma 3 12B | 0.76 | Gemma 3 12B `google-gemma-3-12b-it` | Self-reported | 2026-05-06 |
| 24 | Gemma 3 4B | 0.69 | Gemma 3 4B `google-gemma-3-4b-it` | Self-reported | 2026-05-06 |
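
The Rank column shows two entries sharing rank 17 and the next entry at rank 19, which is consistent with standard competition ranking (tied scores share a rank and the following rank is skipped). The page does not document its ranking rule, so the sketch below is an assumption, and `competition_rank` is a hypothetical helper rather than anything the leaderboard publishes. Other rows with equal displayed scores (for example ranks 2 and 3 at 0.90) carry distinct ranks, possibly because ranking uses unrounded scores that the page does not show.

```python
from typing import List, Tuple


def competition_rank(
    rows: List[Tuple[str, float]], start: int = 1
) -> List[Tuple[int, str, float]]:
    """Rank (name, score) pairs by descending score.

    Equal scores share a rank and the next rank is skipped
    (for example 16, 17, 17, 19). `start` is the rank of the first row.
    """
    # sorted() is stable, so tied rows keep their input order.
    ordered = sorted(rows, key=lambda row: row[1], reverse=True)
    ranked = []
    prev_score = None
    prev_rank = start
    for position, (name, score) in enumerate(ordered, start=start):
        # Reuse the previous rank on a tie; otherwise use the row's position.
        rank = prev_rank if score == prev_score else position
        ranked.append((rank, name, score))
        prev_score, prev_rank = score, rank
    return ranked


# Rows 16 through 19 of the table above, with the scores as displayed.
example = [
    ("Llama 3.2 11B Instruct", 0.83),
    ("Pixtral-12B", 0.82),
    ("Phi-3.5-vision-instruct", 0.82),
    ("Phi-4-multimodal-instruct", 0.81),
]

for rank, name, score in competition_rank(example, start=16):
    print(rank, name, f"{score:.2f}")
```

Run on these four rows with `start=16`, the helper assigns ranks 16, 17, 17 and 19, matching the Rank column for that slice of the table.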