CharXiv-R
CharXiv-R is the reasoning component of the CharXiv benchmark. Its questions require synthesizing information across the visual elements of a chart, and it evaluates how well multimodal large language models understand and reason about scientific charts taken from arXiv papers.
Rows: 39
Primary metric: score
Sampled: 2026-05-28
Metadata
Metrics: Score, Normalized Score
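Showing the 3 latest source slices. Ranks restart within each slice, and scores appear as reported, either as percentages or as fractions of 1.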
| Rank | Subject | Score | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | Claude Opus 4.7 | 90.1% | Claude Opus 4.7 anthropic-claude-opus-4.7 | Self-reported | 2026-05-28 |
| 2 | Claude Opus 4.8 | 89.9% | Claude Opus 4.8 anthropic-claude-opus-4.8 | Self-reported | 2026-05-28 |

| Rank | Subject | Score | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | Claude Mythos Preview | 0.93 | Claude Mythos Preview anthropic-claude-mythos-preview | Self-reported | 2026-05-06 |
| 2 | Claude Opus 4.7 | 0.91 | Claude Opus 4.7 anthropic-claude-opus-4.7 | Self-reported | 2026-05-06 |
| 3 | Kimi K2.6 | 0.87 | MoonshotAI: Kimi K2.6 moonshotai-kimi-k2.6 | Self-reported | 2026-05-06 |
| 4 | Muse Spark | 0.86 | — | Self-reported | 2026-05-06 |
| 5 | GPT-5.2 | 0.82 | GPT-5.2 openai-gpt-5.2 | Self-reported | 2026-05-06 |
| 6 | Qwen3.6 Plus | 0.81 | Qwen3.6 Plus qwen-qwen3.6-plus | Self-reported | 2026-05-06 |
| 7 | Gemini 3 Pro | 0.81 | Gemini 3 google-gemini-3 | Self-reported | 2026-05-06 |
| 8 | GPT-5 | 0.81 | GPT-5 openai-gpt-5 | Self-reported | 2026-05-06 |
| 9 | Gemini 3 Flash | 0.80 | Gemini 3 Flash Preview google-gemini-3-flash-preview | Self-reported | 2026-05-06 |
| 10 | Qwen3.5-27B | 0.80 | Qwen3.5-27B qwen-qwen3.5-27b | Self-reported | 2026-05-06 |
| 11 | o3 | 0.79 | o3 openai-o3 | Self-reported | 2026-05-06 |
| 12 | Qwen3.6-27B | 0.78 | Qwen3.6 27B qwen-qwen3.6-27b | Self-reported | 2026-05-06 |
| 13 | Qwen3.6-35B-A3B | 0.78 | Qwen3.6 35B A3B qwen-qwen3.6-35b-a3b | Self-reported | 2026-05-06 |
| 14 | Qwen3.5-35B-A3B | 0.78 | Qwen3.5-35B-A3B qwen-qwen3.5-35b-a3b | Self-reported | 2026-05-06 |
| 14 | Kimi K2.5 | 0.78 | MoonshotAI: Kimi K2.5 moonshotai-kimi-k2.5 | Self-reported | 2026-05-06 |
| 16 | Claude Opus 4.6 | 0.77 | Claude Opus 4.6 anthropic-claude-opus-4.6 | Self-reported | 2026-05-06 |
| 17 | Qwen3.5-122B-A10B | 0.77 | Qwen3.5-122B-A10B qwen-qwen3.5-122b-a10b | Self-reported | 2026-05-06 |
| 18 | Gemini 3.1 Flash-Lite | 0.73 | Gemini 3.1 Flash Lite Preview google-gemini-3.1-flash-lite-preview | Self-reported | 2026-05-06 |
| 19 | o4-mini | 0.72 | o4 Mini openai-o4-mini | Self-reported | 2026-05-06 |
| 20 | Qwen3 VL 235B A22B Thinking | 0.66 | Qwen3 VL 235B A22B Thinking qwen-qwen3-vl-235b-a22b-thinking | Self-reported | 2026-05-06 |
| 21 | Qwen3 VL 32B Thinking | 0.65 | — | Self-reported | 2026-05-06 |
| 22 | Qwen3 VL 32B Instruct | 0.63 | Qwen3 VL 32B Instruct qwen-qwen3-vl-32b-instruct | Self-reported | 2026-05-06 |
| 23 | Qwen3 VL 235B A22B Instruct | 0.62 | Qwen3 VL 235B A22B Instruct qwen-qwen3-vl-235b-a22b-instruct | Self-reported | 2026-05-06 |
| 24 | GPT-4o | 0.59 | GPT-4o (2024-08-06) openai-gpt-4o-2024-08-06 | Self-reported | 2026-05-06 |
| 25 | GPT-4.1 mini | 0.57 | GPT-4.1 Mini openai-gpt-4.1-mini | Self-reported | 2026-05-06 |
| 26 | GPT-4.1 | 0.57 | GPT-4.1 openai-gpt-4.1 | Self-reported | 2026-05-06 |
| 27 | Qwen3 VL 30B A3B Thinking | 0.57 | Qwen3 VL 30B A3B Thinking qwen-qwen3-vl-30b-a3b-thinking | Self-reported | 2026-05-06 |
| 28 | GPT-4.5 | 0.55 | GPT-4.5 openai-gpt-4.5-preview | Self-reported | 2026-05-06 |
| 29 | Qwen3 VL 8B Thinking | 0.53 | Qwen3 VL 8B Thinking qwen-qwen3-vl-8b-thinking | Self-reported | 2026-05-06 |
| 30 | Qwen3 VL 4B Thinking | 0.50 | — | Self-reported | 2026-05-06 |
| 31 | Qwen3 VL 30B A3B Instruct | 0.49 | Qwen3 VL 30B A3B Instruct qwen-qwen3-vl-30b-a3b-instruct | Self-reported | 2026-05-06 |
| 32 | Qwen3 VL 8B Instruct | 0.46 | Qwen3 VL 8B Instruct qwen-qwen3-vl-8b-instruct | Self-reported | 2026-05-06 |
| 33 | GPT-4.1 nano | 0.41 | GPT-4.1 Nano openai-gpt-4.1-nano | Self-reported | 2026-05-06 |
| 34 | Qwen3 VL 4B Instruct | 0.40 | — | Self-reported | 2026-05-06 |

| Rank | Subject | Score | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | Claude Mythos Preview | 93.2% | Claude Mythos Preview anthropic-claude-mythos-preview | Launch post | 2026-04-16 |
| 2 | Claude Opus 4.7 | 91% | Claude Opus 4.7 anthropic-claude-opus-4.7 | Launch post | 2026-04-16 |
| 3 | Claude Opus 4.6 | 84.7% | Claude Opus 4.6 anthropic-claude-opus-4.6 | Launch post | 2026-04-16 |
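
Because the slices report scores in mixed notation (for example `90.1%`, `0.91`, and `91%` for Claude Opus 4.7), comparing rows across slices needs a common scale. The following is a minimal sketch of one way to do that, not the leaderboard's own normalization. The function name `normalize_score` is hypothetical, and the code assumes that percentage values and fractional values measure the same quantity, which the page does not confirm.

```python
def normalize_score(raw: str) -> float:
    """Convert a score string such as '90.1%' or '0.93' to a fraction of 1.

    Assumption: a trailing '%' marks a percentage, and a bare number
    greater than 1 is a percentage written without the sign.
    """
    text = raw.strip()
    if text.endswith("%"):
        return float(text[:-1]) / 100.0
    value = float(text)
    return value / 100.0 if value > 1.0 else value


# Three rows for the same model, copied from the three slices above.
rows = [
    ("Claude Opus 4.7", "90.1%", "2026-05-28"),
    ("Claude Opus 4.7", "0.91", "2026-05-06"),
    ("Claude Opus 4.7", "91%", "2026-04-16"),
]

# Round to three decimals so floating-point noise does not leak into comparisons.
normalized = [
    (name, round(normalize_score(score), 3), sampled)
    for name, score, sampled in rows
]
```

Even on a common scale, rows from different slices differ in provenance (self-reported versus launch post) and sampling date, so they are best compared within a slice.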