OlympiadBench
OlympiadBench evaluates olympiad-level mathematics and physics reasoning on bilingual (Chinese and English) problems, in both multimodal and text-only settings.
Metadata
Rows: 17
Primary metric: full_avg (Full Benchmark Avg.)
Sampled: 2026-05-06
Metrics
Full Benchmark Math, Full Benchmark Physics, Full Benchmark Avg., Text-only Math, Text-only Physics, Text-only Avg.
Full benchmark

| Rank | Model | Full Benchmark Avg. | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | GPT-4o | 25.89 | GPT-4o (`openai-gpt-4o`) | Imported | 2026-05-06 |
| 2 | GPT-4V | 17.97 | GPT-4 (`openai-gpt-4`) | Imported | 2026-05-06 |
| 3 | Qwen-VL-Max | 10.09 | Qwen VL Max (`qwen-qwen-vl-max`) | Imported | 2026-05-06 |
| 4 | Claude3-Opus | 7.65 | — | Imported | 2026-05-06 |
| 5 | Gemini-Pro-Vision | 4.22 | — | Imported | 2026-05-06 |
| 6 | LLaVA-NeXT-34B | 3.65 | — | Imported | 2026-05-06 |
| 7 | Yi-VL-34B | 3.42 | — | Imported | 2026-05-06 |

Text-only subset

| Rank | Model | Text-only Avg. | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | GPT-4o | 39.72 | GPT-4o (`openai-gpt-4o`) | Imported | 2026-05-06 |
| 2 | GPT-4 | 29.93 | GPT-4 (`openai-gpt-4`) | Imported | 2026-05-06 |
| 3 | GPT-4V | 29.07 | GPT-4 (`openai-gpt-4`) | Imported | 2026-05-06 |
| 4 | Llama-3-70B-Instruct | 20.27 | Llama 3 70B Instruct (`meta-llama-llama-3-70b-instruct`) | Imported | 2026-05-06 |
| 5 | Qwen-VL-Max | 18.27 | Qwen VL Max (`qwen-qwen-vl-max`) | Imported | 2026-05-06 |
| 6 | DeepSeekMath-7B-RL | 17.02 | — | Imported | 2026-05-06 |
| 7 | Claude3-Opus | 13.09 | — | Imported | 2026-05-06 |
| 8 | Gemini-Pro-Vision | 7.34 | — | Imported | 2026-05-06 |
| 9 | LLaVA-NeXT-34B | 5.87 | — | Imported | 2026-05-06 |
| 10 | Yi-VL-34B | 5.72 | — | Imported | 2026-05-06 |
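A minimal sketch, in Python, of deriving a Rank column from the primary metric. It assumes ranks are assigned by descending Full Benchmark Avg., with tied scores sharing a rank (standard competition ranking); the page does not state its tie rule, and the helper name `competition_rank` is hypothetical. The scores are the seven full-benchmark values listed in the table.

```python
from typing import List, Tuple


def competition_rank(rows: List[Tuple[str, float]]) -> List[Tuple[int, str, float]]:
    """Sort (model, score) pairs by descending score and assign standard
    competition ranks (1, 2, 2, 4, ...). Assumed tie rule: equal scores
    share a rank. This is an illustrative helper, not the site's code."""
    ordered = sorted(rows, key=lambda r: r[1], reverse=True)
    ranked: List[Tuple[int, str, float]] = []
    for i, (model, score) in enumerate(ordered):
        if i > 0 and score == ordered[i - 1][1]:
            rank = ranked[-1][0]  # tie: reuse the previous row's rank
        else:
            rank = i + 1  # otherwise the rank is the 1-based position
        ranked.append((rank, model, score))
    return ranked


# Full Benchmark Avg. values copied from the leaderboard.
full_avg = [
    ("GPT-4o", 25.89),
    ("GPT-4V", 17.97),
    ("Qwen-VL-Max", 10.09),
    ("Claude3-Opus", 7.65),
    ("Gemini-Pro-Vision", 4.22),
    ("Yi-VL-34B", 3.42),
    ("LLaVA-NeXT-34B", 3.65),
]

if __name__ == "__main__":
    for rank, model, score in competition_rank(full_avg):
        print(f"{rank:>2}  {model:<18} {score:6.2f}")
```

Running the script prints the seven models in descending score order; the same function applies unchanged to the Text-only Avg. values.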