OlympiadBench

OlympiadBench evaluates bilingual olympiad-level mathematical and physics reasoning, including multimodal and text-only problem settings.

17 rows
Primary metric: full_avg
Sampled: 2026-05-06

Metadata

Metrics

Full Benchmark Math, Full Benchmark Physics, Full Benchmark Avg., Text-only Math, Text-only Physics, Text-only Avg.

Latest Results

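Rows preserve the README's full-benchmark and text-only tables as separate source settings, so rank numbering restarts at 1 for the second table. Ranks are source-table ranks, meaning row positions in the README tables rather than positions recomputed from the score shown here. They do not always follow score order: in the first table, LLaVA-NeXT-34B is rank 7 with 3.65, while Yi-VL-34B is rank 6 with 3.42.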

Full benchmark

| Rank | Subject | Full Benchmark Avg. | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | GPT-4o | 25.89 | GPT-4o (openai-gpt-4o) | Imported | 2026-05-06 |
| 2 | GPT-4V | 17.97 | GPT-4 (openai-gpt-4) | Imported | 2026-05-06 |
| 3 | Qwen-VL-Max | 10.09 | Qwen VL Max (qwen-qwen-vl-max) | Imported | 2026-05-06 |
| 4 | Claude3-Opus | 7.65 | | Imported | 2026-05-06 |
| 5 | Gemini-Pro-Vision | 4.22 | | Imported | 2026-05-06 |
| 6 | Yi-VL-34B | 3.42 | | Imported | 2026-05-06 |
| 7 | LLaVA-NeXT-34B | 3.65 | | Imported | 2026-05-06 |

Text-only

| Rank | Subject | Text-only Avg. | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | GPT-4o | 39.72 | GPT-4o (openai-gpt-4o) | Imported | 2026-05-06 |
| 2 | GPT-4 | 29.93 | GPT-4 (openai-gpt-4) | Imported | 2026-05-06 |
| 3 | GPT-4V | 29.07 | GPT-4 (openai-gpt-4) | Imported | 2026-05-06 |
| 4 | Qwen-VL-Max | 18.27 | Qwen VL Max (qwen-qwen-vl-max) | Imported | 2026-05-06 |
| 5 | Claude3-Opus | 13.09 | | Imported | 2026-05-06 |
| 6 | Gemini-Pro-Vision | 7.34 | | Imported | 2026-05-06 |
| 7 | Llama-3-70B-Instruct | 20.27 | Llama 3 70B Instruct (meta-llama-llama-3-70b-instruct) | Imported | 2026-05-06 |
| 8 | DeepSeekMath-7B-RL | 17.02 | | Imported | 2026-05-06 |
| 9 | Yi-VL-34B | 5.72 | | Imported | 2026-05-06 |
| 10 | LLaVA-NeXT-34B | 5.87 | | Imported | 2026-05-06 |
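
Because ranks are source-table ranks, they need not match the order obtained by sorting on the score shown. The following is a minimal Python sketch of how that could be checked against the rows above. The names `Row` and `rank_mismatches` and the setting labels `full` and `text_only` are hypothetical, introduced only for illustration. The assignment of rows to settings is a reading of the listing (the first seven rows as the full-benchmark table, the last ten as the text-only table), not something the page states per row.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    setting: str      # hypothetical label: "full" or "text_only"
    source_rank: int  # rank as listed in the source table
    subject: str
    score: float      # average score shown in the listing


# Rows copied from the listing; the setting labels are an assumption.
ROWS = [
    Row("full", 1, "GPT-4o", 25.89),
    Row("full", 2, "GPT-4V", 17.97),
    Row("full", 3, "Qwen-VL-Max", 10.09),
    Row("full", 4, "Claude3-Opus", 7.65),
    Row("full", 5, "Gemini-Pro-Vision", 4.22),
    Row("full", 6, "Yi-VL-34B", 3.42),
    Row("full", 7, "LLaVA-NeXT-34B", 3.65),
    Row("text_only", 1, "GPT-4o", 39.72),
    Row("text_only", 2, "GPT-4", 29.93),
    Row("text_only", 3, "GPT-4V", 29.07),
    Row("text_only", 4, "Qwen-VL-Max", 18.27),
    Row("text_only", 5, "Claude3-Opus", 13.09),
    Row("text_only", 6, "Gemini-Pro-Vision", 7.34),
    Row("text_only", 7, "Llama-3-70B-Instruct", 20.27),
    Row("text_only", 8, "DeepSeekMath-7B-RL", 17.02),
    Row("text_only", 9, "Yi-VL-34B", 5.72),
    Row("text_only", 10, "LLaVA-NeXT-34B", 5.87),
]


def rank_mismatches(rows, setting):
    """Return (subject, source_rank, score_rank) for rows in one setting
    whose source-table rank differs from their rank when sorted by score."""
    selected = [r for r in rows if r.setting == setting]
    by_score = sorted(selected, key=lambda r: r.score, reverse=True)
    score_rank = {r.subject: i for i, r in enumerate(by_score, start=1)}
    return [
        (r.subject, r.source_rank, score_rank[r.subject])
        for r in selected
        if r.source_rank != score_rank[r.subject]
    ]


if __name__ == "__main__":
    for setting in ("full", "text_only"):
        for subject, src, by_score in rank_mismatches(ROWS, setting):
            print(f"{setting}: {subject} source rank {src}, score rank {by_score}")
```

Keeping `source_rank` as a stored field instead of recomputing it preserves what the source tables report, while the score-based rank remains available for any consumer that wants a strictly sorted view.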