OlympiadBench | BenchmarkList

Metadata

Full Benchmark Math, Full Benchmark Physics, Full Benchmark Avg., Text-only Math, Text-only Physics, Text-only Avg.

Rank	Subject	Full Benchmark Avg.	Model Match	Provenance	Sampled
1	GPT-4o	25.89	GPT-4o openai-gpt-4o	Imported	2026-05-06
2	GPT-4V	17.97	GPT-4 openai-gpt-4	Imported	2026-05-06
3	Qwen-VL-Max	10.09	Qwen VL Max qwen-qwen-vl-max	Imported	2026-05-06
4	Claude3-Opus	7.65	—	Imported	2026-05-06
5	Gemini-Pro-Vision	4.22	—	Imported	2026-05-06
6	Yi-VL-34B	3.42	—	Imported	2026-05-06
7	LLaVA-NeXT-34B	3.65	—	Imported	2026-05-06

Rank	Subject	Score	Model Match	Provenance	Sampled
1	GPT-4o	39.72	GPT-4o openai-gpt-4o	Imported	2026-05-06
2	GPT-4	29.93	GPT-4 openai-gpt-4	Imported	2026-05-06
3	GPT-4V	29.07	GPT-4 openai-gpt-4	Imported	2026-05-06
4	Qwen-VL-Max	18.27	Qwen VL Max qwen-qwen-vl-max	Imported	2026-05-06
5	Claude3-Opus	13.09	—	Imported	2026-05-06
6	Gemini-Pro-Vision	7.34	—	Imported	2026-05-06
7	Llama-3-70B-Instruct	20.27	Llama 3 70B Instruct meta-llama-llama-3-70b-instruct	Imported	2026-05-06
8	DeepSeekMath-7B-RL	17.02	—	Imported	2026-05-06
9	Yi-VL-34B	5.72	—	Imported	2026-05-06
10	LLaVA-NeXT-34B	5.87	—	Imported	2026-05-06