ChemBench

Chemistry benchmark leaderboard evaluating language models across analytical, general, inorganic, organic, physical, technical, materials, preference, and toxicity/safety chemistry categories.

54 rows
Primary metric: overall_score
Sampled: 2026-05-06

Metadata

Metrics

Overall Score, Analytical Chemistry, Chemical Preference, General Chemistry, Inorganic Chemistry, Materials Science, Organic Chemistry, Physical Chemistry, Technical Chemistry, Toxicity and Safety

Latest Results

Rows are parsed from the public Hugging Face dataset-server rows API for the latest ChemBench-Results split. Source model names and IDs are preserved.
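
As a minimal sketch of how rows might be pulled from the dataset-server rows API, the helpers below build a request URL and unwrap the returned records. The repository ID, config name, split name, and the `model` field are not stated on this page; `example-org/ChemBench-Results`, `default`, `train`, and `model` are placeholders to replace with the real values. Only `overall_score` is taken from the page.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

ROWS_ENDPOINT = "https://datasets-server.huggingface.co/rows"


def build_rows_url(dataset, config, split, offset=0, length=100):
    """Build a rows-API URL. The API returns at most 100 rows per request."""
    query = urlencode({
        "dataset": dataset,
        "config": config,
        "split": split,
        "offset": offset,
        "length": length,
    })
    return f"{ROWS_ENDPOINT}?{query}"


def extract_rows(payload):
    """Unwrap the per-row records from a rows-API JSON response."""
    return [item["row"] for item in payload.get("rows", [])]


def fetch_rows(dataset, config, split):
    """Fetch one page of rows over the network (not called on import)."""
    with urlopen(build_rows_url(dataset, config, split)) as response:
        return extract_rows(json.load(response))


# Hypothetical usage; the dataset ID, config, and split are placeholders:
# rows = fetch_rows("example-org/ChemBench-Results", "default", "train")
```

With 54 rows, a single request with `length=100` would cover the whole table; larger tables would need paging through `offset`.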

Rank | Subject | Overall Score | Model Match | Provenance | Sampled
1 | NexusSciAgent | 0.86 | - | Imported | 2026-05-06
2 | NexusSciAgent_DeepSeek | 0.71 | - | Imported | 2026-05-06
3 | deepseek-chat1 | 0.66 | - | Imported | 2026-05-06
4 | Haiyue-Ch1 | 0.65 | - | Imported | 2026-05-06
5 | meta-llama/llama-4-maverick-17b-128e-instruct | 0.65 | Llama 4 Maverick [meta-llama-4-maverick] | Imported | 2026-05-06
6 | o1-preview | 0.64 | o1-preview [openai-o1-preview] | Imported | 2026-05-06
7 | gemini-2.5-flash-preview-04-17 | 0.64 | - | Imported | 2026-05-06
8 | GPT-5-nano | 0.63 | GPT-5 Nano [openai-gpt-5-nano] | Imported | 2026-05-06
9 | gpt-oss-120b | 0.63 | gpt-oss-120b [openai-gpt-oss-120b] | Imported | 2026-05-06
10 | Claude-3.5 (Sonnet) | 0.63 | - | Imported | 2026-05-06
11 | DeepSeek-V3-0324-W8A8-1 | 0.63 | - | Imported | 2026-05-06
12 | Claude-3.5 (Sonnet) React | 0.62 | - | Imported | 2026-05-06
13 | deepseek-v3-1-w4a8-1 | 0.62 | - | Imported | 2026-05-06
14 | DeepSeek-V3-0324-MTP1 | 0.62 | - | Imported | 2026-05-06
15 | gpt-oss-20b | 0.61 | gpt-oss-20b [openai-gpt-oss-20b] | Imported | 2026-05-06
16 | GPT-4o | 0.61 | GPT-4o [openai-gpt-4o] | Imported | 2026-05-06
17 | kimi-k2-instruct-0905 | 0.60 | MoonshotAI: Kimi K2 0905 [moonshotai-kimi-k2-0905] | Imported | 2026-05-06
18 | Llama-3.1-405B-Instruct | 0.58 | - | Imported | 2026-05-06
19 | Mistral-Large-2 | 0.57 | - | Imported | 2026-05-06
20 | Claude-3 (Opus) | 0.57 | - | Imported | 2026-05-06
21 | PaperQA2 | 0.57 | - | Imported | 2026-05-06
22 | Llama-3.1-70B-Instruct | 0.53 | Llama 3.1 70B Instruct [meta-llama-llama-3.1-70b-instruct] | Imported | 2026-05-06
23 | Qwen-2.5-32B | 0.53 | - | Imported | 2026-05-06
24 | Llama-3-70B-Instruct | 0.52 | Llama 3 70B Instruct [meta-llama-llama-3-70b-instruct] | Imported | 2026-05-06
25 | Llama-3-70B-Instruct (Temperature 1.0) | 0.52 | Llama 3 70B Instruct [meta-llama-llama-3-70b-instruct] | Imported | 2026-05-06
26 | Llama-3.1-70B-Instruct (Temperature 1.0) | 0.51 | Llama 3.1 70B Instruct [meta-llama-llama-3.1-70b-instruct] | Imported | 2026-05-06
27 | GPT-4o React | 0.51 | GPT-4o [openai-gpt-4o] | Imported | 2026-05-06
28 | GPT-4o-mini | 0.50 | GPT-4o-mini [openai-gpt-4o-mini] | Imported | 2026-05-06
29 | Gemma-2-9B-it | 0.48 | - | Imported | 2026-05-06
30 | Gemma-2-9B-it (Temperature 1.0) | 0.48 | - | Imported | 2026-05-06
31 | Phi-3-Medium-4k-Instruct | 0.47 | - | Imported | 2026-05-06
32 | Claude-2 | 0.47 | - | Imported | 2026-05-06
33 | Llama-3.1-8B-Instruct | 0.47 | Llama 3.1 8B Instruct [meta-llama-llama-3.1-8b-instruct] | Imported | 2026-05-06
34 | GPT-3.5 Turbo | 0.47 | GPT-3.5 Turbo [openai-gpt-3.5-turbo] | Imported | 2026-05-06
35 | ChemPile-LoRA-Ensemble | 0.47 | - | Imported | 2026-05-06
36 | Llama-3.1-8B-Instruct (Temperature 1.0) | 0.46 | Llama 3.1 8B Instruct [meta-llama-llama-3.1-8b-instruct] | Imported | 2026-05-06
37 | Llama-3-8B-Instruct (Temperature 1.0) | 0.46 | Llama 3 8B Instruct [meta-llama-llama-3-8b-instruct] | Imported | 2026-05-06
38 | Llama-3-8B-Instruct | 0.46 | Llama 3 8B Instruct [meta-llama-llama-3-8b-instruct] | Imported | 2026-05-06
39 | Gemini-Pro | 0.45 | - | Imported | 2026-05-06
40 | Command-R+ | 0.45 | Command R (08-2024) [cohere-command-r-08-2024] | Imported | 2026-05-06
41 | qwen2.5-instruct | 0.43 | - | Imported | 2026-05-06
42 | inspect_file_test | 0.43 | - | Imported | 2026-05-06
43 | inspect_test_2 | 0.43 | - | Imported | 2026-05-06
44 | Mixtral-8x7b-Instruct | 0.42 | Mistral: Mixtral 8x7B Instruct [mistralai-mixtral-8x7b-instruct] | Imported | 2026-05-06
45 | kunlun-70b-1 | 0.42 | - | Imported | 2026-05-06
46 | Mixtral-8x7b-Instruct (Temperature 1.0) | 0.42 | Mistral: Mixtral 8x7B Instruct [mistralai-mixtral-8x7b-instruct] | Imported | 2026-05-06
47 | GPT-4 | 0.41 | GPT-4 [openai-gpt-4] | Imported | 2026-05-06
48 | Llama-2-70B Chat | 0.27 | - | Imported | 2026-05-06
49 | Llama-2-13B Chat | 0.26 | - | Imported | 2026-05-06
50 | olympiad | 0.22 | - | Imported | 2026-05-06
51 | Gemma-1.1-7B-it | 0.19 | - | Imported | 2026-05-06
52 | Gemma-1.1-7B-it (Temperature 1.0) | 0.19 | - | Imported | 2026-05-06
53 | Llama-2-7b-chat-hf | 0.04 | - | Imported | 2026-05-06
54 | Galactica-120b | 0.02 | - | Imported | 2026-05-06
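
The ranking above appears to order rows by `overall_score` in descending order and to give tied scores consecutive ranks. The sketch below reproduces that behavior under two assumptions: the field name `subject` is hypothetical, and ties are kept in source order, which the page does not confirm (the displayed scores are rounded, so the true tie-breaker is not visible here).

```python
def rank_by_score(rows, metric="overall_score"):
    """Return rows sorted by metric (descending) with 1-based ordinal ranks.

    Python's sort is stable, so rows with equal scores keep their input order.
    """
    ordered = sorted(rows, key=lambda row: row[metric], reverse=True)
    return [{"rank": rank, **row} for rank, row in enumerate(ordered, start=1)]


# A few rows taken from the table, deliberately out of order.
sample = [
    {"subject": "Galactica-120b", "overall_score": 0.02},
    {"subject": "gpt-oss-20b", "overall_score": 0.61},
    {"subject": "NexusSciAgent", "overall_score": 0.86},
    {"subject": "GPT-4o", "overall_score": 0.61},
]

for entry in rank_by_score(sample):
    print(entry["rank"], entry["subject"], entry["overall_score"])
```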