ChemBench
Chemistry benchmark leaderboard evaluating language models across analytical, general, inorganic, organic, physical, technical, materials, preference, and toxicity/safety chemistry categories.
54 rows
Primary metric: overall_score
Sampled: 2026-05-06
Metadata
Metrics: Overall Score, Analytical Chemistry, Chemical Preference, General Chemistry, Inorganic Chemistry, Materials Science, Organic Chemistry, Physical Chemistry, Technical Chemistry, Toxicity and Safety
| Rank | Subject | Overall Score | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | NexusSciAgent | 0.86 | — | Imported | 2026-05-06 |
| 2 | NexusSciAgent_DeepSeek | 0.71 | — | Imported | 2026-05-06 |
| 3 | deepseek-chat1 | 0.66 | — | Imported | 2026-05-06 |
| 4 | Haiyue-Ch1 | 0.65 | — | Imported | 2026-05-06 |
| 5 | meta-llama/llama-4-maverick-17b-128e-instruct | 0.65 | Llama 4 Maverick meta-llama-4-maverick | Imported | 2026-05-06 |
| 6 | o1-preview | 0.64 | o1-preview openai-o1-preview | Imported | 2026-05-06 |
| 7 | gemini-2.5-flash-preview-04-17 | 0.64 | — | Imported | 2026-05-06 |
| 8 | GPT-5-nano | 0.63 | GPT-5 Nano openai-gpt-5-nano | Imported | 2026-05-06 |
| 9 | gpt-oss-120b | 0.63 | gpt-oss-120b openai-gpt-oss-120b | Imported | 2026-05-06 |
| 10 | Claude-3.5 (Sonnet) | 0.63 | — | Imported | 2026-05-06 |
| 11 | DeepSeek-V3-0324-W8A8-1 | 0.63 | — | Imported | 2026-05-06 |
| 12 | Claude-3.5 (Sonnet) React | 0.62 | — | Imported | 2026-05-06 |
| 13 | deepseek-v3-1-w4a8-1 | 0.62 | — | Imported | 2026-05-06 |
| 14 | DeepSeek-V3-0324-MTP1 | 0.62 | — | Imported | 2026-05-06 |
| 15 | gpt-oss-20b | 0.61 | gpt-oss-20b openai-gpt-oss-20b | Imported | 2026-05-06 |
| 16 | GPT-4o | 0.61 | GPT-4o openai-gpt-4o | Imported | 2026-05-06 |
| 17 | kimi-k2-instruct-0905 | 0.60 | MoonshotAI: Kimi K2 0905 moonshotai-kimi-k2-0905 | Imported | 2026-05-06 |
| 18 | Llama-3.1-405B-Instruct | 0.58 | — | Imported | 2026-05-06 |
| 19 | Mistral-Large-2 | 0.57 | — | Imported | 2026-05-06 |
| 20 | Claude-3 (Opus) | 0.57 | — | Imported | 2026-05-06 |
| 21 | PaperQA2 | 0.57 | — | Imported | 2026-05-06 |
| 22 | Llama-3.1-70B-Instruct | 0.53 | Llama 3.1 70B Instruct meta-llama-llama-3.1-70b-instruct | Imported | 2026-05-06 |
| 23 | Qwen-2.5-32B | 0.53 | — | Imported | 2026-05-06 |
| 24 | Llama-3-70B-Instruct | 0.52 | Llama 3 70B Instruct meta-llama-llama-3-70b-instruct | Imported | 2026-05-06 |
| 25 | Llama-3-70B-Instruct (Temperature 1.0) | 0.52 | Llama 3 70B Instruct meta-llama-llama-3-70b-instruct | Imported | 2026-05-06 |
| 26 | Llama-3.1-70B-Instruct (Temperature 1.0) | 0.51 | Llama 3.1 70B Instruct meta-llama-llama-3.1-70b-instruct | Imported | 2026-05-06 |
| 27 | GPT-4o React | 0.51 | GPT-4o openai-gpt-4o | Imported | 2026-05-06 |
| 28 | GPT-4o-mini | 0.50 | GPT-4o-mini openai-gpt-4o-mini | Imported | 2026-05-06 |
| 29 | Gemma-2-9B-it | 0.48 | — | Imported | 2026-05-06 |
| 30 | Gemma-2-9B-it (Temperature 1.0) | 0.48 | — | Imported | 2026-05-06 |
| 31 | Phi-3-Medium-4k-Instruct | 0.47 | — | Imported | 2026-05-06 |
| 32 | Claude-2 | 0.47 | — | Imported | 2026-05-06 |
| 33 | Llama-3.1-8B-Instruct | 0.47 | Llama 3.1 8B Instruct meta-llama-llama-3.1-8b-instruct | Imported | 2026-05-06 |
| 34 | GPT-3.5 Turbo | 0.47 | GPT-3.5 Turbo openai-gpt-3.5-turbo | Imported | 2026-05-06 |
| 35 | ChemPile-LoRA-Ensemble | 0.47 | — | Imported | 2026-05-06 |
| 36 | Llama-3.1-8B-Instruct (Temperature 1.0) | 0.46 | Llama 3.1 8B Instruct meta-llama-llama-3.1-8b-instruct | Imported | 2026-05-06 |
| 37 | Llama-3-8B-Instruct (Temperature 1.0) | 0.46 | Llama 3 8B Instruct meta-llama-llama-3-8b-instruct | Imported | 2026-05-06 |
| 38 | Llama-3-8B-Instruct | 0.46 | Llama 3 8B Instruct meta-llama-llama-3-8b-instruct | Imported | 2026-05-06 |
| 39 | Gemini-Pro | 0.45 | — | Imported | 2026-05-06 |
| 40 | Command-R+ | 0.45 | Command R (08-2024) cohere-command-r-08-2024 | Imported | 2026-05-06 |
| 41 | qwen2.5-instruct | 0.43 | — | Imported | 2026-05-06 |
| 42 | inspect_file_test | 0.43 | — | Imported | 2026-05-06 |
| 43 | inspect_test_2 | 0.43 | — | Imported | 2026-05-06 |
| 44 | Mixtral-8x7b-Instruct | 0.42 | Mistral: Mixtral 8x7B Instruct mistralai-mixtral-8x7b-instruct | Imported | 2026-05-06 |
| 45 | kunlun-70b-1 | 0.42 | — | Imported | 2026-05-06 |
| 46 | Mixtral-8x7b-Instruct (Temperature 1.0) | 0.42 | Mistral: Mixtral 8x7B Instruct mistralai-mixtral-8x7b-instruct | Imported | 2026-05-06 |
| 47 | GPT-4 | 0.41 | GPT-4 openai-gpt-4 | Imported | 2026-05-06 |
| 48 | Llama-2-70B Chat | 0.27 | — | Imported | 2026-05-06 |
| 49 | Llama-2-13B Chat | 0.26 | — | Imported | 2026-05-06 |
| 50 | olympiad | 0.22 | — | Imported | 2026-05-06 |
| 51 | Gemma-1.1-7B-it | 0.19 | — | Imported | 2026-05-06 |
| 52 | Gemma-1.1-7B-it (Temperature 1.0) | 0.19 | — | Imported | 2026-05-06 |
| 53 | Llama-2-7b-chat-hf | 0.04 | — | Imported | 2026-05-06 |
| 54 | Galactica-120b | 0.02 | — | Imported | 2026-05-06 |