SciKnowEval

Scientific knowledge evaluation benchmark spanning biology, chemistry, materials science, and physics tasks.

Rows: 26
Primary metric: inverse_overall_rank
Sampled: 2026-05-27

Metadata

Metrics

Inverse Overall Rank, Overall Rank (lower is better), Mean Domain Score (lower is better), Biology (lower is better), Chemistry (lower is better), Material (lower is better), Physics (lower is better)

Latest Results

Rows are parsed from the public SciKnowEval overall table. The source reports Overall as a rank (lower is better); the BenchmarkList score is the inverse of that rank, so higher remains better.
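
The page does not state how the rank is inverted. As a minimal sketch only, the snippet below shows two common conventions for turning a 1-based rank into a higher-is-better score: a reversed rank (N + 1 - rank) and a reciprocal rank (1 / rank). Both formulas, the function name `inverse_rank_scores`, and the three-row sample are assumptions for illustration, not BenchmarkList's documented method.

```python
def inverse_rank_scores(ranks):
    """Map 1-based ranks (lower is better) to higher-is-better scores.

    Returns two dicts, one per assumed convention:
      reversed:   N + 1 - rank  (best rank gets the largest integer)
      reciprocal: 1 / rank      (best rank gets 1.0)
    Neither is confirmed as the formula used by the page.
    """
    n = len(ranks)
    reversed_scores = {name: n + 1 - rank for name, rank in ranks.items()}
    reciprocal_scores = {name: 1.0 / rank for name, rank in ranks.items()}
    return reversed_scores, reciprocal_scores


# Hypothetical sample: the first three subjects and their source ranks.
overall_rank = {
    "Claude-3.5-Sonnet-20240620": 1,
    "GPT-4o-2024-05-13": 2,
    "Qwen2-72B-Inst": 3,
}

reversed_scores, reciprocal_scores = inverse_rank_scores(overall_rank)
```

Under either convention the ordering of subjects is unchanged; only the direction of the scale flips, which is what lets a rank sit alongside higher-is-better metrics.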

| Rank | Subject | Inverse Overall Rank | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | Claude-3.5-Sonnet-20240620 | 1 | Claude 3.5 Sonnet (anthropic-claude-3.5-sonnet) | Imported | 2026-05-27 |
| 2 | GPT-4o-2024-05-13 | 2 | GPT-4o (openai-gpt-4o) | Imported | 2026-05-27 |
| 3 | Qwen2-72B-Inst | 3 | | Imported | 2026-05-27 |
| 4 | GPT-4-Turbo-2024-04-09 | 4 | GPT-4 Turbo (openai-gpt-4-turbo) | Imported | 2026-05-27 |
| 5 | Gemini1.5-Pro-latest | 5 | | Imported | 2026-05-27 |
| 6 | Llama3-70B-Inst | 6 | | Imported | 2026-05-27 |
| 7 | Qwen-Max | 7 | Qwen-Max (qwen-qwen-max) | Imported | 2026-05-27 |
| 8 | Claude3-Sonnet-20240229 | 8 | | Imported | 2026-05-27 |
| 9 | SciKnowMind-7b-v0.1 | 9 | | Imported | 2026-05-27 |
| 10 | Qwen2-7B-Inst | 10 | | Imported | 2026-05-27 |
| 11 | Qwen1.5-14B-Chat | 11 | | Imported | 2026-05-27 |
| 12 | GPT-3.5-Turbo-0125 | 12 | GPT-3.5 Turbo (openai-gpt-3.5-turbo) | Imported | 2026-05-27 |
| 13 | Llama3-8B-Inst | 13 | | Imported | 2026-05-27 |
| 14 | ChemDFM-13B | 14 | | Imported | 2026-05-27 |
| 15 | ChemLLM-20B-Chat | 15 | | Imported | 2026-05-27 |
| 16 | MolInst-Llama3-8B | 16 | | Imported | 2026-05-27 |
| 17 | Qwen1.5-7B-Chat | 17 | | Imported | 2026-05-27 |
| 18 | Gemma1.1-7B-Inst | 18 | | Imported | 2026-05-27 |
| 19 | Mistral-7B-Inst-v0.2 | 19 | | Imported | 2026-05-27 |
| 20 | ChatGLM3-6B | 20 | | Imported | 2026-05-27 |
| 21 | Galactica-30B | 21 | | Imported | 2026-05-27 |
| 22 | Llama2-13B-Chat | 22 | | Imported | 2026-05-27 |
| 23 | SciGLM-6B | 23 | | Imported | 2026-05-27 |
| 24 | ChemLLM-7B-Chat | 24 | | Imported | 2026-05-27 |
| 25 | Galactica-6.7B | 25 | | Imported | 2026-05-27 |
| 26 | LlaSMol-Mistral-7B | 26 | | Imported | 2026-05-27 |