GeneBench

GeneBench is an evaluation focused on multi-stage scientific data analysis in genetics and quantitative biology. Tasks require reasoning about ambiguous or noisy data with minimal supervisory guidance, addressing realistic obstacles such as hidden confounders or QC failures, and correctly implementing and interpreting modern statistical methods.

Rows: 6
Primary metric: score
Sampled: 2026-05-06

Metadata

Metrics: Score, Normalized Score

Showing 2 latest source slices.

Latest Results

Rank  Subject      Score  Model Match                       Provenance     Sampled
1     GPT-5.5 Pro  0.33   GPT-5.5 Pro (openai-gpt-5.5-pro)  Self-reported  2026-05-06
2     GPT-5.5      0.25   GPT-5.5 (openai-gpt-5.5)          Self-reported  2026-05-06

1     GPT-5.5 Pro  33.2%  GPT-5.5 Pro (openai-gpt-5.5-pro)  Launch post    2026-04-23
2     GPT-5.4 Pro  25.6%  GPT-5.4 Pro (openai-gpt-5.4-pro)  Launch post    2026-04-23
3     GPT-5.5      25%    GPT-5.5 (openai-gpt-5.5)          Launch post    2026-04-23
4     GPT-5.4      19%    GPT-5.4 (openai-gpt-5.4)          Launch post    2026-04-23