scBench

Bioinformatics agent benchmark with verifiable single-cell RNA-seq workflow tasks and deterministic graders.

Rows: 20
Primary metric: accuracy
Sampled: 2026-05-28

Metadata

Metrics

Accuracy, Cost (lower is better)

Showing 2 latest source slices.

Latest Results

Provider-published system-card benchmark scores parsed from Anthropic's Claude Opus 4.8 capability evaluation tables. Rows are marked self-reported and should be interpreted as source claims unless independently reproduced.

Rank | Subject | Accuracy | Model Match | Provenance | Sampled
1 | Claude Mythos Preview | 58.2% | Claude Mythos Preview (anthropic-claude-mythos-preview) | Self-reported | 2026-05-28
2 | Claude Opus 4.8 | 58.2% | Claude Opus 4.8 (anthropic-claude-opus-4.8) | Self-reported | 2026-05-28
3 | Claude Opus 4.7 | 55.3% | Claude Opus 4.7 (anthropic-claude-opus-4.7) | Self-reported | 2026-05-28
4 | Claude Sonnet 4.6 | 50.4% | Claude Sonnet 4.6 (anthropic-claude-sonnet-4.6) | Self-reported | 2026-05-28

Rank | Subject | Accuracy | Model Match | Provenance | Sampled
1 | gpt-5.5 via mini-swe-agent | 57.95% | GPT-5.5 (openai-gpt-5.5) | Imported | 2026-05-27
2 | gpt-5.5 via openai-codex | 57.78% | GPT-5.5 (openai-gpt-5.5) | Imported | 2026-05-27
3 | gpt-5.4 via mini-swe-agent | 57.44% | GPT-5.4 (openai-gpt-5.4) | Imported | 2026-05-27
4 | claude-opus-4-7 via mini-swe-agent | 55.21% | Claude Opus 4.7 (anthropic-claude-opus-4.7) | Imported | 2026-05-27
5 | claude-opus-4-7 via claude-code | 54.02% | Claude Opus 4.7 (anthropic-claude-opus-4.7) | Imported | 2026-05-27
6 | gemini-3.1-pro-preview via mini-swe-agent | 53.85% | Gemini 3.1 Pro Preview (google-gemini-3.1-pro-preview) | Imported | 2026-05-27
7 | claude-opus-4-6 via mini-swe-agent | 52.65% | Claude Opus 4.6 (anthropic-claude-opus-4.6) | Imported | 2026-05-27
8 | gpt-5.2 via mini-swe-agent | 52.31% | GPT-5.2 (openai-gpt-5.2) | Imported | 2026-05-27
9 | claude-sonnet-4-6 via mini-swe-agent | 50.26% | Claude Sonnet 4.6 (anthropic-claude-sonnet-4.6) | Imported | 2026-05-27
10 | claude-opus-4-5 via mini-swe-agent | 47.18% | Claude Opus 4.5 (anthropic-claude-opus-4.5) | Imported | 2026-05-27
11 | grok-4.20-beta-0309-reasoning via mini-swe-agent | 44.44% | Grok 4.20 (x-ai-grok-4.20) | Imported | 2026-05-27
12 | grok-4.3 via mini-swe-agent | 44.27% | Grok 4.3 (x-ai-grok-4.3) | Imported | 2026-05-27
13 | gpt-5.1 via mini-swe-agent | 38.80% | GPT-5.1 (openai-gpt-5.1) | Imported | 2026-05-27
14 | claude-sonnet-4-5 via mini-swe-agent | 33.16% | Claude Sonnet 4.5 (anthropic-claude-sonnet-4.5) | Imported | 2026-05-27
15 | grok-4-1-fast-reasoning via mini-swe-agent | 30.26% | Grok 4.1 Fast (x-ai-grok-4.1-fast) | Imported | 2026-05-27
16 | gemini-2.5-pro via mini-swe-agent | 23.59% | Gemini 2.5 Pro (google-gemini-2.5-pro) | Imported | 2026-05-27
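
Ranks in the listing restart for each source slice, so self-reported and imported rows are most safely compared within their own slice rather than merged into one ordering. The following is a minimal Python sketch of that grouping, using a handful of rows copied from the listing. The function name `rank_by_slice` and the tuple layout are illustrative assumptions, not part of any scBench tooling.

```python
from collections import defaultdict

# A few rows copied from the listing: (subject, accuracy in %, provenance, sampled date).
# The tuple layout is an assumption made for this sketch.
ROWS = [
    ("Claude Mythos Preview", 58.2, "Self-reported", "2026-05-28"),
    ("Claude Opus 4.8", 58.2, "Self-reported", "2026-05-28"),
    ("Claude Opus 4.7", 55.3, "Self-reported", "2026-05-28"),
    ("gpt-5.5 via mini-swe-agent", 57.95, "Imported", "2026-05-27"),
    ("gpt-5.5 via openai-codex", 57.78, "Imported", "2026-05-27"),
    ("claude-opus-4-7 via mini-swe-agent", 55.21, "Imported", "2026-05-27"),
]


def rank_by_slice(rows):
    """Group rows by (provenance, sampled) and sort each group by accuracy, descending.

    sorted() is stable, so tied rows keep their input order.
    """
    slices = defaultdict(list)
    for subject, accuracy, provenance, sampled in rows:
        slices[(provenance, sampled)].append((subject, accuracy))
    return {
        key: sorted(entries, key=lambda entry: entry[1], reverse=True)
        for key, entries in slices.items()
    }


ranked = rank_by_slice(ROWS)
for (provenance, sampled), entries in ranked.items():
    print(f"{provenance} ({sampled})")
    for rank, (subject, accuracy) in enumerate(entries, start=1):
        print(f"  {rank}. {subject}: {accuracy}%")
```

Keeping the slices separate avoids presenting a provider's self-reported figure and an independently imported run as directly comparable measurements.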