LAB-Bench
Biology research-assistant benchmark spanning literature QA, database QA, protocol understanding, figure interpretation, and related lab-research tasks.
Rows: 2
Primary metric: open_response_mean_accuracy
Sampled: 2026-05-27
Metadata
Metrics: Open-response mean accuracy, CloningScenarios accuracy, ProtocolQA accuracy, FigQA accuracy
| Rank | Subject | Open-response mean accuracy | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | Claude 3.5 Sonnet | 0.266667 | Claude 3.5 Sonnet (`anthropic-claude-3.5-sonnet`) | Imported | 2026-05-27 |
| 2 | GPT-4o | 0.233333 | GPT-4o (`openai-gpt-4o`) | Imported | 2026-05-27 |
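
The ranking in the table can be reproduced with a minimal sketch that sorts the two rows by the primary metric. The `Row` class, its field names, and the `rank_rows` helper are hypothetical names chosen for illustration. The descending sort is an assumption inferred from the table order, and the sketch says nothing about how the benchmark aggregates the underlying task scores.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    """One leaderboard row (hypothetical structure, not the site's schema)."""
    subject: str
    model_id: str
    open_response_mean_accuracy: float


# Values copied from the table above, deliberately listed out of order.
ROWS = [
    Row("GPT-4o", "openai-gpt-4o", 0.233333),
    Row("Claude 3.5 Sonnet", "anthropic-claude-3.5-sonnet", 0.266667),
]


def rank_rows(rows):
    """Return (rank, row) pairs, highest primary metric first (assumed rule)."""
    ordered = sorted(
        rows, key=lambda r: r.open_response_mean_accuracy, reverse=True
    )
    return list(enumerate(ordered, start=1))


for rank, row in rank_rows(ROWS):
    print(f"{rank}. {row.subject}: {row.open_response_mean_accuracy:.1%}")
```

Run as a script, this prints the two subjects in the same order as the table, with the accuracies shown as percentages.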