PaperBench
End-to-end replication of state-of-the-art AI papers, graded against hierarchical rubrics.
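As a rough illustration of what grading against a hierarchical rubric can look like, the sketch below scores a small rubric tree. It assumes that leaf requirements are graded pass/fail and that each parent's score is the weight-normalized average of its children. The names `RubricNode` and `replication_score`, and the example rubric, are hypothetical and are not taken from the benchmark's code.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class RubricNode:
    """One requirement in a hypothetical rubric tree."""
    name: str
    weight: float = 1.0
    passed: Optional[bool] = None  # only meaningful on leaf nodes
    children: List["RubricNode"] = field(default_factory=list)


def replication_score(node: RubricNode) -> float:
    """Return a score in [0, 1] for this node.

    Assumption: leaves score 1.0 if passed, else 0.0; inner nodes take
    the weighted average of their children's scores.
    """
    if not node.children:
        return 1.0 if node.passed else 0.0
    total_weight = sum(child.weight for child in node.children)
    if total_weight == 0:
        return 0.0
    weighted = sum(child.weight * replication_score(child) for child in node.children)
    return weighted / total_weight


# Illustrative rubric, not real benchmark data.
rubric = RubricNode(
    name="paper",
    children=[
        RubricNode(
            name="code development",
            weight=2.0,
            children=[
                RubricNode(name="model implemented", passed=True),
                RubricNode(name="training loop implemented", passed=False),
            ],
        ),
        RubricNode(name="results match", weight=1.0, passed=False),
    ],
)

score = replication_score(rubric)
print(f"{score:.1%}")
```

Under these assumptions, partial progress on a heavily weighted branch still contributes to the total, which is why a score can fall anywhere between 0% and 100%.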
Rows: 1
Primary metric: replication_score
Sampled: 2025-04-02
Replication Score
| Rank | Subject | Replication Score | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | Claude 3.5 Sonnet (New) + open-source scaffolding | 21.0% | Claude 3.5 Sonnet (`anthropic-claude-3.5-sonnet`) | Imported | 2025-04-02 |