Harvey Legal Agent Benchmark
An open-source, long-horizon legal agent benchmark in which agents work from client-matter instructions, closed-universe matter files, and expert rubrics to produce reviewable legal work product.
Rows: 6
Primary metric: all_pass_task_success
Sampled: 2026-05-28
All-Pass Task Success
Showing 2 latest source slices.
| Rank | Subject | All-Pass Task Success | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | Claude Opus 4.8 | 9.6% | Claude Opus 4.8 (`anthropic-claude-opus-4.8`) | Self-reported | 2026-05-28 |
| 1 | Claude Opus 4.7 | 7.1% | Claude Opus 4.7 (`anthropic-claude-opus-4.7`) | Imported | 2026-05-28 |
| 2 | Claude Sonnet 4.6 | 5.4% | Claude Sonnet 4.6 (`anthropic-claude-sonnet-4.6`) | Imported | 2026-05-28 |
| 3 | Claude Opus 4.6 | 4.2% | Claude Opus 4.6 (`anthropic-claude-opus-4.6`) | Imported | 2026-05-28 |
| 4 | GPT-5.5 | 2.1% | GPT-5.5 (`openai-gpt-5.5`) | Imported | 2026-05-28 |
| 5 | Gemini 3.5 Flash | 0.8% | Gemini 3.5 Flash (`google-gemini-3.5-flash`) | Imported | 2026-05-28 |
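
For readers unfamiliar with the metric, the sketch below shows one plausible reading of "all-pass task success": a task counts as a success only when every rubric criterion for that task passes, and the score is the fraction of such tasks. This definition is an assumption inferred from the metric's name, not confirmed by the benchmark's documentation; the function name, the data layout, and the task IDs are hypothetical.

```python
from typing import Dict, List


def all_pass_task_success(results: Dict[str, List[bool]]) -> float:
    """Return the fraction of tasks whose rubric criteria ALL passed.

    Assumed definition (not taken from the benchmark itself): a task
    succeeds only if it has at least one criterion and every criterion
    is True. An empty result set scores 0.0.
    """
    if not results:
        return 0.0
    passed = sum(
        1 for criteria in results.values() if criteria and all(criteria)
    )
    return passed / len(results)


# Hypothetical per-task rubric outcomes, for illustration only.
example = {
    "task-a": [True, True, True],   # all criteria pass -> success
    "task-b": [True, False, True],  # one criterion fails -> failure
    "task-c": [True, True],         # all criteria pass -> success
    "task-d": [False],              # single criterion fails -> failure
}

score = all_pass_task_success(example)
print(f"{score:.1%}")  # prints "50.0%"
```

Under this reading a single failed criterion fails the whole task, which would help explain why scores in the table are low in absolute terms even for the top-ranked models.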