Realm Warren
A micro1 legal reasoning benchmark set in realistic litigation, transactional, and compliance contexts. It evaluates long-horizon legal work products against rubrics decomposed by IRAC (Issue, Rule, Application, Conclusion).
Rows: 3
Primary metric: mean_score
Sampled: 2026-05-07
Metadata
Metrics: Mean Weighted Reward, Pass@3, Median Weighted Reward
| Rank | Subject | Mean Weighted Reward | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | Claude Opus 4.7 | 0.36 | Claude Opus 4.7 (anthropic-claude-opus-4.7) | Imported | 2026-05-07 |
| 2 | GPT-5.5 | 0.35 | GPT-5.5 (openai-gpt-5.5) | Imported | 2026-05-07 |
| 3 | Gemini 3.1 Pro | 0.22 | Gemini 3.1 Pro Preview (google-gemini-3.1-pro-preview) | Imported | 2026-05-07 |