Professional Reasoning Bench - Legal
Professional Reasoning Bench - Legal evaluates frontier LLMs on complex legal reasoning tasks drawn from real-world legal practice and case analysis.
Rows: 28
Primary metric: score
Sampled: 2026-05-06
Metadata
Metrics: Score, Confidence Interval Upper, Max Score
| Rank | Subject | Score | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | Muse Spark | 52.29 | — | Imported | 2026-05-06 |
| 1 | claude-opus-4-6 (Non-Thinking) | 52.27 | Claude Opus 4.6 anthropic-claude-opus-4.6 | Imported | 2026-05-06 |
| 3 | gpt-5-pro | 49.89 | GPT-5 Pro openai-gpt-5-pro | Imported | 2026-05-06 |
| 3 | o3-pro | 49.67 | o3 Pro openai-o3-pro | Imported | 2026-05-06 |
| 3 | gpt-5.1-thinking | 49.33 | GPT-5.1 openai-gpt-5.1 | Imported | 2026-05-06 |
| 5 | gpt-5 | 48.96 | GPT-5 openai-gpt-5 | Imported | 2026-05-06 |
| 5 | o3 | 48.57 | o3 openai-o3 | Imported | 2026-05-06 |
| 8 | gpt-5.2-pro-2025-12-11 | 45.44 | GPT-5.2 Pro openai-gpt-5.2-pro | Imported | 2026-05-06 |
| 9 | gpt-5.4 (High) | 44.35 | GPT-5.4 openai-gpt-5.4 | Imported | 2026-05-06 |
| 9 | claude-opus-4-5-20251101-thinking | 44.21 | Claude Opus 4.5 anthropic-claude-opus-4.5 | Imported | 2026-05-06 |
| 9 | gemini-3.1-pro | 44.02 | Gemini 3.1 Pro Preview google-gemini-3.1-pro-preview | Imported | 2026-05-06 |
| 10 | kimi-k2.5 | 43.83 | MoonshotAI: Kimi K2.5 moonshotai-kimi-k2.5 | Imported | 2026-05-06 |
| 13 | gemini-2.5-pro | 41.43 | Gemini 2.5 Pro google-gemini-2.5-pro | Imported | 2026-05-06 |
| 13 | gemini-2.5-flash | 41.02 | Gemini 2.5 Flash google-gemini-2.5-flash | Imported | 2026-05-06 |
| 13 | claude-sonnet-4-5-20250929 | 40.84 | Claude Sonnet 4.5 anthropic-claude-sonnet-4.5 | Imported | 2026-05-06 |
| 13 | gpt-oss-120b | 40.21 | gpt-oss-120b openai-gpt-oss-120b | Imported | 2026-05-06 |
| 14 | kimi-k2-thinking | 40.90 | MoonshotAI: Kimi K2 Thinking moonshotai-kimi-k2-thinking | Imported | 2026-05-06 |
| 14 | gemini-3-pro-preview | 40.60 | Gemini 3 google-gemini-3 | Imported | 2026-05-06 |
| 15 | mistral-medium-latest | 39.55 | — | Imported | 2026-05-06 |
| 18 | qwen.qwen3-235b-a22b-2507-v1:0 | 38.30 | — | Imported | 2026-05-06 |
| 18 | o4-mini | 38.11 | o4 Mini openai-o4-mini | Imported | 2026-05-06 |
| 19 | deepseek-v3p1 | 37.62 | DeepSeek V3.1 Terminus deepseek-deepseek-v3.1-terminus | Imported | 2026-05-06 |
| 22 | deepseek-r1-0528 | 36.61 | R1 0528 deepseek-deepseek-r1-0528 | Imported | 2026-05-06 |
| 23 | gpt-4.1 | 36.48 | GPT-4.1 openai-gpt-4.1 | Imported | 2026-05-06 |
| 23 | kimi-k2-instruct | 36.38 | MoonshotAI: Kimi K2 0711 moonshotai-kimi-k2 | Imported | 2026-05-06 |
| 23 | claude-opus-4-1-20250805 | 34.00 | Claude Opus 4.1 anthropic-claude-opus-4.1 | Imported | 2026-05-06 |
| 27 | gpt-4.1-mini | 30.38 | GPT-4.1 Mini openai-gpt-4.1-mini | Imported | 2026-05-06 |
| 28 | llama4-maverick-instruct-basic | 24.84 | — | Imported | 2026-05-06 |