Professional Reasoning Bench - Legal

Professional Reasoning Bench Legal evaluates frontier LLMs on complex legal reasoning tasks drawn from real-world legal practice and case analysis.

28 rows
Primary metric: score
Sampled: 2026-05-06

Metadata

Metrics

Score, Confidence Interval Upper, Max Score

Latest Results

Rank | Subject | Score | Model Match | Provenance | Sampled
--- | --- | --- | --- | --- | ---
1 | Muse Spark | 52.29 | | Imported | 2026-05-06
1 | claude-opus-4-6 (Non-Thinking) | 52.27 | Claude Opus 4.6 (anthropic-claude-opus-4.6) | Imported | 2026-05-06
3 | gpt-5-pro | 49.89 | GPT-5 Pro (openai-gpt-5-pro) | Imported | 2026-05-06
3 | o3-pro | 49.67 | o3 Pro (openai-o3-pro) | Imported | 2026-05-06
3 | gpt-5.1-thinking | 49.33 | GPT-5.1 (openai-gpt-5.1) | Imported | 2026-05-06
5 | gpt-5 | 48.96 | GPT-5 (openai-gpt-5) | Imported | 2026-05-06
5 | o3 | 48.57 | o3 (openai-o3) | Imported | 2026-05-06
8 | gpt-5.2-pro-2025-12-11 | 45.44 | GPT-5.2 Pro (openai-gpt-5.2-pro) | Imported | 2026-05-06
9 | gpt-5.4 (High) | 44.35 | GPT-5.4 (openai-gpt-5.4) | Imported | 2026-05-06
9 | claude-opus-4-5-20251101-thinking | 44.21 | Claude Opus 4.5 (anthropic-claude-opus-4.5) | Imported | 2026-05-06
9 | gemini-3.1-pro | 44.02 | Gemini 3.1 Pro Preview (google-gemini-3.1-pro-preview) | Imported | 2026-05-06
10 | kimi-k2.5 | 43.83 | KIMI MoonshotAI: Kimi K2.5 (moonshotai-kimi-k2.5) | Imported | 2026-05-06
13 | gemini-2.5-pro | 41.43 | Gemini 2.5 Pro (google-gemini-2.5-pro) | Imported | 2026-05-06
13 | gemini-2.5-flash | 41.02 | Gemini 2.5 Flash (google-gemini-2.5-flash) | Imported | 2026-05-06
13 | claude-sonnet-4-5-20250929 | 40.84 | Claude Sonnet 4.5 (anthropic-claude-sonnet-4.5) | Imported | 2026-05-06
13 | gpt-oss-120b | 40.21 | gpt-oss-120b (openai-gpt-oss-120b) | Imported | 2026-05-06
14 | kimi-k2-thinking | 40.90 | KIMI MoonshotAI: Kimi K2 Thinking (moonshotai-kimi-k2-thinking) | Imported | 2026-05-06
14 | gemini-3-pro-preview | 40.60 | Gemini 3 (google-gemini-3) | Imported | 2026-05-06
15 | mistral-medium-latest | 39.55 | | Imported | 2026-05-06
18 | qwen.qwen3-235b-a22b-2507-v1:0 | 38.30 | | Imported | 2026-05-06
18 | o4-mini | 38.11 | o4 Mini (openai-o4-mini) | Imported | 2026-05-06
19 | deepseek-v3p1 | 37.62 | DeepSeek V3.1 Terminus (deepseek-deepseek-v3.1-terminus) | Imported | 2026-05-06
22 | deepseek-r1-0528 | 36.61 | R1 0528 (deepseek-deepseek-r1-0528) | Imported | 2026-05-06
23 | gpt-4.1 | 36.48 | GPT-4.1 (openai-gpt-4.1) | Imported | 2026-05-06
23 | kimi-k2-instruct | 36.38 | KIMI MoonshotAI: Kimi K2 0711 (moonshotai-kimi-k2) | Imported | 2026-05-06
23 | claude-opus-4-1-20250805 | 34.00 | Claude Opus 4.1 (anthropic-claude-opus-4.1) | Imported | 2026-05-06
27 | gpt-4.1-mini | 30.38 | GPT-4.1 Mini (openai-gpt-4.1-mini) | Imported | 2026-05-06
28 | llama4-maverick-instruct-basic | 24.84 | | Imported | 2026-05-06
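
Several models share a rank in the results above (two at rank 1, three at rank 3), but the page does not say how those shared ranks are assigned. The sketch below makes no assumption about that rule. It only groups the first seven rows by their listed rank and reports the score spread inside each group. The names `ROWS`, `group_by_rank`, and `score_spread` are hypothetical helpers introduced for illustration; the numbers are copied from the results.

```python
from collections import defaultdict

# (rank, subject, score) copied from the first seven result rows.
ROWS = [
    (1, "Muse Spark", 52.29),
    (1, "claude-opus-4-6 (Non-Thinking)", 52.27),
    (3, "gpt-5-pro", 49.89),
    (3, "o3-pro", 49.67),
    (3, "gpt-5.1-thinking", 49.33),
    (5, "gpt-5", 48.96),
    (5, "o3", 48.57),
]


def group_by_rank(rows):
    """Collect (subject, score) pairs under the rank listed for them."""
    groups = defaultdict(list)
    for rank, subject, score in rows:
        groups[rank].append((subject, score))
    return dict(groups)


def score_spread(entries):
    """Difference between the highest and lowest score in one rank group."""
    scores = [score for _, score in entries]
    return round(max(scores) - min(scores), 2)


if __name__ == "__main__":
    for rank, entries in sorted(group_by_rank(ROWS).items()):
        names = ", ".join(subject for subject, _ in entries)
        print(f"rank {rank}: {names} (spread {score_spread(entries)})")
```

Reading the spread next to the rank shows how close the scores within a shared rank are, without guessing at how the ranks were computed.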