PRBench Finance

Professional Reasoning Bench Finance evaluates frontier LLMs on complex financial reasoning tasks including analysis, modeling, and decision-making.

Rows: 28
Primary metric: score
Sampled: 2026-05-06

Metadata

Metrics

Score, Confidence Interval Upper, Max Score

Latest Results

| Rank | Subject | Score | Model Match | Provenance | Sampled |
| --- | --- | --- | --- | --- | --- |
| 1 | claude-opus-4-6 (Non-Thinking) | 53.28 | Claude Opus 4.6 (anthropic-claude-opus-4.6) | Imported | 2026-05-06 |
| 2 | Muse Spark | 52.44 | | Imported | 2026-05-06 |
| 3 | gpt-5 | 51.32 | GPT-5 (openai-gpt-5) | Imported | 2026-05-06 |
| 3 | gpt-5-pro | 51.06 | GPT-5 Pro (openai-gpt-5-pro) | Imported | 2026-05-06 |
| 5 | o3-pro | 49.08 | o3 Pro (openai-o3-pro) | Imported | 2026-05-06 |
| 6 | gpt-5.1-thinking | 48.01 | GPT-5.1 (openai-gpt-5.1) | Imported | 2026-05-06 |
| 6 | o3 | 47.69 | o3 (openai-o3) | Imported | 2026-05-06 |
| 8 | kimi-k2.5 | 46.51 | KIMI MoonshotAI: Kimi K2.5 (moonshotai-kimi-k2.5) | Imported | 2026-05-06 |
| 8 | gpt-5.2-pro-2025-12-11 | 46.34 | GPT-5.2 Pro (openai-gpt-5.2-pro) | Imported | 2026-05-06 |
| 8 | claude-opus-4-5-20251101-thinking | 46.16 | Claude Opus 4.5 (anthropic-claude-opus-4.5) | Imported | 2026-05-06 |
| 8 | gpt-5.4 (High) | 45.63 | GPT-5.4 (openai-gpt-5.4) | Imported | 2026-05-06 |
| 10 | gpt-oss-120b | 43.80 | gpt-oss-120b (openai-gpt-oss-120b) | Imported | 2026-05-06 |
| 12 | claude-sonnet-4-5-20250929 | 43.79 | Claude Sonnet 4.5 (anthropic-claude-sonnet-4.5) | Imported | 2026-05-06 |
| 12 | kimi-k2-thinking | 43.41 | KIMI MoonshotAI: Kimi K2 Thinking (moonshotai-kimi-k2-thinking) | Imported | 2026-05-06 |
| 14 | gemini-3.1-pro | 41.87 | Gemini 3.1 Pro Preview (google-gemini-3.1-pro-preview) | Imported | 2026-05-06 |
| 16 | mistral-medium-latest | 39.35 | | Imported | 2026-05-06 |
| 16 | o4-mini | 39.22 | o4 Mini (openai-o4-mini) | Imported | 2026-05-06 |
| 16 | gemini-3-pro-preview | 39.18 | Gemini 3 (google-gemini-3) | Imported | 2026-05-06 |
| 16 | qwen.qwen3-235b-a22b-2507-v1:0 | 39.14 | | Imported | 2026-05-06 |
| 16 | gemini-2.5-pro | 38.92 | Gemini 2.5 Pro (google-gemini-2.5-pro) | Imported | 2026-05-06 |
| 16 | gemini-2.5-flash | 38.41 | Gemini 2.5 Flash (google-gemini-2.5-flash) | Imported | 2026-05-06 |
| 16 | kimi-k2-instruct | 38.34 | KIMI MoonshotAI: Kimi K2 0711 (moonshotai-kimi-k2) | Imported | 2026-05-06 |
| 23 | claude-opus-4-1-20250805 | 35.15 | Claude Opus 4.1 (anthropic-claude-opus-4.1) | Imported | 2026-05-06 |
| 23 | deepseek-v3p1 | 35.09 | DeepSeek V3.1 Terminus (deepseek-deepseek-v3.1-terminus) | Imported | 2026-05-06 |
| 24 | gpt-4.1 | 34.32 | GPT-4.1 (openai-gpt-4.1) | Imported | 2026-05-06 |
| 26 | deepseek-r1-0528 | 32.67 | R1 0528 (deepseek-deepseek-r1-0528) | Imported | 2026-05-06 |
| 27 | gpt-4.1-mini | 30.45 | GPT-4.1 Mini (openai-gpt-4.1-mini) | Imported | 2026-05-06 |
| 28 | llama4-maverick-instruct-basic | 22.36 | | Imported | 2026-05-06 |