FinanceArena

FinanceQA leaderboard for industry-grade financial analysis tasks across basic tactical, assumption-based tactical, and conceptual categories.

Rows: 19
Primary metric: overall_accuracy
Sampled: 2026-05-27

Metadata

Metrics

Overall Accuracy, Basic Tactical Accuracy, Assumption-Based Tactical Accuracy, Conceptual Accuracy

Latest Results

Rows parsed from FinanceArena's public consolidated model results CSV for the FinanceQA leaderboard.
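As a rough illustration of how rows from such a CSV could be turned into a ranked list, here is a minimal sketch. The column names `model` and `overall_accuracy`, the helper `rank_rows`, and the inline sample are assumptions made for this example. The actual schema of FinanceArena's CSV is not described on this page.

```python
import csv
import io

# Hypothetical sample: the real CSV's column names are not given on this page,
# so "model" and "overall_accuracy" are assumptions. The three values are
# taken from the results table.
SAMPLE_CSV = """model,overall_accuracy
Grok 4,49.3
o3,54.1
o4 mini,48.6
"""


def rank_rows(csv_text, metric="overall_accuracy"):
    """Parse CSV text and return rows sorted by the metric, highest first."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [{"model": r["model"], metric: float(r[metric])} for r in reader]
    rows.sort(key=lambda r: r[metric], reverse=True)
    # Assign 1-based ordinal ranks in sorted order.
    return [{"rank": i, **r} for i, r in enumerate(rows, start=1)]


ranked = rank_rows(SAMPLE_CSV)
for row in ranked:
    print(row["rank"], row["model"], row["overall_accuracy"])
```

Sorting on the primary metric and numbering the result gives plain ordinal ranks. Python's sort is stable, so rows with equal scores keep their order from the input file.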

| Rank | Subject | Overall Accuracy (%) | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | o3 | 54.1 | o3 (openai-o3) | Imported | 2026-05-27 |
| 2 | Grok 4 | 49.3 | Grok 4 (x-ai-grok-4) | Imported | 2026-05-27 |
| 3 | o4 mini | 48.6 | o4 Mini (openai-o4-mini) | Imported | 2026-05-27 |
| 4 | Gemini 2.5 Pro | 45.3 | Gemini 2.5 Pro (google-gemini-2.5-pro) | Imported | 2026-05-27 |
| 5 | Llama 4 Maverick | 44.6 | Llama 4 Maverick (meta-llama-4-maverick) | Imported | 2026-05-27 |
| 6 | Claude Opus 4 | 44.6 | Claude Opus 4 (anthropic-claude-opus-4) | Imported | 2026-05-27 |
| 7 | Grok 3 | 44.6 | Grok 3 (xaigrok-3) | Imported | 2026-05-27 |
| 8 | Claude Sonnet 4 | 43.9 | Claude Sonnet 4 (anthropic-claude-sonnet-4) | Imported | 2026-05-27 |
| 9 | Phi 4 Reasoning Plus | 43.2 | | Imported | 2026-05-27 |
| 10 | DeepSeek-R1-0528 | 42.9 | R1 0528 (deepseek-deepseek-r1-0528) | Imported | 2026-05-27 |
| 11 | Qwq 32b | 42.6 | | Imported | 2026-05-27 |
| 12 | GPT-4.1 Mini | 41.9 | GPT-4.1 Mini (openai-gpt-4.1-mini) | Imported | 2026-05-27 |
| 13 | Llama 3.1 Nemotron Ultra 253b V1 | 41.9 | | Imported | 2026-05-27 |
| 14 | Qwen3 30B A3B | 37.2 | Qwen3 30B A3B (qwen-qwen3-30b-a3b) | Imported | 2026-05-27 |
| 15 | Kimi K2 | 33.8 | MoonshotAI: Kimi K2 0711 (moonshotai-kimi-k2) | Imported | 2026-05-27 |
| 16 | Gemini 2.5 Flash | 32.4 | Gemini 2.5 Flash (google-gemini-2.5-flash) | Imported | 2026-05-27 |
| 17 | Magistral Medium 2506 | 31.8 | | Imported | 2026-05-27 |
| 18 | Command A | 27.7 | Command A (cohere-command-a) | Imported | 2026-05-27 |
| 19 | Nova Pro V1 | 20.3 | Nova Pro 1.0 (amazon-nova-pro-v1) | Imported | 2026-05-27 |
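
The table places Llama 4 Maverick, Claude Opus 4, and Grok 3 at ranks 5, 6, and 7 although all three report 44.6, so the ranks appear to be ordinal positions rather than tie-aware ranks. The source does not say how ties are ordered. A minimal sketch of one alternative, standard competition ranking, in which equal scores share a rank; the helper name `competition_ranks` is hypothetical:

```python
def competition_ranks(scores):
    """Return standard competition ranks (e.g. 1, 2, 2, 4), highest score first."""
    ordered = sorted(scores, reverse=True)
    # The first position of a score in the sorted list is its shared rank.
    return [ordered.index(s) + 1 for s in scores]


# Overall Accuracy for the rows listed at ranks 4 through 8 in the table.
scores = [45.3, 44.6, 44.6, 44.6, 43.9]
print(competition_ranks(scores))
```

The ranks returned are relative to the slice passed in. Applied to all 19 scores, the three 44.6 rows would share rank 5 and Claude Sonnet 4 would remain at rank 8.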