FinanceArena
FinanceQA leaderboard for industry-grade financial analysis tasks across basic tactical, assumption-based tactical, and conceptual categories.
19 rows
Primary metric: overall_accuracy
Sampled: 2026-05-27
Metadata
Metrics: Overall Accuracy, Basic Tactical Accuracy, Assumption-Based Tactical Accuracy, Conceptual Accuracy
| Rank | Subject | Overall Accuracy | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | o3 | 54.1 | o3 (`openai-o3`) | Imported | 2026-05-27 |
| 2 | Grok 4 | 49.3 | Grok 4 (`x-ai-grok-4`) | Imported | 2026-05-27 |
| 3 | o4 mini | 48.6 | o4 Mini (`openai-o4-mini`) | Imported | 2026-05-27 |
| 4 | Gemini 2.5 Pro | 45.3 | Gemini 2.5 Pro (`google-gemini-2.5-pro`) | Imported | 2026-05-27 |
| 5 | Llama 4 Maverick | 44.6 | Llama 4 Maverick (`meta-llama-4-maverick`) | Imported | 2026-05-27 |
| 6 | Claude Opus 4 | 44.6 | Claude Opus 4 (`anthropic-claude-opus-4`) | Imported | 2026-05-27 |
| 7 | Grok 3 | 44.6 | Grok 3 (`xaigrok-3`) | Imported | 2026-05-27 |
| 8 | Claude Sonnet 4 | 43.9 | Claude Sonnet 4 (`anthropic-claude-sonnet-4`) | Imported | 2026-05-27 |
| 9 | Phi 4 Reasoning Plus | 43.2 | — | Imported | 2026-05-27 |
| 10 | DeepSeek-R1-0528 | 42.9 | R1 0528 (`deepseek-deepseek-r1-0528`) | Imported | 2026-05-27 |
| 11 | QwQ 32B | 42.6 | — | Imported | 2026-05-27 |
| 12 | GPT-4.1 Mini | 41.9 | GPT-4.1 Mini (`openai-gpt-4.1-mini`) | Imported | 2026-05-27 |
| 13 | Llama 3.1 Nemotron Ultra 253b V1 | 41.9 | — | Imported | 2026-05-27 |
| 14 | Qwen3 30B A3B | 37.2 | Qwen3 30B A3B (`qwen-qwen3-30b-a3b`) | Imported | 2026-05-27 |
| 15 | Kimi K2 | 33.8 | MoonshotAI: Kimi K2 0711 (`moonshotai-kimi-k2`) | Imported | 2026-05-27 |
| 16 | Gemini 2.5 Flash | 32.4 | Gemini 2.5 Flash (`google-gemini-2.5-flash`) | Imported | 2026-05-27 |
| 17 | Magistral Medium 2506 | 31.8 | — | Imported | 2026-05-27 |
| 18 | Command A | 27.7 | Command A (`cohere-command-a`) | Imported | 2026-05-27 |
| 19 | Nova Pro V1 | 20.3 | Nova Pro 1.0 (`amazon-nova-pro-v1`) | Imported | 2026-05-27 |