BizFinBench

Business-driven financial benchmark covering anomalous event attribution, numerical computation, time reasoning, financial QA, event relation, stock prediction, and entity recognition.

Rows: 25
Primary metric: Average
Sampled: 2026-05-27

Metadata

Metrics

Average, Anomalous Event Attribution, Financial Numerical Computation, Financial Time Reasoning, Financial Tool Use, Financial QA, Financial Data Description, Event Relation, Stock Prediction, Financial NER

Latest Results

Rows are parsed from the public BizFinBench README results table. Primary score is the reported Average column.
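
As a rough illustration of the import step described above, the sketch below parses a pipe-delimited results table and ranks rows by the Average column. It is not the importer actually used for this page. The column names "Model" and "Average", the two-column sample layout, and the function names parse_results_table and rank_by_average are assumptions made for the example; the real README table has more columns and may be laid out differently. The three sample scores are taken from the results listed on this page.

```python
# Hypothetical sketch: parse a pipe-delimited results table and rank by Average.
# The table layout and column names below are assumed, not taken from the README.

SAMPLE_README = """
| Model | Average |
|---|---|
| GPT-4o | 71.8 |
| ChatGPT-o3 | 73.86 |
| DeepSeek-R1 (671B) | 73.05 |
"""


def _split_row(line: str) -> list[str]:
    """Split one table line into trimmed cell strings."""
    return [cell.strip() for cell in line.strip().strip("|").split("|")]


def parse_results_table(markdown: str) -> list[dict[str, str]]:
    """Return one dict per data row, keyed by the header cells."""
    lines = [ln for ln in markdown.splitlines() if ln.strip().startswith("|")]
    if len(lines) < 2:
        return []
    header = _split_row(lines[0])
    rows = []
    for line in lines[1:]:
        cells = _split_row(line)
        # Skip the header separator row (cells made only of '-', ':' or spaces).
        if all(set(cell) <= set("-: ") for cell in cells):
            continue
        # Skip malformed rows whose cell count does not match the header.
        if len(cells) != len(header):
            continue
        rows.append(dict(zip(header, cells)))
    return rows


def rank_by_average(rows, name_col="Model", score_col="Average"):
    """Sort rows by the primary score, highest first, and attach a 1-based rank."""
    scored = [(row[name_col], float(row[score_col])) for row in rows]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [(rank, name, score) for rank, (name, score) in enumerate(scored, start=1)]


if __name__ == "__main__":
    for rank, name, score in rank_by_average(parse_results_table(SAMPLE_README)):
        print(rank, name, score)
```

Sorting on the parsed float rather than the raw string keeps values such as 71.8 and 71.57 in numeric order.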

| Rank | Subject | Average | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | ChatGPT-o3 | 73.86 | | Imported | 2026-05-27 |
| 2 | DeepSeek-R1 (671B) | 73.05 | R1 (deepseek-r1) | Imported | 2026-05-27 |
| 3 | GPT-4o | 71.8 | GPT-4o (openai-gpt-4o) | Imported | 2026-05-27 |
| 4 | DeepSeek-V3 (671B) | 71.57 | DeepSeek V3 (deepseek-deepseek-chat) | Imported | 2026-05-27 |
| 5 | ChatGPT-o4-mini | 71.29 | | Imported | 2026-05-27 |
| 6 | Gemini-2.0-Flash | 69.75 | Gemini 2.0 Flash (google-gemini-2.0-flash) | Imported | 2026-05-27 |
| 7 | Qwen2.5-VL-32B | 68.62 | | Imported | 2026-05-27 |
| 8 | Qwen3-32B | 68.26 | Qwen3 32B (qwen-qwen3-32b) | Imported | 2026-05-27 |
| 9 | Qwen2.5-72B-Instruct | 67.7 | Qwen2.5 72B Instruct (qwen-qwen-2.5-72b-instruct) | Imported | 2026-05-27 |
| 10 | Qwen2.5-VL-72B | 67.53 | Qwen2.5 VL 72B Instruct (qwen-qwen2.5-vl-72b-instruct) | Imported | 2026-05-27 |
| 11 | Qwen3-14B | 67.05 | Qwen3 14B (qwen-qwen3-14b) | Imported | 2026-05-27 |
| 12 | DeepSeek-R1-Distill-Qwen-32B | 66.29 | R1 Distill Qwen 32B (deepseek-deepseek-r1-distill-qwen-32b) | Imported | 2026-05-27 |
| 13 | QwQ-32B | 65.77 | | Imported | 2026-05-27 |
| 14 | Claude-3.5-Sonnet | 65.59 | Claude 3.5 Sonnet (anthropic-claude-3.5-sonnet) | Imported | 2026-05-27 |
| 15 | Llama 4 Scout | 61.17 | Llama 4 Scout (meta-llama-llama-4-scout) | Imported | 2026-05-27 |
| 16 | Qwen3-4B | 59.94 | | Imported | 2026-05-27 |
| 17 | DeepSeek-R1-Distill-Qwen-14B | 59.49 | | Imported | 2026-05-27 |
| 18 | Qwen2.5-VL-7B | 56.68 | | Imported | 2026-05-27 |
| 19 | Qwen2.5-7B-Instruct | 56.35 | Qwen2.5 7B Instruct (qwen-qwen-2.5-7b-instruct) | Imported | 2026-05-27 |
| 20 | Llama-3.1-70B-Instruct | 55.09 | Llama 3.1 70B Instruct (meta-llama-llama-3.1-70b-instruct) | Imported | 2026-05-27 |
| 21 | Qwen3-1.7B | 50.78 | | Imported | 2026-05-27 |
| 22 | Qwen2.5-VL-14B | 49.52 | | Imported | 2026-05-27 |
| 23 | Llama-3.1-8B-Instruct | 48.95 | Llama 3.1 8B Instruct (meta-llama-llama-3.1-8b-instruct) | Imported | 2026-05-27 |
| 24 | Xuanyuan3-70B | 46.48 | | Imported | 2026-05-27 |
| 25 | Qwen2.5-VL-3B | 38.96 | | Imported | 2026-05-27 |