FinChain

A financial chain-of-thought benchmark that reports ChainEval along with lexical (ROUGE) and semantic (BERTScore) similarity metrics for general-purpose and finance-specific models.

Rows: 25
Primary metric: chain_eval
Sampled: 2026-05-28

Metadata

Metrics

ChainEval, ChainEval Std (lower is better), ROUGE-R2, ROUGE-R2 Std (lower is better), ROUGE-RL, ROUGE-RL Std (lower is better), BERTScore, BERTScore Std (lower is better)

Latest Results

Rows are imported from the official FinChain static JavaScript data and ranked by ChainEval.

| Rank | Subject | ChainEval | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | Gemini 2.5 Pro | 58.65 | Gemini 2.5 Pro (google-gemini-2.5-pro) | Imported | 2026-05-28 |
| 2 | Claude Sonnet 4.5 | 58.22 | Claude Sonnet 4.5 (anthropic-claude-sonnet-4.5) | Imported | 2026-05-28 |
| 3 | Claude Sonnet 4 | 58.18 | Claude Sonnet 4 (anthropic-claude-sonnet-4) | Imported | 2026-05-28 |
| 4 | Gemini 2.5 Flash | 58.01 | Gemini 2.5 Flash (google-gemini-2.5-flash) | Imported | 2026-05-28 |
| 5 | Claude Sonnet 3.7 | 57.89 | | Imported | 2026-05-28 |
| 6 | GPT-5-mini | 57.38 | GPT-5 Mini (openai-gpt-5-mini) | Imported | 2026-05-28 |
| 7 | Fin-R1 | 57.34 | | Imported | 2026-05-28 |
| 8 | GPT-4.1-mini | 57.24 | GPT-4.1 Mini (openai-gpt-4.1-mini) | Imported | 2026-05-28 |
| 9 | GPT-5 | 57.07 | GPT-5 (openai-gpt-5) | Imported | 2026-05-28 |
| 10 | Qwen 2.5 Instruct | 57.00 | | Imported | 2026-05-28 |
| 11 | GPT-4.1 | 56.92 | GPT-4.1 (openai-gpt-4.1) | Imported | 2026-05-28 |
| 12 | DeepSeek v3.1 | 56.76 | DeepSeek V3.1 (deepseek-deepseek-chat-v3.1) | Imported | 2026-05-28 |
| 13 | Grok 4 Fast | 56.73 | Grok 4 Fast (x-ai-grok-4-fast) | Imported | 2026-05-28 |
| 14 | DeepSeek v3.2 | 56.71 | DeepSeek V3.2 (deepseek-deepseek-v3.2) | Imported | 2026-05-28 |
| 15 | Mathstral | 56.40 | | Imported | 2026-05-28 |
| 16 | Llama 3.1 Instruct | 55.88 | | Imported | 2026-05-28 |
| 17 | DeepSeek R1 | 53.75 | R1 (deepseek-r1) | Imported | 2026-05-28 |
| 18 | DianJin-R1 | 53.72 | | Imported | 2026-05-28 |
| 19 | Qwen2.5-Math | 50.32 | | Imported | 2026-05-28 |
| 20 | Qwen 3 | 45.99 | | Imported | 2026-05-28 |
| 21 | Finance-LlaMA | 42.81 | | Imported | 2026-05-28 |
| 22 | Fin-ol | 39.34 | | Imported | 2026-05-28 |
| 23 | Finance-Qwen | 34.22 | | Imported | 2026-05-28 |
| 24 | WizardMath | 21.75 | | Imported | 2026-05-28 |
| 25 | MetaMath | 6.09 | | Imported | 2026-05-28 |
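
The "ranked by ChainEval" step can be illustrated with a minimal sketch. It assumes that a higher ChainEval is better (consistent with the ordering in the table) and that ties keep their import order. The `Row` type and `rank_by_chain_eval` function are hypothetical names for illustration, not part of FinChain or of this page's importer. The three sample rows are copied from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    # Hypothetical record type; the field names are illustrative only.
    subject: str
    chain_eval: float


def rank_by_chain_eval(rows):
    """Return (rank, row) pairs, highest ChainEval first.

    sorted() is stable, so rows with equal scores keep their input order.
    """
    ordered = sorted(rows, key=lambda r: r.chain_eval, reverse=True)
    return list(enumerate(ordered, start=1))


# Three rows copied from the table, deliberately out of order.
sample = [
    Row("GPT-5", 57.07),
    Row("Gemini 2.5 Pro", 58.65),
    Row("MetaMath", 6.09),
]

ranked = rank_by_chain_eval(sample)
for rank, row in ranked:
    print(f"{rank} {row.subject} {row.chain_eval:.2f}")
```

The ranks printed here are positions within the three-row sample, not the leaderboard ranks. The Std metrics listed under Metrics are marked "lower is better", so ranking by one of them would need `reverse=False`.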