FinChain
Financial chain-of-thought benchmark reporting ChainEval and lexical/semantic similarity metrics for general and finance-specific models.
Rows: 25
Primary metric: chain_eval
Sampled: 2026-05-28
Metadata
Metrics
ChainEval, ChainEval Std (lower is better), ROUGE-R2, ROUGE-R2 Std (lower is better), ROUGE-RL, ROUGE-RL Std (lower is better), BERTScore, BERTScore Std (lower is better)
| Rank | Subject | ChainEval | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | Gemini 2.5 Pro | 58.65 | Gemini 2.5 Pro (`google-gemini-2.5-pro`) | Imported | 2026-05-28 |
| 2 | Claude Sonnet 4.5 | 58.22 | Claude Sonnet 4.5 (`anthropic-claude-sonnet-4.5`) | Imported | 2026-05-28 |
| 3 | Claude Sonnet 4 | 58.18 | Claude Sonnet 4 (`anthropic-claude-sonnet-4`) | Imported | 2026-05-28 |
| 4 | Gemini 2.5 Flash | 58.01 | Gemini 2.5 Flash (`google-gemini-2.5-flash`) | Imported | 2026-05-28 |
| 5 | Claude Sonnet 3.7 | 57.89 | — | Imported | 2026-05-28 |
| 6 | GPT-5-mini | 57.38 | GPT-5 Mini (`openai-gpt-5-mini`) | Imported | 2026-05-28 |
| 7 | Fin-R1 | 57.34 | — | Imported | 2026-05-28 |
| 8 | GPT-4.1-mini | 57.24 | GPT-4.1 Mini (`openai-gpt-4.1-mini`) | Imported | 2026-05-28 |
| 9 | GPT-5 | 57.07 | GPT-5 (`openai-gpt-5`) | Imported | 2026-05-28 |
| 10 | Qwen 2.5 Instruct | 57.00 | — | Imported | 2026-05-28 |
| 11 | GPT-4.1 | 56.92 | GPT-4.1 (`openai-gpt-4.1`) | Imported | 2026-05-28 |
| 12 | DeepSeek v3.1 | 56.76 | DeepSeek V3.1 (`deepseek-deepseek-chat-v3.1`) | Imported | 2026-05-28 |
| 13 | Grok 4 Fast | 56.73 | Grok 4 Fast (`x-ai-grok-4-fast`) | Imported | 2026-05-28 |
| 14 | DeepSeek v3.2 | 56.71 | DeepSeek V3.2 (`deepseek-deepseek-v3.2`) | Imported | 2026-05-28 |
| 15 | Mathstral | 56.40 | — | Imported | 2026-05-28 |
| 16 | Llama 3.1 Instruct | 55.88 | — | Imported | 2026-05-28 |
| 17 | DeepSeek R1 | 53.75 | R1 (`deepseek-r1`) | Imported | 2026-05-28 |
| 18 | DianJin-R1 | 53.72 | — | Imported | 2026-05-28 |
| 19 | Qwen2.5-Math | 50.32 | — | Imported | 2026-05-28 |
| 20 | Qwen 3 | 45.99 | — | Imported | 2026-05-28 |
| 21 | Finance-LlaMA | 42.81 | — | Imported | 2026-05-28 |
| 22 | Fin-ol | 39.34 | — | Imported | 2026-05-28 |
| 23 | Finance-Qwen | 34.22 | — | Imported | 2026-05-28 |
| 24 | WizardMath | 21.75 | — | Imported | 2026-05-28 |
| 25 | MetaMath | 6.09 | — | Imported | 2026-05-28 |
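
The table is ordered by the primary metric, where higher ChainEval is better, while the Std columns are marked lower-is-better. A minimal sketch of that kind of ordering follows. The `Row` type, the `rank` helper, the use of Std as a tie-break, and all sample values are hypothetical illustrations, not the leaderboard's actual implementation or data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    subject: str
    chain_eval: float      # primary metric, higher is better
    chain_eval_std: float  # spread of the metric, lower is better


def rank(rows):
    """Order rows by score descending, then by Std ascending.

    The Std tie-break is an assumption made for this sketch; the
    source only states that a lower Std is better.
    """
    return sorted(rows, key=lambda r: (-r.chain_eval, r.chain_eval_std))


# Placeholder rows with made-up numbers (not taken from the table).
rows = [
    Row("model_a", 57.0, 1.2),
    Row("model_b", 58.5, 0.9),
    Row("model_c", 57.0, 0.4),
]

ranked = rank(rows)
ranked_names = [r.subject for r in ranked]
print(ranked_names)
```

Here `model_c` is placed ahead of `model_a` because the two tie on score and `model_c` has the smaller Std.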