Fin-RATE

Financial analytics benchmark over SEC filings covering three task types: detail/reasoning QA (DR-QA), enterprise comparison QA (EC-QA), and longitudinal tracking QA (LT-QA).

Rows: 17
Primary metric: macro_accuracy
Sampled: 2026-05-28

Metadata

Metrics

Macro Accuracy, DR-QA Accuracy, EC-QA Accuracy, LT-QA Accuracy

Latest Results

Rows are imported from the LaTeX source of the public arXiv submission. The primary metric, macro accuracy, is derived from the three task accuracies (DR-QA, EC-QA, LT-QA) in the ground-truth-context table.
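
The page does not state how macro accuracy is derived from the three task accuracies. A minimal sketch, assuming it is the unweighted mean of the DR-QA, EC-QA, and LT-QA accuracies (the usual meaning of a macro average), might look like the following. The function name `macro_accuracy` and the input values are hypothetical and are not taken from the leaderboard.

```python
from statistics import fmean


def macro_accuracy(dr_qa: float, ec_qa: float, lt_qa: float) -> float:
    """Return the unweighted mean of the three task accuracies.

    Assumption: "macro" means each task counts equally, regardless of
    how many questions it contains. The source does not confirm this.
    """
    return fmean([dr_qa, ec_qa, lt_qa])


# Hypothetical per-task accuracies in percent (illustrative only).
example = macro_accuracy(dr_qa=30.0, ec_qa=20.0, lt_qa=10.0)
print(f"{example:.2f}%")
```

If the benchmark instead weights tasks by question count, the result would be a micro average and could differ from the figures listed here.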

| Rank | Subject | Macro Accuracy | Model Match | Provenance | Sampled |
|------|---------|----------------|-------------|------------|---------|
| 1 | GPT-5-websearch | 43.37% | GPT-5 (openai-gpt-5) | Imported | 2026-05-28 |
| 2 | GPT-4.1 | 33.24% | GPT-4.1 (openai-gpt-4.1) | Imported | 2026-05-28 |
| 3 | GPT-4.1-websearch | 31.80% | GPT-4.1 (openai-gpt-4.1) | Imported | 2026-05-28 |
| 4 | Qwen3-235B | 24.39% | Qwen3 235B A22B (qwen-qwen3-235b-a22b) | Imported | 2026-05-28 |
| 5 | Fin-R1 | 23.51% | | Imported | 2026-05-28 |
| 6 | GPT-OSS-20B | 18.69% | gpt-oss-20b (openai-gpt-oss-20b) | Imported | 2026-05-28 |
| 7 | MIMO-V2-Flash | 18.05% | MiMo-V2-Flash (xiaomi-mimo-v2-flash) | Imported | 2026-05-28 |
| 8 | Qwen3-30B-A3B-Instruct-2507 | 17.63% | Qwen3 30B A3B Instruct 2507 (qwen-qwen3-30b-a3b-instruct-2507) | Imported | 2026-05-28 |
| 9 | Llama-3.3-70B-Instruct | 16.76% | Llama 3.3 70B Instruct (meta-llama-llama-3.3-70b-instruct) | Imported | 2026-05-28 |
| 10 | DeepSeek-V3.2 | 16.32% | DeepSeek V3.2 (deepseek-deepseek-v3.2) | Imported | 2026-05-28 |
| 11 | DeepSeek-R1 | 15.53% | R1 (deepseek-r1) | Imported | 2026-05-28 |
| 12 | Fino1-14B | 13.13% | | Imported | 2026-05-28 |
| 13 | Qwen3-14B | 11.25% | Qwen3 14B (qwen-qwen3-14b) | Imported | 2026-05-28 |
| 14 | DeepSeek-V3 | 9.81% | DeepSeek V3 (deepseek-deepseek-chat) | Imported | 2026-05-28 |
| 15 | Qwen3-8B | 5.48% | Qwen3 8B (qwen-qwen3-8b) | Imported | 2026-05-28 |
| 16 | FinanceConnect-13B | 2.65% | | Imported | 2026-05-28 |
| 17 | TouchstoneGPT-7B-Instruct | 0.41% | | Imported | 2026-05-28 |