SECQUE

SEC filing retrieval and question-answering benchmark for financial RAG over public company disclosures.

7 rows
strict_accuracy primary metric
2026-05-28 sampled

Metadata

Metrics

Strict Accuracy, Normalized Accuracy, Financial Prompt Strict Accuracy, Financial Prompt Normalized Accuracy, Baseline CoT Strict Accuracy, Baseline CoT Normalized Accuracy, Financial CoT Strict Accuracy, Financial CoT Normalized Accuracy, Flipped Strict Accuracy, Flipped Normalized Accuracy, Average Tokens by Model (lower is better)

Latest Results

Rows are imported from Table 5 of the public arXiv LaTeX source. The primary score is strict accuracy under the SECQUE baseline configuration; the prompt-ablation strict and normalized accuracies are kept as additional metrics.

| Rank | Subject | Strict Accuracy | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | GPT-4o | 0.69 | GPT-4o (openai-gpt-4o) | Imported | 2026-05-28 |
| 2 | Llama-3.3-70B-Instruct | 0.65 | Llama 3.3 70B Instruct (meta-llama-llama-3.3-70b-instruct) | Imported | 2026-05-28 |
| 3 | GPT-4o-mini | 0.64 | GPT-4o-mini (openai-gpt-4o-mini) | Imported | 2026-05-28 |
| 4 | Qwen2.5-32B-Instruct | 0.61 | | Imported | 2026-05-28 |
| 5 | Phi-4 | 0.56 | Phi 4 (microsoft-phi-4) | Imported | 2026-05-28 |
| 6 | Meta-Llama-3.1-8B-Instruct | 0.48 | | Imported | 2026-05-28 |
| 7 | Mistral-Nemo-Instruct-2407 | 0.46 | Mistral: Mistral Nemo (mistralai-mistral-nemo) | Imported | 2026-05-28 |
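
The rank order in Latest Results matches a descending sort on the primary metric. The following is a minimal sketch of that ordering, not the site's actual import code: the `ROWS` list and the `rank_by_primary` helper are hypothetical names introduced for illustration, and the scores are copied from the results listed above.

```python
# Hypothetical illustration: (subject, strict accuracy) pairs copied from
# the Latest Results listing, deliberately stored out of order.
ROWS = [
    ("Phi-4", 0.56),
    ("GPT-4o-mini", 0.64),
    ("Mistral-Nemo-Instruct-2407", 0.46),
    ("GPT-4o", 0.69),
    ("Meta-Llama-3.1-8B-Instruct", 0.48),
    ("Llama-3.3-70B-Instruct", 0.65),
    ("Qwen2.5-32B-Instruct", 0.61),
]


def rank_by_primary(rows):
    """Return (rank, subject, score) tuples, highest score first.

    Assumes higher strict accuracy is better and that rank is simply the
    1-based position after sorting; tie handling is not specified by the
    source, so ties keep their input order (sorted() is stable).
    """
    ordered = sorted(rows, key=lambda row: row[1], reverse=True)
    return [(rank, subject, score)
            for rank, (subject, score) in enumerate(ordered, start=1)]


if __name__ == "__main__":
    for rank, subject, score in rank_by_primary(ROWS):
        print(f"{rank} {subject} {score:.2f}")
```

Sorting on a single primary metric keeps the leaderboard order easy to reproduce; the other listed metrics would need their own sort direction, since Average Tokens by Model is marked lower-is-better.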