QFBench

Quantitative-finance agent benchmark with verifiable code-execution tasks for derivatives, risk, factors, and trading calculations.

Rows: 9
Primary metric: pass_at_1
Sampled: 2026-05-27

Metadata

Metrics

Pass@1, Pass@3
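
The page does not say how Pass@1 and Pass@3 are computed. As a rough illustration only, the sketch below assumes the widely used unbiased pass@k estimator: for a task with n attempts of which c passed, pass@k = 1 - C(n-c, k) / C(n, k), averaged over tasks. The function names `pass_at_k` and `benchmark_pass_at_k` are hypothetical and are not taken from QFBench.

```python
from math import comb
from typing import Iterable, Tuple


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one task from n attempts, c of which passed.

    Assumption: the standard unbiased estimator 1 - C(n-c, k) / C(n, k).
    QFBench may compute its metrics differently.
    """
    if not (0 <= c <= n):
        raise ValueError("c must satisfy 0 <= c <= n")
    if not (1 <= k <= n):
        raise ValueError("k must satisfy 1 <= k <= n")
    # Fewer than k failures: every k-subset of attempts contains a pass.
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: Iterable[Tuple[int, int]], k: int) -> float:
    """Average per-task pass@k over (n, c) pairs, one pair per task."""
    scores = [pass_at_k(n, c, k) for n, c in results]
    if not scores:
        raise ValueError("results must contain at least one task")
    return sum(scores) / len(scores)
```

With a single attempt per task (n = 1, k = 1), this reduces to the plain fraction of tasks solved, which is one plausible reading of the Pass@1 column.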

Latest Results

The rows below were parsed from the public QFBench homepage. Each row reports a model running inside an agent scaffold (for example, codex-cli or claude-code), not a bare base-model evaluation.

Rank  Subject                            Pass@1  Model Match  Provenance  Sampled
1     GPT-5.5 via codex-cli              61.7%                Imported    2026-05-27
2     claude-opus-4-7 via claude-code    61.2%                Imported    2026-05-27
3     GPT-5.3-codex via codex-cli        60.8%                Imported    2026-05-27
4     claude-opus-4-6 via claude-code    59.2%                Imported    2026-05-27
5     GPT-5.4 via codex-cli              57.5%                Imported    2026-05-27
6     GPT-5.4-mini via codex-cli         57.1%                Imported    2026-05-27
7     claude-sonnet-4-6 via claude-code  56.3%                Imported    2026-05-27
8     claude-sonnet-4-5 via claude-code  46.2%                Imported    2026-05-27
9     claude-haiku-4-5 via claude-code   20.8%                Imported    2026-05-27
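
Because each Subject combines a model and a scaffold, it can help to split the rows into separate fields before comparing them. The following is a minimal sketch, assuming rows follow the whitespace-separated layout shown above with an empty Model Match column. The `Row` type, `ROW_RE` pattern, and `parse_row` helper are hypothetical names introduced for illustration and are not part of any QFBench tooling.

```python
import re
from typing import NamedTuple, Optional


class Row(NamedTuple):
    rank: int
    model: str
    scaffold: str
    pass_at_1: float  # percentage, e.g. 61.7
    provenance: str
    sampled: str  # ISO date string


# rank, "<model> via <scaffold>", percentage, provenance, date
ROW_RE = re.compile(
    r"^(\d+)\s+(.+?)\s+via\s+(\S+)\s+(\d+(?:\.\d+)?)%\s+(\S+)\s+(\d{4}-\d{2}-\d{2})$"
)


def parse_row(line: str) -> Optional[Row]:
    """Parse one leaderboard line; return None for headers or malformed lines."""
    match = ROW_RE.match(line.strip())
    if match is None:
        return None
    rank, model, scaffold, score, provenance, sampled = match.groups()
    return Row(int(rank), model, scaffold, float(score), provenance, sampled)


if __name__ == "__main__":
    sample = [
        "Rank Subject Pass@1 Model Match Provenance Sampled",
        "1 GPT-5.5 via codex-cli 61.7% Imported 2026-05-27",
        "9 claude-haiku-4-5 via claude-code 20.8% Imported 2026-05-27",
    ]
    for row in filter(None, map(parse_row, sample)):
        print(row.rank, row.model, row.scaffold, row.pass_at_1)
```

Keeping the scaffold as its own field makes it easier to compare models within one scaffold rather than across scaffolds, which matters here because the scores describe agent runs rather than base models.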