QFBench
Quantitative-finance agent benchmark with verifiable code-execution tasks for derivatives, risk, factors, and trading calculations.
Rows: 9
Primary metric: pass_at_1
Sampled: 2026-05-27
Metadata
Metrics: Pass@1, Pass@3
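The page does not state how Pass@1 and Pass@3 are computed. One common convention for code-execution benchmarks is the unbiased pass@k estimator: for a task with n sampled attempts of which c pass verification, pass@k = 1 - C(n-c, k) / C(n, k), averaged over tasks. The sketch below assumes that convention. The function names and the per-task counts are hypothetical and are not taken from QFBench.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one task (assumed convention, not confirmed by QFBench).

    n: attempts sampled for the task
    c: attempts that passed verification
    k: attempt budget being scored
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # If fewer than k attempts failed, every size-k subset contains a pass.
    if n - c < k:
        return 1.0
    # Probability that a random size-k subset contains no passing attempt.
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Mean pass@k over tasks; results holds hypothetical (n, c) pairs."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Hypothetical per-task outcomes: (attempts, passing attempts).
example_results = [(3, 1), (3, 0), (3, 3), (3, 2)]
print(f"pass@1 = {benchmark_pass_at_k(example_results, 1):.3f}")
print(f"pass@3 = {benchmark_pass_at_k(example_results, 3):.3f}")
```

With k = 1 the estimator reduces to c / n, the plain fraction of passing attempts, so Pass@3 can never be lower than Pass@1 for the same set of attempts.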
| Rank | Subject | Pass@1 | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | GPT-5.5 via codex-cli | 61.7% | — | Imported | 2026-05-27 |
| 2 | claude-opus-4-7 via claude-code | 61.2% | — | Imported | 2026-05-27 |
| 3 | GPT-5.3-codex via codex-cli | 60.8% | — | Imported | 2026-05-27 |
| 4 | claude-opus-4-6 via claude-code | 59.2% | — | Imported | 2026-05-27 |
| 5 | GPT-5.4 via codex-cli | 57.5% | — | Imported | 2026-05-27 |
| 6 | GPT-5.4-mini via codex-cli | 57.1% | — | Imported | 2026-05-27 |
| 7 | claude-sonnet-4-6 via claude-code | 56.3% | — | Imported | 2026-05-27 |
| 8 | claude-sonnet-4-5 via claude-code | 46.2% | — | Imported | 2026-05-27 |
| 9 | claude-haiku-4-5 via claude-code | 20.8% | — | Imported | 2026-05-27 |