Rogo Big Finance Bench
Vendor-reported 928-question finance-agent benchmark spanning vertical-specific skills, metrics, financial-statement analysis, and forecasting workflows.
Rows: 10
Primary metric: rubric_score
Sampled: 2026-05-28
Metadata
Metrics: Rubric Score, Final-Answer Accuracy
| Rank | Subject | Rubric Score | Final-Answer Accuracy | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|---|
| 1 | Claude Opus 4.7 | 59% | 41% | Claude Opus 4.7 (`anthropic-claude-opus-4.7`) | Imported | 2026-05-28 |
| 2 | GPT-5.5 | 59% | 44% | GPT-5.5 (`openai-gpt-5.5`) | Imported | 2026-05-28 |
| 3 | Claude Sonnet 4.6 | 59% | 38% | Claude Sonnet 4.6 (`anthropic-claude-sonnet-4.6`) | Imported | 2026-05-28 |
| 4 | GLM 5.1 | 55% | 36% | GLM 5.1 (`z-ai-glm-5.1`) | Imported | 2026-05-28 |
| 5 | Qwen 3.6 27B | 47% | 30% | Qwen3.6 27B (`qwen-qwen3.6-27b`) | Imported | 2026-05-28 |
| 6 | Kimi K2-6 | 45% | 27% | MoonshotAI: Kimi K2.6 (`moonshotai-kimi-k2.6`) | Imported | 2026-05-28 |
| 7 | Gemini 3 Flash | 43% | 26% | Gemini 3 Flash Preview (`google-gemini-3-flash-preview`) | Imported | 2026-05-28 |
| 8 | Gemini 3.1 Pro | 41% | 35% | Gemini 3.1 Pro Preview (`google-gemini-3.1-pro-preview`) | Imported | 2026-05-28 |
| 9 | Gemma 4.5 1B | 35% | 21% | — | Imported | 2026-05-28 |
| 10 | GPT-5.4 Mini | 22% | 7% | GPT-5.4 Mini (`openai-gpt-5.4-mini`) | Imported | 2026-05-28 |