CorpFin v2

A private benchmark evaluating understanding of long-context credit agreements

108 rows
Primary metric: score
Sampled: 2026-05-28

Metadata

Metrics

Score (higher is better), Std. error (lower is better), Latency (lower is better), Cost per test (lower is better)
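
The page does not say how the standard error is computed or how many test cases sit behind each score. As a rough sketch only, assuming every test is graded pass/fail and tests are independent, the standard error of an accuracy percentage can be estimated with the binomial formula. The function name `accuracy_std_error` and the sample size of 1000 are hypothetical and do not come from the benchmark.

```python
import math


def accuracy_std_error(accuracy_pct: float, n_tests: int) -> float:
    """Binomial standard error of an accuracy score, in percentage points.

    Assumes independent pass/fail tests; this is an illustrative assumption,
    not the benchmark's documented method.
    """
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    if not 0.0 <= accuracy_pct <= 100.0:
        raise ValueError("accuracy_pct must be between 0 and 100")
    p = accuracy_pct / 100.0
    return 100.0 * math.sqrt(p * (1.0 - p) / n_tests)


if __name__ == "__main__":
    # Hypothetical sample size; the real number of tests is not given on the page.
    se = accuracy_std_error(68.532, 1000)
    print(f"standard error: {se:.3f} percentage points")
```

Under these assumptions, scores that differ by less than about one standard error should be read as roughly tied rather than strictly ordered.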

Latest Results

The full leaderboard rows below were decoded from the Vals.ai benchmark detail page. The primary score is the overall accuracy percentage.

Rank | Subject | Score | Model Match | Provenance | Sampled
1 | Grok 4.3 | 68.532% | Grok 4.3 [x-ai-grok-4.3] | Imported | 2026-05-28
2 | GPT 5.5 | 68.415% | GPT-5.5 [openai-gpt-5.5] | Imported | 2026-05-28
3 | Kimi K2.5 Thinking | 68.259% | MoonshotAI: Kimi K2.5 [moonshotai-kimi-k2.5] | Imported | 2026-05-28
4 | Qwen 3 Max 2026-01-23 | 68.026% | (none) | Imported | 2026-05-28
5 | Claude Opus 4.6 Thinking | 67.016% | Claude Opus 4.6 [anthropic-claude-opus-4.6] | Imported | 2026-05-28
6 | Grok 4 Fast Reasoning | 66.9% | Grok 4 Fast [x-ai-grok-4-fast] | Imported | 2026-05-28
7 | Kimi K2.6 Thinking | 66.744% | MoonshotAI: Kimi K2.6 [moonshotai-kimi-k2.6] | Imported | 2026-05-28
8 | Claude Opus 4.8 | 66.706% | Claude Opus 4.8 [anthropic-claude-opus-4.8] | Imported | 2026-05-28
9 | Qwen 3.6 Max Preview | 66.473% | Qwen3.6 Max Preview [qwen-qwen3.6-max-preview] | Imported | 2026-05-28
10 | Gemini 3 Flash Preview | 66.434% | Gemini 3 Flash Preview [google-gemini-3-flash-preview] | Imported | 2026-05-28
11 | Claude Opus 4.7 | 66.084% | Claude Opus 4.7 [anthropic-claude-opus-4.7] | Imported | 2026-05-28
12 | Grok 4 0709 | 66.045% | Grok 4 [x-ai-grok-4] | Imported | 2026-05-28
13 | Grok 4.1 Fast Reasoning | 65.967% | Grok 4.1 Fast [x-ai-grok-4.1-fast] | Imported | 2026-05-28
14 | GPT 5.2 2025-12-11 | 65.889% | GPT-5.2 [openai-gpt-5.2] | Imported | 2026-05-28
15 | Qwen 3.5 Plus Thinking | 65.307% | (none) | Imported | 2026-05-28
16 | Claude Sonnet 4.6 | 65.307% | Claude Sonnet 4.6 [anthropic-claude-sonnet-4.6] | Imported | 2026-05-28
17 | GPT 5.4 2026-03-05 | 65.268% | GPT-5.4 [openai-gpt-5.4] | Imported | 2026-05-28
18 | Muse Spark | 65.113% | (none) | Imported | 2026-05-28
19 | Claude Opus 4.5 20251101 Thinking | 65.074% | Claude Opus 4.5 [anthropic-claude-opus-4.5] | Imported | 2026-05-28
20 | Gemini 3.5 Flash | 64.686% | Gemini 3.5 Flash [google-gemini-3.5-flash] | Imported | 2026-05-28
21 | Gemini 3.1 Pro Preview | 64.491% | Gemini 3.1 Pro Preview [google-gemini-3.1-pro-preview] | Imported | 2026-05-28
22 | GLM 5.1 Thinking | 64.452% | GLM 5.1 [z-ai-glm-5.1] | Imported | 2026-05-28
23 | GPT 5.1 2025-11-13 | 63.831% | GPT-5.1 [openai-gpt-5.1] | Imported | 2026-05-28
24 | Qwen 3.7 Max | 63.714% | Qwen3.7 Max [qwen-qwen3.7-max] | Imported | 2026-05-28
25 | Gemini 3 Pro Preview | 63.675% | Gemini 3 [google-gemini-3] | Imported | 2026-05-28
26 | Grok 4.20 0309 Reasoning | 63.675% | Grok 4.20 [x-ai-grok-4.20] | Imported | 2026-05-28
27 | Qwen 3.5 Flash | 63.559% | Qwen3.5-Flash [qwen-qwen3.5-flash-02-23] | Imported | 2026-05-28
28 | GPT 4.1 2025-04-14 | 63.054% | GPT-4.1 [openai-gpt-4.1] | Imported | 2026-05-28
29 | GLM 5 Thinking | 62.898% | GLM 5 [z-ai-glm-5] | Imported | 2026-05-28
30 | Qwen 3.6 27B | 62.315% | Qwen3.6 27B [qwen-qwen3.6-27b] | Imported | 2026-05-28
31 | Claude Sonnet 4.5 20250929 Thinking | 61.966% | Claude Sonnet 4.5 [anthropic-claude-sonnet-4.5] | Imported | 2026-05-28
32 | Qwen 3.6 Plus | 61.927% | Qwen3.6 Plus [qwen-qwen3.6-plus] | Imported | 2026-05-28
33 | DeepSeek V4 Pro | 61.383% | DeepSeek V4 Pro [deepseek-deepseek-v4-pro] | Imported | 2026-05-28
34 | Claude Opus 4.5 20251101 | 61.305% | Claude Opus 4.5 [anthropic-claude-opus-4.5] | Imported | 2026-05-28
35 | Claude Sonnet 4 20250514 Thinking | 61.228% | (none) | Imported | 2026-05-28
36 | MiniMax M2.7 | 61.189% | MiniMax M2.7 [minimax-minimax-m2.7] | Imported | 2026-05-28
37 | GPT 5.4 Nano 2026-03-17 | 61.189% | GPT-5.4 Nano [openai-gpt-5.4-nano] | Imported | 2026-05-28
38 | Grok 3 Mini Fast High Reasoning | 61.111% | (none) | Imported | 2026-05-28
39 | GPT 5 2025-08-07 | 61.072% | GPT-5 [openai-gpt-5] | Imported | 2026-05-28
40 | Mistral Large 2512 | 61.033% | Mistral: Mistral Large 3 2512 [mistralai-mistral-large-2512] | Imported | 2026-05-28
41 | GLM 4.5 | 60.956% | GLM 4.5 [z-ai-glm-4.5] | Imported | 2026-05-28
42 | GPT 5.4 Mini 2026-03-17 | 60.917% | GPT-5.4 Mini [openai-gpt-5.4-mini] | Imported | 2026-05-28
43 | Claude Sonnet 4.5 20250929 | 60.8% | Claude Sonnet 4.5 [anthropic-claude-sonnet-4.5] | Imported | 2026-05-28
44 | Gemini 2.5 Pro | 60.8% | Gemini 2.5 Pro [google-gemini-2.5-pro] | Imported | 2026-05-28
45 | Claude Haiku 4.5 20251001 Thinking | 60.606% | Claude Haiku 4.5 [anthropic-claude-haiku-4.5] | Imported | 2026-05-28
46 | Kimi K2 Thinking | 60.567% | MoonshotAI: Kimi K2 Thinking [moonshotai-kimi-k2-thinking] | Imported | 2026-05-28
47 | Claude 3.7 Sonnet 20250219 Thinking | 60.412% | (none) | Imported | 2026-05-28
48 | Claude Haiku 4.5 20251001 | 60.295% | Claude Haiku 4.5 [anthropic-claude-haiku-4.5] | Imported | 2026-05-28
49 | GPT 5 Mini 2025-08-07 | 60.179% | GPT-5 Mini [openai-gpt-5-mini] | Imported | 2026-05-28
50 | Gemini 2.5 Pro Exp 03 25 | 59.829% | (none) | Imported | 2026-05-28
51 | Gemini 2.5 Flash Preview 09 2025 Thinking | 59.751% | (none) | Imported | 2026-05-28
52 | Grok 3 | 59.713% | Grok 3 [xaigrok-3] | Imported | 2026-05-28
53 | O3 2025-04-16 | 59.713% | o3 [openai-o3] | Imported | 2026-05-28
54 | MiniMax M2.5 Lightning | 59.596% | (none) | Imported | 2026-05-28
55 | Grok 3 Mini Fast Low Reasoning | 59.479% | (none) | Imported | 2026-05-28
56 | Gemini 3.1 Flash Lite Preview | 59.363% | Gemini 3.1 Flash Lite Preview [google-gemini-3.1-flash-lite-preview] | Imported | 2026-05-28
57 | Gemini 2.5 Flash Preview 09 2025 | 58.974% | (none) | Imported | 2026-05-28
58 | O4 Mini 2025-04-16 | 58.974% | o4 Mini [openai-o4-mini] | Imported | 2026-05-28
59 | MiniMax M2.1 | 58.896% | MiniMax M2.1 [minimax-minimax-m2.1] | Imported | 2026-05-28
60 | Mistral Medium 3.5 | 58.78% | Mistral: Mistral Medium 3.5 [mistralai-mistral-medium-3-5] | Imported | 2026-05-28
61 | Grok 4 Fast Non Reasoning | 58.392% | Grok 4 Fast [x-ai-grok-4-fast] | Imported | 2026-05-28
62 | GPT Oss 120B | 58.236% | gpt-oss-120b [openai-gpt-oss-120b] | Imported | 2026-05-28
63 | GPT 4.1 Mini 2025-04-14 | 57.926% | GPT-4.1 Mini [openai-gpt-4.1-mini] | Imported | 2026-05-28
64 | Gemini 2.5 Flash Lite Preview 09 2025 Thinking | 57.576% | (none) | Imported | 2026-05-28
65 | GLM 4.6 | 56.838% | GLM 4.6 [z-ai-glm-4.6] | Imported | 2026-05-28
66 | Gemini 2.5 Flash Lite Preview 09 2025 | 56.294% | Gemini 2.5 Flash Lite Preview 09-2025 [google-gemini-2.5-flash-lite-preview-09-2025] | Imported | 2026-05-28
67 | Qwen 3 Max | 55.944% | Qwen3 Max [qwen-qwen3-max] | Imported | 2026-05-28
68 | DeepSeek V3 0324 | 54.74% | DeepSeek V3 0324 [deepseek-deepseek-chat-v3-0324] | Imported | 2026-05-28
69 | Claude Sonnet 4 20250514 | 54.701% | Claude Sonnet 4 [anthropic-claude-sonnet-4] | Imported | 2026-05-28
70 | Trinity Large Thinking | 54.662% | Trinity Large Thinking [arcee-ai-trinity-large-thinking] | Imported | 2026-05-28
71 | Gemini 2.5 Flash Preview 04 17 | 54.157% | (none) | Imported | 2026-05-28
72 | DeepSeek R1 | 54.118% | R1 [deepseek-r1] | Imported | 2026-05-28
73 | Claude 3.5 Sonnet 20241022 | 53.613% | Claude 3.5 Sonnet [anthropic-claude-3.5-sonnet] | Imported | 2026-05-28
74 | Command A Plus 05 2026 | 53.147% | (none) | Imported | 2026-05-28
75 | GPT Oss 20B | 53.147% | gpt-oss-20b [openai-gpt-oss-20b] | Imported | 2026-05-28
76 | Qwen 3 Max Preview | 52.953% | (none) | Imported | 2026-05-28
77 | DeepSeek V3 | 52.486% | DeepSeek V3 [deepseek-deepseek-chat] | Imported | 2026-05-28
78 | Grok 4.1 Fast Non Reasoning | 52.486% | Grok 4.1 Fast [x-ai-grok-4.1-fast] | Imported | 2026-05-28
79 | DeepSeek V3P1 | 51.476% | (none) | Imported | 2026-05-28
80 | Grok 2 1212 | 51.126% | (none) | Imported | 2026-05-28
81 | DeepSeek V3P2 Thinking | 50.971% | (none) | Imported | 2026-05-28
82 | Claude 3.5 Haiku 20241022 | 50.816% | (none) | Imported | 2026-05-28
83 | Mistral Medium 2505 | 50.738% | (none) | Imported | 2026-05-28
84 | Kimi K2 Instruct | 50.388% | MoonshotAI: Kimi K2 0711 [moonshotai-kimi-k2] | Imported | 2026-05-28
85 | Llama4 Maverick Instruct Basic | 49.728% | (none) | Imported | 2026-05-28
86 | DeepSeek V3P2 | 47.941% | (none) | Imported | 2026-05-28
87 | Magistral Medium 2509 | 47.397% | (none) | Imported | 2026-05-28
88 | Llama 4 Scout 17B 16E Instruct | 46.776% | Llama 4 Scout [meta-llama-llama-4-scout] | Imported | 2026-05-28
89 | GLM 4.7 | 46.387% | GLM 4.7 [z-ai-glm-4.7] | Imported | 2026-05-28
90 | Command A 03 2025 | 45.96% | Command A [cohere-command-a] | Imported | 2026-05-28
91 | GPT 4O 2024-11-20 | 45.921% | GPT-4o (2024-11-20) [openai-gpt-4o-2024-11-20] | Imported | 2026-05-28
92 | GPT 4O Mini 2024-07-18 | 45.455% | GPT-4o-mini (2024-07-18) [openai-gpt-4o-mini-2024-07-18] | Imported | 2026-05-28
93 | O3 Mini 2025-01-31 | 45.299% | o3-mini [openai-o3-mini] | Imported | 2026-05-28
94 | Mistral Small 2503 | 44.173% | (none) | Imported | 2026-05-28
95 | Magistral Small 2509 | 44.017% | (none) | Imported | 2026-05-28
96 | Gemini 2.0 Pro Exp 02 05 | 43.435% | (none) | Imported | 2026-05-28
97 | GPT 4.1 Nano 2025-04-14 | 42.075% | GPT-4.1 Nano [openai-gpt-4.1-nano] | Imported | 2026-05-28
98 | Jamba Large 1.6 | 41.531% | (none) | Imported | 2026-05-28
99 | Gemini 1.5 Pro 002 | 40.521% | (none) | Imported | 2026-05-28
100 | Jamba 1.5 Large | 39.433% | (none) | Imported | 2026-05-28
101 | GPT 4O 2024-08-06 | 39.433% | GPT-4o (2024-08-06) [openai-gpt-4o-2024-08-06] | Imported | 2026-05-28
102 | Meta Llama 3.1 70B Instruct Turbo | 38.85% | (none) | Imported | 2026-05-28
103 | Gemini 1.5 Flash 002 | 38.19% | (none) | Imported | 2026-05-28
104 | Jamba Mini 1.6 | 38.034% | (none) | Imported | 2026-05-28
105 | Meta Llama 3.1 8B Instruct Turbo | 37.801% | (none) | Imported | 2026-05-28
106 | Jamba 1.5 Mini | 33.877% | (none) | Imported | 2026-05-28
107 | Gemini 2.0 Flash 001 | 33.722% | Gemini 2.0 Flash [google-gemini-2.0-flash] | Imported | 2026-05-28
108 | Gemini 1.5 Flash 001 | 28.633% | (none) | Imported | 2026-05-28