CaseLaw v2

Private question-answering benchmark over Canadian court cases.

Rows: 54
Primary metric: score
Sampled: 2026-05-04

Metadata

Metrics

Score, Std. error (lower is better), Latency (lower is better), Cost per test (lower is better)

Latest Results

Full leaderboard rows decoded from the Vals.ai benchmark detail page. Primary score is the Overall accuracy percentage.

Rank | Subject | Score | Model Match | Provenance | Sampled
1 | Grok 4.3 | 79.314% | Grok 4.3 / x-ai-grok-4.3 | Imported | 2026-05-04
2 | GPT 5.1 2025-11-13 | 73.419% | GPT-5.1 / openai-gpt-5.1 | Imported | 2026-05-04
3 | GPT 4.1 2025-04-14 | 69.882% | GPT-4.1 / openai-gpt-4.1 | Imported | 2026-05-04
4 | GPT 5 Mini 2025-08-07 | 68.489% | GPT-5 Mini / openai-gpt-5-mini | Imported | 2026-05-04
5 | Claude Opus 4.7 | 68.381% | Claude Opus 4.7 / anthropic-claude-opus-4.7 | Imported | 2026-05-04
6 | GPT 5 2025-08-07 | 66.452% | GPT-5 / openai-gpt-5 | Imported | 2026-05-04
7 | GPT 5.5 | 66.238% | GPT-5.5 / openai-gpt-5.5 | Imported | 2026-05-04
8 | GPT 5.2 2025-12-11 | 66.024% | GPT-5.2 / openai-gpt-5.2 | Imported | 2026-05-04
9 | Grok 4 0709 | 65.809% | Grok 4 / x-ai-grok-4 | Imported | 2026-05-04
10 | Grok 4 Fast Reasoning | 65.702% | Grok 4 Fast / x-ai-grok-4-fast | Imported | 2026-05-04
11 | Kimi K2 Thinking | 65.702% | MoonshotAI: Kimi K2 Thinking / moonshotai-kimi-k2-thinking | Imported | 2026-05-04
12 | Gemini 3.1 Pro Preview | 64.845% | Gemini 3.1 Pro Preview / google-gemini-3.1-pro-preview | Imported | 2026-05-04
13 | Command A 03 2025 | 64.523% | Command A / cohere-command-a | Imported | 2026-05-04
14 | Claude Sonnet 4.6 | 63.987% | Claude Sonnet 4.6 / anthropic-claude-sonnet-4.6 | Imported | 2026-05-04
15 | Gemini 2.5 Pro | 63.88% | Gemini 2.5 Pro / google-gemini-2.5-pro | Imported | 2026-05-04
16 | GPT 5.4 2026-03-05 | 63.773% | GPT-5.4 / openai-gpt-5.4 | Imported | 2026-05-04
17 | Muse Spark | 63.13% | - | Imported | 2026-05-04
18 | Claude Opus 4.5 20251101 Thinking | 62.594% | Claude Opus 4.5 / anthropic-claude-opus-4.5 | Imported | 2026-05-04
19 | Claude Sonnet 4.5 20250929 Thinking | 62.165% | Claude Sonnet 4.5 / anthropic-claude-sonnet-4.5 | Imported | 2026-05-04
20 | Claude Opus 4.6 Thinking | 62.058% | Claude Opus 4.6 / anthropic-claude-opus-4.6 | Imported | 2026-05-04
21 | Mistral Large 2512 | 61.415% | Mistral: Mistral Large 3 2512 / mistralai-mistral-large-2512 | Imported | 2026-05-04
22 | Kimi K2.6 Thinking | 61.201% | MoonshotAI: Kimi K2.6 / moonshotai-kimi-k2.6 | Imported | 2026-05-04
23 | MiniMax M2.7 | 60.879% | MiniMax M2.7 / minimax-minimax-m2.7 | Imported | 2026-05-04
24 | Grok 4.1 Fast Reasoning | 60.45% | Grok 4.1 Fast / x-ai-grok-4.1-fast | Imported | 2026-05-04
25 | Qwen 3.5 Plus Thinking | 59.7% | - | Imported | 2026-05-04
26 | GPT 4O 2024-11-20 | 59.7% | GPT-4o (2024-11-20) / openai-gpt-4o-2024-11-20 | Imported | 2026-05-04
27 | DeepSeek V4 Pro | 59.378% | DeepSeek V4 Pro / deepseek-deepseek-v4-pro | Imported | 2026-05-04
28 | Kimi K2.5 Thinking | 58.735% | MoonshotAI: Kimi K2.5 / moonshotai-kimi-k2.5 | Imported | 2026-05-04
29 | Trinity Large Thinking | 57.878% | Trinity Large Thinking / arcee-ai-trinity-large-thinking | Imported | 2026-05-04
30 | Claude Haiku 4.5 20251001 Thinking | 56.484% | Claude Haiku 4.5 / anthropic-claude-haiku-4.5 | Imported | 2026-05-04
31 | Qwen 3.5 Flash | 55.948% | Qwen3.5-Flash / qwen-qwen3.5-flash-02-23 | Imported | 2026-05-04
32 | Gemini 3 Flash Preview | 55.842% | Gemini 3 Flash Preview / google-gemini-3-flash-preview | Imported | 2026-05-04
33 | MiniMax M2.1 | 55.842% | MiniMax M2.1 / minimax-minimax-m2.1 | Imported | 2026-05-04
34 | DeepSeek V3P2 Thinking | 55.412% | - | Imported | 2026-05-04
35 | Qwen 3 Max 2026-01-23 | 54.984% | - | Imported | 2026-05-04
36 | Gemini 3.1 Flash Lite Preview | 54.984% | Gemini 3.1 Flash Lite Preview / google-gemini-3.1-flash-lite-preview | Imported | 2026-05-04
37 | GLM 4.7 | 54.877% | GLM 4.7 / z-ai-glm-4.7 | Imported | 2026-05-04
38 | Grok 4.20 0309 Reasoning | 54.448% | Grok 4.20 / x-ai-grok-4.20 | Imported | 2026-05-04
39 | DeepSeek V3P1 | 53.912% | - | Imported | 2026-05-04
40 | MiniMax M2.5 Lightning | 53.483% | - | Imported | 2026-05-04
41 | Qwen 3.6 27B | 53.162% | Qwen3.6 27B / qwen-qwen3.6-27b | Imported | 2026-05-04
42 | Gemini 3 Pro Preview | 53.055% | Gemini 3 / google-gemini-3 | Imported | 2026-05-04
43 | Gemma 4 31B It | 52.626% | Gemma 4 31B / google-gemma-4-31b-it | Imported | 2026-05-04
44 | GPT 5 Nano 2025-08-07 | 52.626% | GPT-5 Nano / openai-gpt-5-nano | Imported | 2026-05-04
45 | GLM 5 Thinking | 52.519% | GLM 5 / z-ai-glm-5 | Imported | 2026-05-04
46 | GPT 5.4 Nano 2026-03-17 | 51.875% | GPT-5.4 Nano / openai-gpt-5.4-nano | Imported | 2026-05-04
47 | GPT 5.4 Mini 2026-03-17 | 51.661% | GPT-5.4 Mini / openai-gpt-5.4-mini | Imported | 2026-05-04
48 | GLM 5.1 Thinking | 51.554% | GLM 5.1 / z-ai-glm-5.1 | Imported | 2026-05-04
49 | Qwen 3.6 Plus | 51.447% | Qwen3.6 Plus / qwen-qwen3.6-plus | Imported | 2026-05-04
50 | GPT Oss 120B | 48.767% | gpt-oss-120b / openai-gpt-oss-120b | Imported | 2026-05-04
51 | Qwen 3.6 Max Preview | 47.91% | Qwen3.6 Max Preview / qwen-qwen3.6-max-preview | Imported | 2026-05-04
52 | Qwen 3 Max | 47.481% | Qwen3 Max / qwen-qwen3-max | Imported | 2026-05-04
53 | Mistral Medium 3.5 | 44.159% | Mistral: Mistral Medium 3.5 / mistralai-mistral-medium-3-5 | Imported | 2026-05-04
54 | GPT Oss 20B | 43.837% | gpt-oss-20b / openai-gpt-oss-20b | Imported | 2026-05-04
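
Several rows in the leaderboard share an identical score while holding consecutive ranks (for example, ranks 10 and 11 both show 65.702%). For readers who want tied entries to share a rank, the following is a minimal Python sketch of standard competition ranking applied to a few rows copied from the table. The helper names `parse_score` and `competition_ranks` are hypothetical, and this is not necessarily how the source page assigns ranks.

```python
from itertools import groupby

# A few (subject, score) pairs copied from the leaderboard (ranks 10 to 13).
rows = [
    ("Grok 4 Fast Reasoning", "65.702%"),
    ("Kimi K2 Thinking", "65.702%"),
    ("Gemini 3.1 Pro Preview", "64.845%"),
    ("Command A 03 2025", "64.523%"),
]


def parse_score(text: str) -> float:
    """Convert a percentage string such as '63.88%' to a float."""
    return float(text.rstrip("%"))


def competition_ranks(rows, start=1):
    """Assign ranks so that tied scores share a rank and the next rank skips ahead.

    Hypothetical helper for illustration: `start` is the rank given to the
    highest score in `rows`.
    """
    # Sort by score, highest first; sorted() is stable, so ties keep input order.
    ordered = sorted(rows, key=lambda r: parse_score(r[1]), reverse=True)
    ranked = []
    position = start
    for score, group in groupby(ordered, key=lambda r: parse_score(r[1])):
        members = list(group)
        for subject, _ in members:
            ranked.append((position, subject, score))
        # Skip as many positions as there were tied entries.
        position += len(members)
    return ranked


if __name__ == "__main__":
    for rank, subject, score in competition_ranks(rows, start=10):
        print(rank, subject, score)
```

Under this scheme both 65.702% entries would hold rank 10 and the next entry would hold rank 12. Without the standard errors for each row, a small score gap between neighboring entries should not be read as a meaningful difference.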