HELM LegalBench

HELM LegalBench: Measures legal reasoning, contract review, statute interpretation, or legal-domain QA.

Rows: 69
Primary metric: exact_match
Sampled: 2026-05-27

Metadata

Metrics

Exact match, Denoised inference time (lower is better), # eval, # train, # prompt tokens (lower is better), # output tokens (lower is better), # trials

Latest Results

Rows are parsed from HELM Classic public GCS release artifacts for the legal_support group. Rank is assigned by exact match in descending order; rows with tied scores receive consecutive ranks rather than a shared rank.

| Rank | Subject | Exact match | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | Jurassic-2 Jumbo (178B) | 63.871847% | | Imported | 2026-05-27 |
| 2 | LLaMA (30B) | 63.803681% | | Imported | 2026-05-27 |
| 3 | gpt-3.5-turbo-0301 | 62.781186% | GPT-3.5 Turbo (openai-gpt-3.5-turbo) | Imported | 2026-05-27 |
| 4 | Anthropic-LM v4-s3 (52B) | 62.440354% | | Imported | 2026-05-27 |
| 5 | Palmyra X (43B) | 62.304022% | | Imported | 2026-05-27 |
| 6 | text-davinci-003 | 62.167689% | | Imported | 2026-05-27 |
| 7 | text-davinci-002 | 61.486026% | | Imported | 2026-05-27 |
| 8 | T0pp (11B) | 61.145194% | | Imported | 2026-05-27 |
| 9 | Cohere Command beta (52.4B) | 60.599864% | | Imported | 2026-05-27 |
| 10 | Falcon (40B) | 60.531697% | | Imported | 2026-05-27 |
| 11 | Falcon-Instruct (40B) | 60.531697% | | Imported | 2026-05-27 |
| 12 | Vicuna v1.3 (7B) | 60.531697% | | Imported | 2026-05-27 |
| 13 | LLaMA (65B) | 59.100204% | | Imported | 2026-05-27 |
| 14 | Llama 2 (13B) | 59.100204% | | Imported | 2026-05-27 |
| 15 | Vicuna v1.3 (13B) | 58.895706% | | Imported | 2026-05-27 |
| 16 | LLaMA (13B) | 58.691207% | | Imported | 2026-05-27 |
| 17 | Llama 2 (70B) | 58.486708% | | Imported | 2026-05-27 |
| 18 | Mistral v0.1 (7B) | 58.486708% | | Imported | 2026-05-27 |
| 19 | TNLG v2 (530B) | 58.009543% | | Imported | 2026-05-27 |
| 20 | Jurassic-2 Grande (17B) | 57.464213% | | Imported | 2026-05-27 |
| 21 | RedPajama-INCITE-Instruct (7B) | 56.850716% | | Imported | 2026-05-27 |
| 22 | Cohere Command beta (6.1B) | 56.646217% | | Imported | 2026-05-27 |
| 23 | MPT (30B) | 56.441718% | | Imported | 2026-05-27 |
| 24 | J1-Grande v2 beta (17B) | 56.237219% | | Imported | 2026-05-27 |
| 25 | Cohere xlarge v20220609 (52.4B) | 55.828221% | | Imported | 2026-05-27 |
| 26 | Jurassic-2 Large (7.5B) | 55.828221% | | Imported | 2026-05-27 |
| 27 | T5 (11B) | 55.828221% | | Imported | 2026-05-27 |
| 28 | BLOOM (176B) | 54.260395% | | Imported | 2026-05-27 |
| 29 | MPT-Instruct (30B) | 53.783231% | | Imported | 2026-05-27 |
| 30 | Llama 2 (7B) | 53.169734% | | Imported | 2026-05-27 |
| 31 | OPT (175B) | 53.169734% | | Imported | 2026-05-27 |
| 32 | Luminous Supreme (70B) | 52.965235% | | Imported | 2026-05-27 |
| 33 | OPT (66B) | 52.69257% | | Imported | 2026-05-27 |
| 34 | Cohere xlarge v20221108 (52.4B) | 52.556237% | | Imported | 2026-05-27 |
| 35 | Cohere small v20220720 (410M) | 52.351738% | | Imported | 2026-05-27 |
| 36 | Pythia (6.9B) | 52.147239% | | Imported | 2026-05-27 |
| 37 | RedPajama-INCITE-Base (7B) | 51.738241% | | Imported | 2026-05-27 |
| 38 | text-babbage-001 | 51.738241% | | Imported | 2026-05-27 |
| 39 | Luminous Extended (30B) | 51.670075% | | Imported | 2026-05-27 |
| 40 | GPT-NeoX (20B) | 51.465576% | | Imported | 2026-05-27 |
| 41 | text-ada-001 | 51.465576% | | Imported | 2026-05-27 |
| 42 | J1-Large v1 (7.5B) | 51.39741% | | Imported | 2026-05-27 |
| 43 | Luminous Base (13B) | 51.329243% | | Imported | 2026-05-27 |
| 44 | RedPajama-INCITE-Base-v1 (3B) | 51.329243% | | Imported | 2026-05-27 |
| 45 | Falcon (7B) | 51.124744% | | Imported | 2026-05-27 |
| 46 | Cohere medium v20220720 (6.1B) | 50.715746% | | Imported | 2026-05-27 |
| 47 | UL2 (20B) | 50.579414% | | Imported | 2026-05-27 |
| 48 | J1-Grande v1 (17B) | 50.443081% | | Imported | 2026-05-27 |
| 49 | TNLG v2 (6.7B) | 50.374915% | | Imported | 2026-05-27 |
| 50 | davinci (175B) | 49.625085% | | Imported | 2026-05-27 |
| 51 | babbage (1.3B) | 49.216087% | | Imported | 2026-05-27 |
| 52 | InstructPalmyra (30B) | 49.216087% | | Imported | 2026-05-27 |
| 53 | Cohere large v20220720 (13.1B) | 49.147921% | | Imported | 2026-05-27 |
| 54 | Pythia (12B) | 49.079755% | | Imported | 2026-05-27 |
| 55 | curie (6.7B) | 49.011588% | | Imported | 2026-05-27 |
| 56 | Cohere medium v20221108 (6.1B) | 48.943422% | | Imported | 2026-05-27 |
| 57 | LLaMA (7B) | 48.466258% | | Imported | 2026-05-27 |
| 58 | RedPajama-INCITE-Instruct-v1 (3B) | 48.466258% | | Imported | 2026-05-27 |
| 59 | J1-Jumbo v1 (178B) | 48.398091% | | Imported | 2026-05-27 |
| 60 | YaLM (100B) | 48.398091% | | Imported | 2026-05-27 |
| 61 | Alpaca (7B) | 48.261759% | | Imported | 2026-05-27 |
| 62 | GPT-J (6B) | 47.852761% | | Imported | 2026-05-27 |
| 63 | gpt-3.5-turbo-0613 | 46.830266% | GPT-3.5 Turbo (openai-gpt-3.5-turbo) | Imported | 2026-05-27 |
| 64 | Falcon-Instruct (7B) | 45.194274% | | Imported | 2026-05-27 |
| 65 | GLM (130B) | 45.057941% | | Imported | 2026-05-27 |
| 66 | text-curie-001 | 44.239945% | | Imported | 2026-05-27 |
| 67 | ada (350M) | 37.150648% | | Imported | 2026-05-27 |
| 68 | code-cushman-001 (12B) | 0.0% | | Imported | 2026-05-27 |
| 69 | code-davinci-002 | 0.0% | | Imported | 2026-05-27 |
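
The ranking rule stated above the table (rank assigned by exact match) can be illustrated with a minimal Python sketch. The `Row` type and the `assign_ranks` function are hypothetical names introduced here, not part of HELM or of this page's tooling. The tie-break on case-insensitive subject name is an assumption inferred from how tied rows are ordered in the table; the source does not state it. The sample values are copied from ranks 50 to 53.

```python
from typing import List, NamedTuple, Tuple


class Row(NamedTuple):
    """Hypothetical record for one leaderboard entry."""
    subject: str
    exact_match: float  # exact match, in percent


def assign_ranks(rows: List[Row]) -> List[Tuple[int, str, float]]:
    """Rank rows by exact match, highest first.

    Assumption: ties are broken by case-insensitive subject name, and
    tied rows get consecutive ranks instead of sharing one.
    """
    ordered = sorted(rows, key=lambda r: (-r.exact_match, r.subject.casefold()))
    return [(rank, r.subject, r.exact_match) for rank, r in enumerate(ordered, start=1)]


# Sample values taken from the table above (ranks 50 to 53), shuffled.
rows = [
    Row("InstructPalmyra (30B)", 49.216087),
    Row("davinci (175B)", 49.625085),
    Row("babbage (1.3B)", 49.216087),
    Row("Cohere large v20220720 (13.1B)", 49.147921),
]

ranked = assign_ranks(rows)
for rank, subject, score in ranked:
    print(f"{rank} {subject} {score}%")
```

Within this four-row sample the ranks restart at 1, but the relative order matches the table: the two rows tied at 49.216087% appear as `babbage (1.3B)` followed by `InstructPalmyra (30B)`. A shared-rank scheme (both rows at the same rank) would be an equally valid design; the consecutive scheme is shown only because it is what the table displays.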