HELM Lite

HELM Lite evaluates broad language-model knowledge, reasoning, commonsense, instruction following, and exam-style accuracy.

Rows: 79
Primary metric: mean_win_rate
Sampled: 2026-05-28

Metadata

Metrics

Mean win rate, NarrativeQA - F1, NaturalQuestions (open-book) - F1, NaturalQuestions (closed-book) - F1, OpenbookQA - EM, MMLU - EM, MATH - Equivalent (CoT), GSM8K - EM, LegalBench - EM, MedQA - EM, WMT 2014 - BLEU-4

Latest Results

Rows are imported from the HELM Lite public GCS core_scenarios group JSON. Mean win rate is shown as a fraction between 0 and 1 (for example, 0.959457 corresponds to 95.9%).
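As a rough illustration of what a mean win rate measures, the sketch below computes, for each model, the fraction of other models it beats on each scenario, then averages those fractions across scenarios. The function name `mean_win_rate`, the toy scores, and the tie handling (a tie counts as a non-win) are assumptions made for this example; the HELM implementation may treat ties and missing scenarios differently.

```python
from typing import Dict


def mean_win_rate(scores: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """scores[model][scenario] is a metric value where higher is better."""
    models = list(scores)
    scenarios = sorted({s for per_model in scores.values() for s in per_model})
    result = {}
    for m in models:
        rates = []
        for s in scenarios:
            if s not in scores[m]:
                continue  # model was not evaluated on this scenario
            others = [o for o in models if o != m and s in scores[o]]
            if not others:
                continue  # nobody to compare against
            # Assumption: only strictly higher scores count as wins.
            wins = sum(1 for o in others if scores[m][s] > scores[o][s])
            rates.append(wins / len(others))
        result[m] = sum(rates) / len(rates) if rates else float("nan")
    return result


# Hypothetical scores, not taken from the leaderboard.
toy = {
    "model_a": {"GSM8K": 0.90, "MMLU": 0.80},
    "model_b": {"GSM8K": 0.70, "MMLU": 0.85},
    "model_c": {"GSM8K": 0.50, "MMLU": 0.60},
}
rates = mean_win_rate(toy)
```

In this toy case `model_a` and `model_b` each win one scenario outright and place second on the other, so both end up with the same mean win rate, while `model_c` wins no comparisons.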

Rank Subject Mean win rate Model Match Provenance Sampled
1 GPT-4o (2024-05-13) 0.959457 GPT-4o [openai-gpt-4o] Imported 2026-05-28
2 Claude 3.5 Sonnet (20240620) 0.912171 Claude 3.5 Sonnet [anthropic-claude-3.5-sonnet] Imported 2026-05-28
3 GPT-4 (0613) 0.908908 GPT-4 [openai-gpt-4] Imported 2026-05-28
4 GPT-4 Turbo (2024-04-09) 0.898402 GPT-4 Turbo [openai-gpt-4-turbo] Imported 2026-05-28
5 Llama 3.1 Instruct Turbo (405B) 0.889094 Imported 2026-05-28
6 Llama 3.2 Vision Instruct Turbo (90B) 0.859565 Imported 2026-05-28
7 Palmyra-X-004 0.850483 Imported 2026-05-28
8 Llama 3.1 Instruct Turbo (70B) 0.848044 Imported 2026-05-28
9 Llama 3 (70B) 0.826523 Imported 2026-05-28
10 Qwen2 Instruct (72B) 0.815285 Imported 2026-05-28
11 Mistral Large 2 (2407) 0.787704 Imported 2026-05-28
12 Gemini 1.5 Pro (001) 0.781802 Imported 2026-05-28
13 Qwen2.5 Instruct Turbo (72B) 0.780586 Imported 2026-05-28
14 GPT-4o mini (2024-07-18) 0.756818 GPT-4o-mini [openai-gpt-4o-mini] Imported 2026-05-28
15 Mixtral (8x22B) 0.752148 Imported 2026-05-28
16 GPT-4 Turbo (1106 preview) 0.745371 GPT-4 Turbo [openai-gpt-4-turbo] Imported 2026-05-28
17 Gemma 2 Instruct (27B) 0.726540 Imported 2026-05-28
18 Palmyra X V3 (72B) 0.723443 Imported 2026-05-28
19 Gemini 1.5 Flash (001) 0.716184 Imported 2026-05-28
20 Claude 3 Opus (20240229) 0.712737 Imported 2026-05-28
21 Jamba 1.5 Large 0.685681 Imported 2026-05-28
22 PaLM-2 (Unicorn) 0.681136 Imported 2026-05-28
23 Qwen1.5 (72B) 0.658541 Imported 2026-05-28
24 Palmyra X V2 (33B) 0.639785 Imported 2026-05-28
25 Yi (34B) 0.617150 Imported 2026-05-28
26 Gemma 2 Instruct (9B) 0.616742 Imported 2026-05-28
27 Qwen1.5 Chat (110B) 0.598352 Imported 2026-05-28
28 Qwen1.5 (32B) 0.595621 Imported 2026-05-28
29 Claude v1.3 0.571936 Imported 2026-05-28
30 PaLM-2 (Bison) 0.567741 Imported 2026-05-28
31 Mixtral (8x7B 32K seqlen) 0.562646 Imported 2026-05-28
32 Phi-3 (14B) 0.558775 Imported 2026-05-28
33 Qwen2.5 Instruct Turbo (7B) 0.538803 Imported 2026-05-28
34 Claude 2.0 0.537562 Imported 2026-05-28
35 DeepSeek LLM Chat (67B) 0.536713 Imported 2026-05-28
36 Phi-3 (7B) 0.524217 Imported 2026-05-28
37 Llama 2 (70B) 0.521503 Imported 2026-05-28
38 Yi Large (Preview) 0.512088 Imported 2026-05-28
39 Command R Plus 0.489161 Imported 2026-05-28
40 GPT-3.5 (text-davinci-003) 0.485065 Imported 2026-05-28
41 Claude 2.1 0.483750 Imported 2026-05-28
42 Qwen1.5 (14B) 0.473019 Imported 2026-05-28
43 Gemini 1.0 Pro (002) 0.466317 Imported 2026-05-28
44 Jamba 1.5 Mini 0.458258 Imported 2026-05-28
45 Claude Instant 1.2 0.445521 Imported 2026-05-28
46 Llama 3 (8B) 0.427831 Imported 2026-05-28
47 Claude 3 Sonnet (20240229) 0.408841 Imported 2026-05-28
48 GPT-3.5 Turbo (0613) 0.400283 GPT-3.5 Turbo [openai-gpt-3.5-turbo] Imported 2026-05-28
49 Arctic Instruct 0.379579 Imported 2026-05-28
50 LLaMA (65B) 0.378455 Imported 2026-05-28
51 Mistral NeMo (2402) 0.377689 Mistral: Mistral Nemo [mistralai-mistral-nemo] Imported 2026-05-28
52 Gemma (7B) 0.375974 Imported 2026-05-28
53 GPT-3.5 (text-davinci-002) 0.374592 Imported 2026-05-28
54 Mistral Large (2402) 0.368548 Mistral Large [mistralai-mistral-large] Imported 2026-05-28
55 Llama 3.2 Vision Instruct Turbo (11B) 0.358641 Imported 2026-05-28
56 Command 0.357909 Imported 2026-05-28
57 Llama 3.1 Instruct Turbo (8B) 0.335448 Imported 2026-05-28
58 Command R 0.333650 Command R (08-2024) [cohere-command-r-08-2024] Imported 2026-05-28
59 DBRX Instruct 0.325533 Imported 2026-05-28
60 Mistral v0.1 (7B) 0.325200 Imported 2026-05-28
61 Mistral Small (2402) 0.323102 Imported 2026-05-28
62 Jamba Instruct 0.321670 Imported 2026-05-28
63 Qwen1.5 (7B) 0.306810 Imported 2026-05-28
64 Mistral Medium (2312) 0.303197 Imported 2026-05-28
65 Claude 3 Haiku (20240307) 0.294206 Claude 3 Haiku [anthropic-claude-3-haiku] Imported 2026-05-28
66 Yi (6B) 0.280952 Imported 2026-05-28
67 Llama 2 (13B) 0.259765 Imported 2026-05-28
68 Falcon (40B) 0.239910 Imported 2026-05-28
69 Jurassic-2 Jumbo (178B) 0.239877 Imported 2026-05-28
70 Mistral Instruct v0.3 (7B) 0.220813 Imported 2026-05-28
71 Jurassic-2 Grande (17B) 0.192424 Imported 2026-05-28
72 Phi-2 0.191392 Imported 2026-05-28
73 Llama 2 (7B) 0.169972 Imported 2026-05-28
74 Luminous Supreme (70B) 0.163528 Imported 2026-05-28
75 Command Light 0.118032 Imported 2026-05-28
76 Luminous Extended (30B) 0.090435 Imported 2026-05-28
77 Falcon (7B) 0.073110 Imported 2026-05-28
78 OLMo (7B) 0.060273 Imported 2026-05-28
79 Luminous Base (13B) 0.047436 Imported 2026-05-28
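
For readers who want to work with the listing programmatically, the following is a minimal sketch that pulls the rank, subject, and mean win rate out of each row and prints the score as a percentage. It assumes every row starts with an integer rank and that the score is the first six-decimal number after the subject name; lines that do not fit that shape (such as the header) are skipped. The names `ROW` and `parse_row` are hypothetical helpers introduced for this example.

```python
import re

# rank, then the subject (lazy), then a score such as 0.889094
ROW = re.compile(r"^(?P<rank>\d+)\s+(?P<subject>.+?)\s+(?P<score>[01]\.\d{6})\b")


def parse_row(line):
    """Return (rank, subject, score), or None if the line is not a result row."""
    m = ROW.match(line)
    if m is None:
        return None
    return int(m["rank"]), m["subject"], float(m["score"])


sample = [
    "Rank Subject Mean win rate Model Match Provenance Sampled",
    "5 Llama 3.1 Instruct Turbo (405B) 0.889094 Imported 2026-05-28",
    "79 Luminous Base (13B) 0.047436 Imported 2026-05-28",
]
rows = [r for r in map(parse_row, sample) if r is not None]

for rank, subject, score in rows:
    # Show the 0-1 fraction as a percentage with one decimal place.
    print(f"{rank:>2}  {subject:<35} {score:.1%}")
```

Version numbers inside subject names, such as the `3.1` in `Llama 3.1`, are not mistaken for the score because the pattern requires exactly six decimal places.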