HELM Lite
Evaluates broad language-model knowledge, reasoning, commonsense, instruction following, and exam-style accuracy.
Rows: 79
Primary metric: mean_win_rate
Sampled: 2026-05-28
Metrics: Mean win rate, NarrativeQA - F1, NaturalQuestions (open-book) - F1, NaturalQuestions (closed-book) - F1, OpenbookQA - EM, MMLU - EM, MATH - Equivalent (CoT), GSM8K - EM, LegalBench - EM, MedQA - EM, WMT 2014 - BLEU-4
| Rank | Subject | Mean win rate | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | GPT-4o (2024-05-13) | 0.959457 | GPT-4o (`openai-gpt-4o`) | Imported | 2026-05-28 |
| 2 | Claude 3.5 Sonnet (20240620) | 0.912171 | Claude 3.5 Sonnet (`anthropic-claude-3.5-sonnet`) | Imported | 2026-05-28 |
| 3 | GPT-4 (0613) | 0.908908 | GPT-4 (`openai-gpt-4`) | Imported | 2026-05-28 |
| 4 | GPT-4 Turbo (2024-04-09) | 0.898402 | GPT-4 Turbo (`openai-gpt-4-turbo`) | Imported | 2026-05-28 |
| 5 | Llama 3.1 Instruct Turbo (405B) | 0.889094 | — | Imported | 2026-05-28 |
| 6 | Llama 3.2 Vision Instruct Turbo (90B) | 0.859565 | — | Imported | 2026-05-28 |
| 7 | Palmyra-X-004 | 0.850483 | — | Imported | 2026-05-28 |
| 8 | Llama 3.1 Instruct Turbo (70B) | 0.848044 | — | Imported | 2026-05-28 |
| 9 | Llama 3 (70B) | 0.826523 | — | Imported | 2026-05-28 |
| 10 | Qwen2 Instruct (72B) | 0.815285 | — | Imported | 2026-05-28 |
| 11 | Mistral Large 2 (2407) | 0.787704 | — | Imported | 2026-05-28 |
| 12 | Gemini 1.5 Pro (001) | 0.781802 | — | Imported | 2026-05-28 |
| 13 | Qwen2.5 Instruct Turbo (72B) | 0.780586 | — | Imported | 2026-05-28 |
| 14 | GPT-4o mini (2024-07-18) | 0.756818 | GPT-4o-mini (`openai-gpt-4o-mini`) | Imported | 2026-05-28 |
| 15 | Mixtral (8x22B) | 0.752148 | — | Imported | 2026-05-28 |
| 16 | GPT-4 Turbo (1106 preview) | 0.745371 | GPT-4 Turbo (`openai-gpt-4-turbo`) | Imported | 2026-05-28 |
| 17 | Gemma 2 Instruct (27B) | 0.726540 | — | Imported | 2026-05-28 |
| 18 | Palmyra X V3 (72B) | 0.723443 | — | Imported | 2026-05-28 |
| 19 | Gemini 1.5 Flash (001) | 0.716184 | — | Imported | 2026-05-28 |
| 20 | Claude 3 Opus (20240229) | 0.712737 | — | Imported | 2026-05-28 |
| 21 | Jamba 1.5 Large | 0.685681 | — | Imported | 2026-05-28 |
| 22 | PaLM-2 (Unicorn) | 0.681136 | — | Imported | 2026-05-28 |
| 23 | Qwen1.5 (72B) | 0.658541 | — | Imported | 2026-05-28 |
| 24 | Palmyra X V2 (33B) | 0.639785 | — | Imported | 2026-05-28 |
| 25 | Yi (34B) | 0.617150 | — | Imported | 2026-05-28 |
| 26 | Gemma 2 Instruct (9B) | 0.616742 | — | Imported | 2026-05-28 |
| 27 | Qwen1.5 Chat (110B) | 0.598352 | — | Imported | 2026-05-28 |
| 28 | Qwen1.5 (32B) | 0.595621 | — | Imported | 2026-05-28 |
| 29 | Claude v1.3 | 0.571936 | — | Imported | 2026-05-28 |
| 30 | PaLM-2 (Bison) | 0.567741 | — | Imported | 2026-05-28 |
| 31 | Mixtral (8x7B 32K seqlen) | 0.562646 | — | Imported | 2026-05-28 |
| 32 | Phi-3 (14B) | 0.558775 | — | Imported | 2026-05-28 |
| 33 | Qwen2.5 Instruct Turbo (7B) | 0.538803 | — | Imported | 2026-05-28 |
| 34 | Claude 2.0 | 0.537562 | — | Imported | 2026-05-28 |
| 35 | DeepSeek LLM Chat (67B) | 0.536713 | — | Imported | 2026-05-28 |
| 36 | Phi-3 (7B) | 0.524217 | — | Imported | 2026-05-28 |
| 37 | Llama 2 (70B) | 0.521503 | — | Imported | 2026-05-28 |
| 38 | Yi Large (Preview) | 0.512088 | — | Imported | 2026-05-28 |
| 39 | Command R Plus | 0.489161 | — | Imported | 2026-05-28 |
| 40 | GPT-3.5 (text-davinci-003) | 0.485065 | — | Imported | 2026-05-28 |
| 41 | Claude 2.1 | 0.483750 | — | Imported | 2026-05-28 |
| 42 | Qwen1.5 (14B) | 0.473019 | — | Imported | 2026-05-28 |
| 43 | Gemini 1.0 Pro (002) | 0.466317 | — | Imported | 2026-05-28 |
| 44 | Jamba 1.5 Mini | 0.458258 | — | Imported | 2026-05-28 |
| 45 | Claude Instant 1.2 | 0.445521 | — | Imported | 2026-05-28 |
| 46 | Llama 3 (8B) | 0.427831 | — | Imported | 2026-05-28 |
| 47 | Claude 3 Sonnet (20240229) | 0.408841 | — | Imported | 2026-05-28 |
| 48 | GPT-3.5 Turbo (0613) | 0.400283 | GPT-3.5 Turbo (`openai-gpt-3.5-turbo`) | Imported | 2026-05-28 |
| 49 | Arctic Instruct | 0.379579 | — | Imported | 2026-05-28 |
| 50 | LLaMA (65B) | 0.378455 | — | Imported | 2026-05-28 |
| 51 | Mistral NeMo (2402) | 0.377689 | Mistral: Mistral Nemo (`mistralai-mistral-nemo`) | Imported | 2026-05-28 |
| 52 | Gemma (7B) | 0.375974 | — | Imported | 2026-05-28 |
| 53 | GPT-3.5 (text-davinci-002) | 0.374592 | — | Imported | 2026-05-28 |
| 54 | Mistral Large (2402) | 0.368548 | Mistral Large (`mistralai-mistral-large`) | Imported | 2026-05-28 |
| 55 | Llama 3.2 Vision Instruct Turbo (11B) | 0.358641 | — | Imported | 2026-05-28 |
| 56 | Command | 0.357909 | — | Imported | 2026-05-28 |
| 57 | Llama 3.1 Instruct Turbo (8B) | 0.335448 | — | Imported | 2026-05-28 |
| 58 | Command R | 0.333650 | Command R (08-2024) (`cohere-command-r-08-2024`) | Imported | 2026-05-28 |
| 59 | DBRX Instruct | 0.325533 | — | Imported | 2026-05-28 |
| 60 | Mistral v0.1 (7B) | 0.325200 | — | Imported | 2026-05-28 |
| 61 | Mistral Small (2402) | 0.323102 | — | Imported | 2026-05-28 |
| 62 | Jamba Instruct | 0.321670 | — | Imported | 2026-05-28 |
| 63 | Qwen1.5 (7B) | 0.306810 | — | Imported | 2026-05-28 |
| 64 | Mistral Medium (2312) | 0.303197 | — | Imported | 2026-05-28 |
| 65 | Claude 3 Haiku (20240307) | 0.294206 | Claude 3 Haiku (`anthropic-claude-3-haiku`) | Imported | 2026-05-28 |
| 66 | Yi (6B) | 0.280952 | — | Imported | 2026-05-28 |
| 67 | Llama 2 (13B) | 0.259765 | — | Imported | 2026-05-28 |
| 68 | Falcon (40B) | 0.239910 | — | Imported | 2026-05-28 |
| 69 | Jurassic-2 Jumbo (178B) | 0.239877 | — | Imported | 2026-05-28 |
| 70 | Mistral Instruct v0.3 (7B) | 0.220813 | — | Imported | 2026-05-28 |
| 71 | Jurassic-2 Grande (17B) | 0.192424 | — | Imported | 2026-05-28 |
| 72 | Phi-2 | 0.191392 | — | Imported | 2026-05-28 |
| 73 | Llama 2 (7B) | 0.169972 | — | Imported | 2026-05-28 |
| 74 | Luminous Supreme (70B) | 0.163528 | — | Imported | 2026-05-28 |
| 75 | Command Light | 0.118032 | — | Imported | 2026-05-28 |
| 76 | Luminous Extended (30B) | 0.090435 | — | Imported | 2026-05-28 |
| 77 | Falcon (7B) | 0.073110 | — | Imported | 2026-05-28 |
| 78 | OLMo (7B) | 0.060273 | — | Imported | 2026-05-28 |
| 79 | Luminous Base (13B) | 0.047436 | — | Imported | 2026-05-28 |
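
The ranking column, mean win rate, is generally described as the fraction of head-to-head comparisons a model wins against the other models, averaged over scenarios. The following is a minimal sketch of that idea, not the leaderboard's actual implementation. The function name `mean_win_rate`, the toy scores, and the handling of ties (counted as non-wins) and of missing scenarios (skipped) are assumptions made for illustration.

```python
from typing import Dict


def mean_win_rate(scores: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Compute a mean win rate per model.

    scores[model][scenario] is a score where higher is better.
    For each scenario, a model's win rate is the fraction of the other
    models it strictly beats (assumption: ties count as non-wins).
    The mean win rate is the average of those per-scenario win rates.
    """
    models = list(scores)
    result: Dict[str, float] = {}
    for model in models:
        per_scenario = []
        for scenario, value in scores[model].items():
            # Compare only against models that report this scenario.
            others = [
                scores[other][scenario]
                for other in models
                if other != model and scenario in scores[other]
            ]
            if not others:
                continue
            wins = sum(1 for v in others if value > v)
            per_scenario.append(wins / len(others))
        result[model] = (
            sum(per_scenario) / len(per_scenario) if per_scenario else float("nan")
        )
    return result


# Hypothetical toy scores, not taken from the table above.
toy_scores = {
    "model_a": {"gsm8k": 0.9, "mmlu": 0.80},
    "model_b": {"gsm8k": 0.7, "mmlu": 0.85},
    "model_c": {"gsm8k": 0.5, "mmlu": 0.60},
}
rates = mean_win_rate(toy_scores)
for name, rate in sorted(rates.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {rate:.2f}")
```

Because the metric is relative, a model's value depends on which other models are in the comparison set, so the numbers in the table are only meaningful within this set of 79 rows.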