LEXam
Legal reasoning benchmark derived from 340 law exams across 116 law-school courses, covering long-form open questions and multiple-choice questions in English and German.
36 rows
Primary metric: open_question_judge_score
Sampled: 2026-05-28
Metadata
Metrics: Open Question Judge Score, Multiple-Choice Accuracy
| Rank | Subject | Open Question Judge Score / MCQ Accuracy | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | GPT-5 | 70.20% open / 62.65% MCQ | GPT-5 openai-gpt-5 | Imported | 2026-05-28 |
| 2 | Gemini-2.5-Pro | 67.40% open / 55.72% MCQ | Gemini 2.5 Pro google-gemini-2.5-pro | Imported | 2026-05-28 |
| 3 | Claude-3.7-Sonnet | 62.86% open / 57.23% MCQ | Claude 3.7 Sonnet anthropic-claude-3.7-sonnet | Imported | 2026-05-28 |
| 4 | Claude-4.5-Sonnet | 62.76% open / 58.01% MCQ | — | Imported | 2026-05-28 |
| 5 | GPT-5-mini | 60.32% open / 54.82% MCQ | GPT-5 Mini openai-gpt-5-mini | Imported | 2026-05-28 |
| 6 | GPT-4.1 | 57.50% open / 54.40% MCQ | GPT-4.1 openai-gpt-4.1 | Imported | 2026-05-28 |
| 7 | DeepSeek-V3.2-Exp | 57.42% open / 53.07% MCQ | DeepSeek V3.2 Exp deepseek-deepseek-v3.2-exp | Imported | 2026-05-28 |
| 8 | GPT-4o | 56.93% open / 53.13% MCQ | GPT-4o openai-gpt-4o | Imported | 2026-05-28 |
| 9 | DeepSeek-V3.2-reasoner | 56.53% open questions | — | Imported | 2026-05-28 |
| 10 | DeepSeek-V3.2-chat | 55.99% open questions | — | Imported | 2026-05-28 |
| 11 | DeepSeek-R1 | 55.91% open / 52.41% MCQ | R1 deepseek-r1 | Imported | 2026-05-28 |
| 12 | Gemini-3-Pro-preview | 55.38% open questions | Gemini 3 google-gemini-3 | Imported | 2026-05-28 |
| 13 | GPT-4.1-mini | 54.58% open / 48.49% MCQ | GPT-4.1 Mini openai-gpt-4.1-mini | Imported | 2026-05-28 |
| 14 | DeepSeek-V3 | 52.53% open / 46.57% MCQ | DeepSeek V3 deepseek-deepseek-chat | Imported | 2026-05-28 |
| 15 | GPT-OSS-120B | 51.74% open / 47.71% MCQ | gpt-oss-120b openai-gpt-oss-120b | Imported | 2026-05-28 |
| 16 | O3-mini | 48.13% open / 44.22% MCQ | o3-mini openai-o3-mini | Imported | 2026-05-28 |
| 17 | Llama-4-Maverick | 47.25% open / 49.10% MCQ | Llama 4 Maverick meta-llama-4-maverick | Imported | 2026-05-28 |
| 18 | Qwen3-235B | 47.25% open / 48.19% MCQ | Qwen3 235B A22B qwen-qwen3-235b-a22b | Imported | 2026-05-28 |
| 19 | QwQ-32B | 44.36% open / 47.83% MCQ | — | Imported | 2026-05-28 |
| 20 | GPT-4.1-nano | 43.68% open / 39.22% MCQ | GPT-4.1 Nano openai-gpt-4.1-nano | Imported | 2026-05-28 |
| 21 | Qwen3-Next | 43.37% open / 43.31% MCQ | — | Imported | 2026-05-28 |
| 22 | Llama-3.1-405B-it | 43.14% open / 43.19% MCQ | — | Imported | 2026-05-28 |
| 23 | GPT-4o-mini | 42.55% open / 40.96% MCQ | GPT-4o-mini openai-gpt-4o-mini | Imported | 2026-05-28 |
| 24 | Gemma-3-12B-it | 41.29% open / 29.94% MCQ | Gemma 3 12B google-gemma-3-12b-it | Imported | 2026-05-28 |
| 25 | Llama-3.3-70B-it | 41.27% open / 28.19% MCQ | — | Imported | 2026-05-28 |
| 26 | Qwen3-32B | 40.00% open / 45.30% MCQ | Qwen3 32B qwen-qwen3-32b | Imported | 2026-05-28 |
| 27 | Phi-4 | 38.54% open / 40.66% MCQ | Phi 4 microsoft-phi-4 | Imported | 2026-05-28 |
| 28 | Apertus-70B | 34.70% open questions | — | Imported | 2026-05-28 |
| 29 | GPT-OSS-20B | 32.12% open / 40.78% MCQ | gpt-oss-20b openai-gpt-oss-20b | Imported | 2026-05-28 |
| 30 | Gemma-2-9B-it | 27.41% open / 25.36% MCQ | — | Imported | 2026-05-28 |
| 31 | GPT-5-nano | 27.25% open / 47.11% MCQ | GPT-5 Nano openai-gpt-5-nano | Imported | 2026-05-28 |
| 32 | EuroLLM-9B-it | 22.95% open / 23.31% MCQ | — | Imported | 2026-05-28 |
| 33 | Apertus-8B | 22.44% open questions | — | Imported | 2026-05-28 |
| 34 | Qwen-2.5-7B-it | 16.67% open / 29.28% MCQ | — | Imported | 2026-05-28 |
| 35 | Ministral-8B-it | 14.88% open / 26.27% MCQ | — | Imported | 2026-05-28 |
| 36 | Llama-3.1-8B-it | 10.00% open / 24.04% MCQ | — | Imported | 2026-05-28 |
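The score column packs up to two metrics into one cell: the open-question judge score, optionally followed by the multiple-choice accuracy. Rows are ranked by the first value, the primary metric. A minimal sketch of splitting such a cell into its two numbers is shown here; the function name `parse_score_cell` and the regular expression are illustrative assumptions based only on the cell formats visible in the table, not part of any LEXam tooling.

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern covering the two cell shapes seen in the table:
#   "70.20% open / 62.65% MCQ"  -> both metrics reported
#   "56.53% open questions"     -> open-question score only
_CELL = re.compile(
    r"^\s*(?P<open>\d+(?:\.\d+)?)%\s+open(?:\s+questions)?"
    r"(?:\s*/\s*(?P<mcq>\d+(?:\.\d+)?)%\s+MCQ)?\s*$"
)


def parse_score_cell(cell: str) -> Tuple[float, Optional[float]]:
    """Return (open_question_score, mcq_accuracy) in percent.

    mcq_accuracy is None when the cell reports only the open-question score.
    """
    match = _CELL.match(cell)
    if match is None:
        raise ValueError(f"unrecognized score cell: {cell!r}")
    open_score = float(match.group("open"))
    mcq = match.group("mcq")
    return open_score, (float(mcq) if mcq is not None else None)


if __name__ == "__main__":
    print(parse_score_cell("70.20% open / 62.65% MCQ"))
    print(parse_score_cell("56.53% open questions"))
```

Returning `None` for a missing multiple-choice value, rather than 0, keeps models without an MCQ result from being mistaken for models that scored zero.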