J1-ENVS

An interactive legal-agent benchmark from J1Bench in which agents complete Chinese legal consultation, document drafting, civil court, and criminal court scenarios under procedural rules.

17 rows
Primary metric: overall_score
Sampled: 2026-05-26

Metadata

Metrics

Overall Score, Level I Score, Level II Score, Level III Score, Knowledge Questioning, Legal Consultation, Complaint Drafting, Defense Drafting, Civil Court, Criminal Court

Latest Results

Scores are percentages computed from the source environment averages. The overall and level scores are weighted averages, with each environment weighted by its published J1-Eval environment count.
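The weighting rule above amounts to a count-weighted mean of per-environment averages, rescaled to a percentage. Below is a minimal Python sketch of that arithmetic, not the leaderboard's actual pipeline: the function name `weighted_score`, the placeholder environment names, and the assumption that each source average lies on a 0 to `max_score` scale are all hypothetical.

```python
from typing import Mapping


def weighted_score(
    env_scores: Mapping[str, float],
    env_counts: Mapping[str, int],
    max_score: float = 1.0,
) -> float:
    """Count-weighted mean of per-environment averages, as a percentage.

    Hypothetical sketch: env_scores holds each environment's average score
    (assumed to lie in [0, max_score]); env_counts holds the number of
    environments of that type, used as the weight.
    """
    # Total weight covers only the environments being aggregated, so the
    # same function can compute an overall score or a single level's score.
    total = sum(env_counts[name] for name in env_scores)
    if total <= 0:
        raise ValueError("environment counts must sum to a positive number")

    # Sum of score * count over the selected environments.
    weighted = sum(env_scores[name] * env_counts[name] for name in env_scores)

    # Normalize by total weight and the score scale, then express as a percent.
    return 100.0 * weighted / (total * max_score)


# Toy usage with placeholder names and values (not real J1-Eval data).
print(round(weighted_score({"env_a": 0.5, "env_b": 0.8}, {"env_a": 1, "env_b": 3}), 2))  # 72.5
```

Passing only the environments that belong to one level yields that level's score under the same weighting; which environments map to which level is not stated here.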

Rank | Subject | Overall Score | Model Match | Model ID | Provenance | Sampled
1 | GPT-4o (2024-11-20) | 63.90 | GPT-4o (2024-11-20) | openai-gpt-4o-2024-11-20 | Imported | 2026-05-26
2 | Claude | 60.41 | - | - | Imported | 2026-05-26
3 | Qwen3 32B | 59.14 | Qwen3 32B | qwen-qwen3-32b | Imported | 2026-05-26
4 | Gemma 3 27B | 55.73 | Gemma 3 27B | google-gemma-3-27b-it | Imported | 2026-05-26
5 | Llama 3.3 70B Instruct | 54.78 | Llama 3.3 70B Instruct | meta-llama-llama-3.3-70b-instruct | Imported | 2026-05-26
6 | DeepSeek-V3 | 53.86 | DeepSeek V3 0324 | deepseek-deepseek-chat-v3-0324 | Imported | 2026-05-26
7 | GLM-4 9B | 51.81 | - | - | Imported | 2026-05-26
8 | Qwen2.5 7B Instruct | 51.35 | Qwen2.5 7B Instruct | qwen-qwen-2.5-7b-instruct | Imported | 2026-05-26
9 | Qwen3 14B | 50.81 | Qwen3 14B | qwen-qwen3-14b | Imported | 2026-05-26
10 | Gemma 3 12B | 50.42 | Gemma 3 12B | google-gemma-3-12b-it | Imported | 2026-05-26
11 | Qwen3 4B | 49.33 | - | - | Imported | 2026-05-26
12 | InternLM3 | 48.00 | - | - | Imported | 2026-05-26
13 | DeepSeek-R1 | 43.48 | R1 | deepseek-r1 | Imported | 2026-05-26
14 | Qwen3 8B | 42.48 | Qwen3 8B | qwen-qwen3-8b | Imported | 2026-05-26
15 | Mistral 7B | 26.71 | Mistral: Mistral 7B Instruct v0.1 | mistralai-mistral-7b-instruct-v0.1 | Imported | 2026-05-26
16 | ChatLaw2 | 23.22 | - | - | Imported | 2026-05-26
17 | LawLLM | 18.85 | - | - | Imported | 2026-05-26