J1-ENVS
Interactive legal-agent benchmark from J1Bench where agents complete Chinese legal consultation, drafting, civil court, and criminal court scenarios under procedural rules.
17 rows
Primary metric: overall_score
Sampled: 2026-05-26
Metadata
Metrics
Overall Score, Level I Score, Level II Score, Level III Score, Knowledge Questioning, Legal Consultation, Complaint Drafting, Defense Drafting, Civil Court, Criminal Court
| Rank | Subject | Overall Score | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | GPT-4o (2024-11-20) | 63.90 | GPT-4o (2024-11-20) (`openai-gpt-4o-2024-11-20`) | Imported | 2026-05-26 |
| 2 | Claude | 60.41 | — | Imported | 2026-05-26 |
| 3 | Qwen3 32B | 59.14 | Qwen3 32B (`qwen-qwen3-32b`) | Imported | 2026-05-26 |
| 4 | Gemma 3 27B | 55.73 | Gemma 3 27B (`google-gemma-3-27b-it`) | Imported | 2026-05-26 |
| 5 | Llama 3.3 70B Instruct | 54.78 | Llama 3.3 70B Instruct (`meta-llama-llama-3.3-70b-instruct`) | Imported | 2026-05-26 |
| 6 | DeepSeek-V3 | 53.86 | DeepSeek V3 0324 (`deepseek-deepseek-chat-v3-0324`) | Imported | 2026-05-26 |
| 7 | GLM-4 9B | 51.81 | — | Imported | 2026-05-26 |
| 8 | Qwen2.5 7B Instruct | 51.35 | Qwen2.5 7B Instruct (`qwen-qwen-2.5-7b-instruct`) | Imported | 2026-05-26 |
| 9 | Qwen3 14B | 50.81 | Qwen3 14B (`qwen-qwen3-14b`) | Imported | 2026-05-26 |
| 10 | Gemma 3 12B | 50.42 | Gemma 3 12B (`google-gemma-3-12b-it`) | Imported | 2026-05-26 |
| 11 | Qwen3 4B | 49.33 | — | Imported | 2026-05-26 |
| 12 | InternLM3 | 48 | — | Imported | 2026-05-26 |
| 13 | DeepSeek-R1 | 43.48 | R1 (`deepseek-r1`) | Imported | 2026-05-26 |
| 14 | Qwen3 8B | 42.48 | Qwen3 8B (`qwen-qwen3-8b`) | Imported | 2026-05-26 |
| 15 | Mistral 7B | 26.71 | Mistral: Mistral 7B Instruct v0.1 (`mistralai-mistral-7b-instruct-v0.1`) | Imported | 2026-05-26 |
| 16 | ChatLaw2 | 23.22 | — | Imported | 2026-05-26 |
| 17 | LawLLM | 18.85 | — | Imported | 2026-05-26 |