IslamicLegalBench
Benchmark for evaluating large language models on Islamic law and jurisprudence, with public aggregate scores over 718 private instances across 13 tasks.
Metadata
Metrics
Correct %, Partially Correct %, Incorrect % (lower is better), Abstention % (lower is better), Hallucination % (lower is better)
FPQ Challenge %, FPQ Sycophancy % (lower is better), FPQ Refusal % (lower is better)
Total Items
LOW Correct %, MODERATE Correct %, HIGH Correct %
For each task T1 through T13: Correct %, Hallucination % (lower is better)
| Rank | Subject | Correct % | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | GPT-5 | 67.65 | GPT-5 openai-gpt-5 | Imported | 2026-05-06 |
| 2 | Ansari (latest) | 66.85 | — | Imported | 2026-05-06 |
| 3 | Claude Sonnet 4.5 | 65.63 | Claude Sonnet 4.5 anthropic-claude-sonnet-4.5 | Imported | 2026-05-06 |
| 4 | Ansari Latest (Gemini 3.1 Pro, Thinking - Low) | 64.54 | — | Imported | 2026-05-06 |
| 5 | Gemini 2.5 Pro | 62.79 | Gemini 2.5 Pro google-gemini-2.5-pro | Imported | 2026-05-06 |
| 6 | Grok 4 | 61.69 | Grok 4 x-ai-grok-4 | Imported | 2026-05-06 |
| 7 | Ansari (legacy) | 59.95 | — | Imported | 2026-05-06 |
| 8 | DeepSeek-R1 | 54.21 | R1 deepseek-r1 | Imported | 2026-05-06 |
| 9 | Qwen3 235B A22B Instruct 2507 | 48.87 | Qwen3 235B A22B Instruct 2507 qwen-qwen3-235b-a22b-2507 | Imported | 2026-05-06 |
| 10 | Llama 4 Maverick 17B 128E Instruct FP8 | 46.78 | — | Imported | 2026-05-06 |
| 11 | GPT-OSS 120B | 32.72 | gpt-oss-120b openai-gpt-oss-120b | Imported | 2026-05-06 |
| 12 | Meta Llama 3.1 8B Instruct Turbo | 30.96 | — | Imported | 2026-05-06 |
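
The ranking in the table can be reproduced from the Correct % column alone. The following is a minimal sketch, not the benchmark's own tooling: the `ROWS` list is transcribed by hand from the table, and `rank_by` is a hypothetical helper name. It assumes Correct % is higher-is-better, which matches the table order, and takes a flag for metrics the page marks "lower is better".

```python
# (subject, Correct %) pairs transcribed from the leaderboard table.
ROWS = [
    ("GPT-5", 67.65),
    ("Ansari (latest)", 66.85),
    ("Claude Sonnet 4.5", 65.63),
    ("Ansari Latest (Gemini 3.1 Pro, Thinking - Low)", 64.54),
    ("Gemini 2.5 Pro", 62.79),
    ("Grok 4", 61.69),
    ("Ansari (legacy)", 59.95),
    ("DeepSeek-R1", 54.21),
    ("Qwen3 235B A22B Instruct 2507", 48.87),
    ("Llama 4 Maverick 17B 128E Instruct FP8", 46.78),
    ("GPT-OSS 120B", 32.72),
    ("Meta Llama 3.1 8B Instruct Turbo", 30.96),
]


def rank_by(rows, lower_is_better=False):
    """Return rows sorted best-first (hypothetical helper).

    Higher scores rank first by default; pass lower_is_better=True
    for metrics such as Hallucination %.
    """
    return sorted(rows, key=lambda row: row[1], reverse=not lower_is_better)


ranked = rank_by(ROWS)
for position, (subject, score) in enumerate(ranked, start=1):
    print(f"{position:>2}  {subject:<48} {score:.2f}")

# Spread in Correct % between the first and last ranked subjects.
gap = round(ranked[0][1] - ranked[-1][1], 2)
print(f"Top-to-bottom gap: {gap:.2f} points")
```

Sorting on a single column keeps the sketch independent of the other metrics, whose per-subject values are not shown in this table.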