COMPL-AI
EU AI Act compliance benchmarking suite for LLMs spanning bias, toxicity, harmful-instruction refusal, consistency, robustness, calibration, reasoning, privacy, memorization, and related trustworthiness tasks.
Metadata
Metrics
COMPL-AI average, Task coverage, Prejudiced Answers: BBQ, Biased Completions: BOLD, Toxic Completions of Benign Text: RealToxicityPrompts, Following Harmful Instructions: AdvBench, Monotonicity Checks, Self-Check Consistency, BoolQ Contrast Set, IMDB Contrast Set, Logit Calibration: BIG-Bench, Self-Assessment: TriviaQA, Income Fairness: DecodingTrust, Common Sense Reasoning: HellaSwag, Coding: HumanEval, Goal Hijacking and Prompt Leakage, Rule Following, Representation Bias: RedditBias, Truthfulness: TruthfulQA MC2, General Knowledge: MMLU, Reasoning: AI2 Reasoning Challenge, Denying Human Presence, Copyrighted Material Memorization, PII Extraction by Association, Recommendation Consistency: FaiRLLM, MMLU: Robustness, Watermark Reliability & Robustness, Bias of the Dataset, Toxicity of the Dataset
| Rank | Subject | COMPL-AI average | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | gpt-4-1106-preview | 0.86 | — | Imported | 2026-05-06 |
| 2 | Claude3Opus | 0.85 | — | Imported | 2026-05-06 |
| 3 | gemini-1.5-flash-001 | 0.80 | — | Imported | 2026-05-06 |
| 4 | gpt-3.5-turbo-0125 | 0.77 | — | Imported | 2026-05-06 |
| 5 | 01-ai/Yi-34B-Chat | 0.72 | — | Imported | 2026-05-06 |
| 6 | Qwen/Qwen1.5-72B-Chat | 0.72 | — | Imported | 2026-05-06 |
| 7 | speakleash/Bielik-11B-v2.3-Instruct | 0.71 | — | Imported | 2026-05-06 |
| 8 | meta-llama/Llama-2-70b-chat-hf | 0.70 | — | Imported | 2026-05-06 |
| 9 | mistralai/Mixtral-8x7B-Instruct-v0.1 | 0.70 | Mistral: Mixtral 8x7B Instruct (`mistralai-mixtral-8x7b-instruct`) | Imported | 2026-05-06 |
| 10 | mistralai/Mistral-7B-Instruct-v0.3 | 0.68 | — | Imported | 2026-05-06 |
| 11 | mistralai/Mistral-7B-Instruct-v0.2 | 0.67 | — | Imported | 2026-05-06 |
| 12 | meta-llama/Llama-2-13b-chat-hf | 0.66 | — | Imported | 2026-05-06 |
| 13 | mistralai/Mistral-7B-v0.3 | 0.66 | — | Imported | 2026-05-06 |
| 14 | meta-llama/Llama-2-7b-chat-hf | 0.63 | — | Imported | 2026-05-06 |
| 15 | google/gemma-2-9b | 0.58 | — | Imported | 2026-05-06 |
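Several adjacent ranks in the table show the same two-decimal average (for example ranks 5 and 6 at 0.72), so the ordering within those pairs cannot be recovered from the displayed values alone. The following is a minimal sketch for sanity-checking the table: it confirms that scores never increase as rank increases and lists the groups of subjects with identical displayed scores. The names `ROWS`, `is_rank_consistent`, and `tied_groups` are hypothetical helpers introduced for illustration and are not part of COMPL-AI.

```python
from collections import defaultdict

# (rank, subject, COMPL-AI average) copied from the table above.
ROWS = [
    (1, "gpt-4-1106-preview", 0.86),
    (2, "Claude3Opus", 0.85),
    (3, "gemini-1.5-flash-001", 0.80),
    (4, "gpt-3.5-turbo-0125", 0.77),
    (5, "01-ai/Yi-34B-Chat", 0.72),
    (6, "Qwen/Qwen1.5-72B-Chat", 0.72),
    (7, "speakleash/Bielik-11B-v2.3-Instruct", 0.71),
    (8, "meta-llama/Llama-2-70b-chat-hf", 0.70),
    (9, "mistralai/Mixtral-8x7B-Instruct-v0.1", 0.70),
    (10, "mistralai/Mistral-7B-Instruct-v0.3", 0.68),
    (11, "mistralai/Mistral-7B-Instruct-v0.2", 0.67),
    (12, "meta-llama/Llama-2-13b-chat-hf", 0.66),
    (13, "mistralai/Mistral-7B-v0.3", 0.66),
    (14, "meta-llama/Llama-2-7b-chat-hf", 0.63),
    (15, "google/gemma-2-9b", 0.58),
]


def is_rank_consistent(rows):
    """Return True if the score never increases as the rank increases."""
    ordered = sorted(rows, key=lambda row: row[0])
    return all(a[2] >= b[2] for a, b in zip(ordered, ordered[1:]))


def tied_groups(rows):
    """Map each displayed score shared by two or more subjects to those subjects."""
    groups = defaultdict(list)
    for _, subject, score in rows:
        # Key on the two-decimal string, since that is all the table displays.
        groups[f"{score:.2f}"].append(subject)
    return {score: names for score, names in groups.items() if len(names) > 1}


if __name__ == "__main__":
    print("rank order consistent:", is_rank_consistent(ROWS))
    for score, names in tied_groups(ROWS).items():
        print(score, "->", ", ".join(names))
```

Grouping on the formatted string rather than the float keeps the check aligned with what the table actually displays; if unrounded averages were available, sorting on those would break the ties.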