COMPL-AI

EU AI Act compliance benchmarking suite for LLMs spanning bias, toxicity, harmful-instruction refusal, consistency, robustness, calibration, reasoning, privacy, memorization, and related trustworthiness tasks.

Rows: 15
Primary metric: compl_ai_average
Sampled: 2026-05-06


Metrics

COMPL-AI average, Task coverage, Prejudiced Answers: BBQ, Biased Completions: BOLD, Toxic Completions of Benign Text: RealToxicityPrompts, Following Harmful Instructions: AdvBench, Monotonicity Checks, Self-Check Consistency, BoolQ Contrast Set, IMDB Contrast Set, Logit Calibration: BIG-Bench, Self-Assessment: TriviaQA, Income Fairness: DecodingTrust, Common Sense Reasoning: HellaSwag, Coding: HumanEval, Goal Hijacking and Prompt Leakage, Rule Following, Representation Bias: RedditBias, Truthfulness: TruthfulQA MC2, General Knowledge: MMLU, Reasoning: AI2 Reasoning Challenge, Denying Human Presence, Copyrighted Material Memorization, PII Extraction by Association, Recommendation Consistency: FaiRLLM, MMLU: Robustness, Watermark Reliability & Robustness, Bias of the Dataset, Toxicity of the Dataset

Latest Results

Rows are parsed from the public result JSON files in the COMPL-AI Space. The displayed score is the macro-average of all non-null aggregate task scores, so every evaluated task counts equally. Task coverage is reported alongside it so that rows with missing tasks remain visible.
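The sketch below shows how a macro-average over non-null task scores and a task-coverage figure could be computed. It is a minimal illustration that makes several assumptions. The JSON layout (`model`, `results`), the example model name and scores, and the helper `compl_ai_average` are all hypothetical. Coverage is shown as a fraction of tasks. None of this reflects the actual COMPL-AI file schema or code.

```python
import json
from typing import Optional

# Hypothetical result file: the real COMPL-AI JSON schema may differ.
# Each task name maps to an aggregate score in [0, 1], or null when the
# task was not evaluated for this model.
EXAMPLE_RESULT = json.loads("""
{
  "model": "example-org/example-model",
  "results": {
    "Prejudiced Answers: BBQ": 0.95,
    "Coding: HumanEval": 0.60,
    "Watermark Reliability & Robustness": null,
    "General Knowledge: MMLU": 0.70
  }
}
""")


def compl_ai_average(
    task_scores: dict[str, Optional[float]],
) -> tuple[Optional[float], float]:
    """Return (macro-average over non-null scores, task coverage).

    Hypothetical helper. Each scored task contributes equally to the
    average, regardless of its sample count. Coverage is the fraction of
    tasks that have a score at all.
    """
    present = [s for s in task_scores.values() if s is not None]
    coverage = len(present) / len(task_scores) if task_scores else 0.0
    average = sum(present) / len(present) if present else None
    return average, coverage


avg, cov = compl_ai_average(EXAMPLE_RESULT["results"])
print(f"{EXAMPLE_RESULT['model']}: average={avg:.2f}, coverage={cov:.0%}")
```

Because null tasks are dropped rather than scored as zero, a model with missing tasks is not penalized in the average. Reporting coverage next to the average makes that gap explicit.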

| Rank | Subject | COMPL-AI average | Model match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | gpt-4-1106-preview | 0.86 | | Imported | 2026-05-06 |
| 2 | Claude3Opus | 0.85 | | Imported | 2026-05-06 |
| 3 | gemini-1.5-flash-001 | 0.80 | | Imported | 2026-05-06 |
| 4 | gpt-3.5-turbo-0125 | 0.77 | | Imported | 2026-05-06 |
| 5 | 01-ai/Yi-34B-Chat | 0.72 | | Imported | 2026-05-06 |
| 6 | Qwen/Qwen1.5-72B-Chat | 0.72 | | Imported | 2026-05-06 |
| 7 | speakleash/Bielik-11B-v2.3-Instruct | 0.71 | | Imported | 2026-05-06 |
| 8 | meta-llama/Llama-2-70b-chat-hf | 0.70 | | Imported | 2026-05-06 |
| 9 | mistralai/Mixtral-8x7B-Instruct-v0.1 | 0.70 | Mistral: Mixtral 8x7B Instruct (mistralai-mixtral-8x7b-instruct) | Imported | 2026-05-06 |
| 10 | mistralai/Mistral-7B-Instruct-v0.3 | 0.68 | | Imported | 2026-05-06 |
| 11 | mistralai/Mistral-7B-Instruct-v0.2 | 0.67 | | Imported | 2026-05-06 |
| 12 | meta-llama/Llama-2-13b-chat-hf | 0.66 | | Imported | 2026-05-06 |
| 13 | mistralai/Mistral-7B-v0.3 | 0.66 | | Imported | 2026-05-06 |
| 14 | meta-llama/Llama-2-7b-chat-hf | 0.63 | | Imported | 2026-05-06 |
| 15 | google/gemma-2-9b | 0.58 | | Imported | 2026-05-06 |