AgentLeak
Full-stack privacy leakage benchmark for multi-agent LLM systems across output, inter-agent, tool, memory, log, and artifact channels.
Rows: 5
Primary metric: total_leak
Sampled: 2026-05-06
Metadata
Metrics
Total Leak (lower is better), C1 Output Leak (lower is better), C2 Internal Leak (lower is better), H1 Audit Gap (lower is better)
| Rank | Subject | Total Leak | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | Claude-3.5-Sonnet | 55.20 | Claude 3.5 Sonnet (`anthropic-claude-3.5-sonnet`) | Imported | 2026-05-06 |
| 2 | GPT-4o-mini | 76.30 | GPT-4o-mini (`openai-gpt-4o-mini`) | Imported | 2026-05-06 |
| 3 | GPT-4o | 77.60 | GPT-4o (`openai-gpt-4o`) | Imported | 2026-05-06 |
| 4 | Llama-3.3-70B | 89.90 | Llama 3.3 70B Instruct (`meta-llama-llama-3.3-70b-instruct`) | Imported | 2026-05-06 |
| 5 | Mistral-Large | 99.30 | Mistral Large (`mistralai-mistral-large`) | Imported | 2026-05-06 |
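
The ranking in the table follows from sorting on Total Leak in ascending order, since lower is better. Below is a minimal sketch of that ordering. The scores are copied from the table; the function name `rank_by_total_leak` and the dictionary layout are illustrative assumptions, not part of the benchmark's own tooling.

```python
# Total Leak values copied from the table above (lower is better).
total_leak = {
    "GPT-4o": 77.60,
    "Mistral-Large": 99.30,
    "Claude-3.5-Sonnet": 55.20,
    "Llama-3.3-70B": 89.90,
    "GPT-4o-mini": 76.30,
}


def rank_by_total_leak(scores):
    """Return (rank, subject, score) tuples, lowest Total Leak first.

    Hypothetical helper for illustration only.
    """
    ordered = sorted(scores.items(), key=lambda item: item[1])
    return [
        (rank, subject, score)
        for rank, (subject, score) in enumerate(ordered, start=1)
    ]


ranking = rank_by_total_leak(total_leak)
for rank, subject, score in ranking:
    print(f"{rank}. {subject}: {score:.2f}")
```

Sorting ascending on the primary metric reproduces the Rank column regardless of the order in which the rows are supplied.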