COLLIE
COLLIE is a grammar-based framework for systematic construction of constrained text generation tasks. It allows specification of rich, compositional constraints across diverse generation levels and modeling challenges including language understanding, logical reasoning, and semantic planning. The COLLIE-v1 dataset contains 2,080 instances across 13 constraint structures.
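To make "compositional constraints" concrete, the following is a minimal sketch of how two simple checks could be combined into one task and evaluated over model outputs. Every name here (`word_count_equals`, `last_word_is`, `all_of`, `satisfaction_rate`) is hypothetical and is not the COLLIE library's API. The satisfaction rate shown is only an illustration and is not necessarily how the Score on this page is computed.

```python
from typing import Callable, List

# Hypothetical illustration only: none of these names come from the COLLIE codebase.
Constraint = Callable[[str], bool]


def word_count_equals(n: int) -> Constraint:
    """Satisfied when the text has exactly n whitespace-separated words."""
    return lambda text: len(text.split()) == n


def last_word_is(word: str) -> Constraint:
    """Satisfied when the final word, ignoring trailing punctuation, equals `word`."""
    def check(text: str) -> bool:
        words = text.split()
        return bool(words) and words[-1].strip(".,!?;:").lower() == word.lower()
    return check


def all_of(*constraints: Constraint) -> Constraint:
    """Compose several constraints with logical AND."""
    return lambda text: all(c(text) for c in constraints)


def satisfaction_rate(outputs: List[str], constraint: Constraint) -> float:
    """Fraction of outputs that satisfy the constraint (0.0 for an empty list)."""
    if not outputs:
        return 0.0
    return sum(constraint(o) for o in outputs) / len(outputs)


# A composed task: a five-word sentence whose last word is "dog".
task = all_of(word_count_equals(5), last_word_is("dog"))
outputs = ["I walked my old dog.", "The dog barked loudly today."]
rate = satisfaction_rate(outputs, task)
print(rate)
```

Representing each constraint as a plain predicate keeps composition trivial: any boolean combination of checks is itself a check, which mirrors the idea of building larger constraint structures from smaller ones.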
9 rows. Primary metric: Score. Sampled: 2026-05-06. Metrics: Score, Normalized Score.
| Rank | Subject | Score | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | GPT-5 | 0.99 | GPT-5 (`openai-gpt-5`) | Self-reported | 2026-05-06 |
| 2 | o3-mini | 0.99 | o3-mini (`openai-o3-mini`) | Self-reported | 2026-05-06 |
| 3 | o3 | 0.98 | o3 (`openai-o3`) | Self-reported | 2026-05-06 |
| 4 | GPT-4.5 | 0.72 | GPT-4.5 (`openai-gpt-4.5-preview`) | Self-reported | 2026-05-06 |
| 5 | GPT-4.1 | 0.66 | GPT-4.1 (`openai-gpt-4.1`) | Self-reported | 2026-05-06 |
| 6 | Mistral Small 4 | 0.63 | Mistral: Mistral Small 4 (`mistralai-mistral-small-2603`) | Self-reported | 2026-05-06 |
| 7 | GPT-4o | 0.61 | GPT-4o (2024-08-06) (`openai-gpt-4o-2024-08-06`) | Self-reported | 2026-05-06 |
| 8 | GPT-4.1 mini | 0.55 | GPT-4.1 Mini (`openai-gpt-4.1-mini`) | Self-reported | 2026-05-06 |
| 9 | GPT-4.1 nano | 0.42 | GPT-4.1 Nano (`openai-gpt-4.1-nano`) | Self-reported | 2026-05-06 |