COLLIE

COLLIE is a grammar-based framework for systematic construction of constrained text generation tasks. It allows specification of rich, compositional constraints across diverse generation levels and modeling challenges including language understanding, logical reasoning, and semantic planning. The COLLIE-v1 dataset contains 2,080 instances across 13 constraint structures.
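To make the idea of compositional constraints concrete, here is a minimal sketch in Python. It is not the COLLIE library's API: the `Constraint` class and the helpers `word_count_equals` and `last_word_is` are hypothetical names invented for this illustration. It only shows how small checks could be combined into a larger constraint that a generated text either satisfies or fails.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical illustration only: none of these names come from the COLLIE codebase.


@dataclass(frozen=True)
class Constraint:
    """A named predicate over a generated text."""
    description: str
    check: Callable[[str], bool]

    def __and__(self, other: "Constraint") -> "Constraint":
        # Compose two constraints: the text must satisfy both.
        return Constraint(
            f"({self.description}) and ({other.description})",
            lambda text: self.check(text) and other.check(text),
        )


def word_count_equals(n: int) -> Constraint:
    """Constraint: the text has exactly n whitespace-separated words."""
    return Constraint(f"exactly {n} words", lambda text: len(text.split()) == n)


def last_word_is(word: str) -> Constraint:
    """Constraint: the final word, ignoring trailing punctuation and case, equals `word`."""
    def check(text: str) -> bool:
        tokens = text.split()
        if not tokens:
            return False
        return tokens[-1].strip(".,!?;:").lower() == word.lower()
    return Constraint(f"last word is '{word}'", check)


# A sentence-level constraint built from two simpler ones.
constraint = word_count_equals(5) & last_word_is("dog")
print(constraint.description)
print(constraint.check("I walked my old dog."))  # True
```

Under this assumed design, a benchmark score would be the fraction of generated texts for which `check` returns `True`; how COLLIE itself computes Score and Normalized Score is not stated on this page.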

Rows: 9
Primary metric: Score
Sampled: 2026-05-06

Metadata

Metrics

Score, Normalized Score

Latest Results

Rank  Subject          Score  Model Match                                              Provenance     Sampled
1     GPT-5            0.99   GPT-5 (openai-gpt-5)                                     Self-reported  2026-05-06
2     o3-mini          0.99   o3-mini (openai-o3-mini)                                 Self-reported  2026-05-06
3     o3               0.98   o3 (openai-o3)                                           Self-reported  2026-05-06
4     GPT-4.5          0.72   GPT-4.5 (openai-gpt-4.5-preview)                         Self-reported  2026-05-06
5     GPT-4.1          0.66   GPT-4.1 (openai-gpt-4.1)                                 Self-reported  2026-05-06
6     Mistral Small 4  0.63   Mistral: Mistral Small 4 (mistralai-mistral-small-2603)  Self-reported  2026-05-06
7     GPT-4o           0.61   GPT-4o (2024-08-06) (openai-gpt-4o-2024-08-06)           Self-reported  2026-05-06
8     GPT-4.1 mini     0.55   GPT-4.1 Mini (openai-gpt-4.1-mini)                       Self-reported  2026-05-06
9     GPT-4.1 nano     0.42   GPT-4.1 Nano (openai-gpt-4.1-nano)                       Self-reported  2026-05-06