ConvRe

ConvRe evaluates language models' understanding of converse relations through Re2Text and Text2Re tasks in easy and hard settings.

10 rows
Primary metric: Avg
Sampled: 2026-05-06

Metadata

Metrics

Avg, Re2Text-Easy, Text2Re-Easy, Re2Text-Hard, Text2Re-Hard
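The metric list suggests that Avg summarizes the four task/setting scores. The page does not say how Avg is computed, so the following is only a minimal sketch that assumes an unweighted mean. The function name `average_score` and the input values are hypothetical and are not taken from this leaderboard.

```python
from statistics import mean

# The four sub-metrics named in the metric list above.
SUBTASKS = ("Re2Text-Easy", "Text2Re-Easy", "Re2Text-Hard", "Text2Re-Hard")


def average_score(scores: dict[str, float]) -> float:
    """Return the unweighted mean of the four sub-metrics.

    Assumption: Avg is a plain arithmetic mean; the source does not confirm this.
    """
    missing = [name for name in SUBTASKS if name not in scores]
    if missing:
        raise KeyError(f"missing sub-metric(s): {', '.join(missing)}")
    return round(mean(scores[name] for name in SUBTASKS), 2)


# Hypothetical placeholder values, not results from this leaderboard.
example = {
    "Re2Text-Easy": 80.0,
    "Text2Re-Easy": 70.0,
    "Re2Text-Hard": 40.0,
    "Text2Re-Hard": 30.0,
}
print(average_score(example))
```

If the benchmark weights the settings differently, only the body of `average_score` would need to change.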

Latest Results

Rank  Subject             Avg    Model Match                              Provenance  Sampled
1     claude-1.3          66.50  -                                        Imported    2026-05-06
2     text-davinci-003    65.00  -                                        Imported    2026-05-06
3     gpt-3.5-turbo-0301  60.60  GPT-3.5 Turbo [openai-gpt-3.5-turbo]     Imported    2026-05-06
4     claude-instant-1.1  57.90  -                                        Imported    2026-05-06
5     gpt-4-0314          56.50  GPT-4 (older v0314) [openai-gpt-4-0314]  Imported    2026-05-06
6     flan-t5-xl          52.00  -                                        Imported    2026-05-06
7     flan-t5-large       51.20  -                                        Imported    2026-05-06
8     flan-t5-base        50.80  -                                        Imported    2026-05-06
9     flan-t5-xxl         50.40  -                                        Imported    2026-05-06
10    flan-t5-small       49.50  -                                        Imported    2026-05-06