ConvRe
ConvRe evaluates language models' understanding of converse relations through Re2Text and Text2Re tasks in easy and hard settings.
10 rows
Primary metric: Avg
Sampled: 2026-05-06
Metadata
Metrics
Avg, Re2Text-Easy, Text2Re-Easy, Re2Text-Hard, Text2Re-Hard
| Rank | Subject | Avg | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | claude-1.3 | 66.50 | — | Imported | 2026-05-06 |
| 2 | text-davinci-003 | 65.00 | — | Imported | 2026-05-06 |
| 3 | gpt-3.5-turbo-0301 | 60.60 | GPT-3.5 Turbo (openai-gpt-3.5-turbo) | Imported | 2026-05-06 |
| 4 | claude-instant-1.1 | 57.90 | — | Imported | 2026-05-06 |
| 5 | gpt-4-0314 | 56.50 | GPT-4 (older v0314) (openai-gpt-4-0314) | Imported | 2026-05-06 |
| 6 | flan-t5-xl | 52.00 | — | Imported | 2026-05-06 |
| 7 | flan-t5-large | 51.20 | — | Imported | 2026-05-06 |
| 8 | flan-t5-base | 50.80 | — | Imported | 2026-05-06 |
| 9 | flan-t5-xxl | 50.40 | — | Imported | 2026-05-06 |
| 10 | flan-t5-small | 49.50 | — | Imported | 2026-05-06 |