Graphwalks parents <128k
A graph reasoning benchmark that evaluates a language model's ability to find parent nodes in a graph presented in under 128k tokens of context. The task requires understanding graph structure and edge relationships.
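To make the task concrete, here is a minimal sketch of what "finding parent nodes" means, assuming the graph is given as a list of directed `(source, target)` edges. The edge format, the function name `find_parents`, and the sample graph are illustrative assumptions and are not taken from the benchmark itself.

```python
from collections import defaultdict


def find_parents(edges, target):
    """Return the sorted nodes that have a directed edge into `target`.

    `edges` is assumed to be an iterable of (source, destination) pairs;
    this representation is hypothetical, chosen only for illustration.
    """
    parents = defaultdict(set)
    for src, dst in edges:
        parents[dst].add(src)
    # Use .get so that looking up an unknown node does not mutate the map.
    return sorted(parents.get(target, set()))


# Hypothetical toy graph: a -> b, c -> b, b -> d, a -> d
edges = [("a", "b"), ("c", "b"), ("b", "d"), ("a", "d")]
print(find_parents(edges, "b"))  # ['a', 'c']
```

A model under evaluation has to produce the same answer by reading the edges from its context window rather than by running code, which is why long contexts make the task harder.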
Rows: 11
Primary metric: score
Sampled: 2026-05-06
Metadata
Metrics: Score, Normalized Score
| Rank | Subject | Score | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | GPT-5.4 | 0.90 | GPT-5.4 openai-gpt-5.4 | Self-reported | 2026-05-06 |
| 2 | GPT-5.2 | 0.89 | GPT-5.2 openai-gpt-5.2 | Self-reported | 2026-05-06 |
| 3 | GPT-5 | 0.73 | GPT-5 openai-gpt-5 | Self-reported | 2026-05-06 |
| 4 | GPT-4.5 | 0.73 | GPT-4.5 openai-gpt-4.5-preview | Self-reported | 2026-05-06 |
| 5 | GPT-5.4 mini | 0.71 | GPT-5.4 Mini openai-gpt-5.4-mini | Self-reported | 2026-05-06 |
| 6 | GPT-4.1 mini | 0.60 | GPT-4.1 Mini openai-gpt-4.1-mini | Self-reported | 2026-05-06 |
| 7 | o3-mini | 0.58 | o3-mini openai-o3-mini | Self-reported | 2026-05-06 |
| 8 | GPT-4.1 | 0.58 | GPT-4.1 openai-gpt-4.1 | Self-reported | 2026-05-06 |
| 9 | GPT-5.4 nano | 0.51 | GPT-5.4 Nano openai-gpt-5.4-nano | Self-reported | 2026-05-06 |
| 10 | GPT-4o | 0.35 | GPT-4o (2024-08-06) openai-gpt-4o-2024-08-06 | Self-reported | 2026-05-06 |
| 11 | GPT-4.1 nano | 0.09 | GPT-4.1 Nano openai-gpt-4.1-nano | Self-reported | 2026-05-06 |