Expert-SWE (Internal)
OpenAI internal frontier software engineering evaluation for long-horizon coding tasks with a median estimated human completion time of 20 hours.
Rows: 2
Primary metric: Score
Sampled: 2026-04-23
Score
| Rank | Subject | Score | Model Match | Provenance | Sampled |
|---|---|---|---|---|---|
| 1 | GPT-5.5 | 73.1% | GPT-5.5 (`openai-gpt-5.5`) | Launch post | 2026-04-23 |
| 2 | GPT-5.4 | 68.5% | GPT-5.4 (`openai-gpt-5.4`) | Launch post | 2026-04-23 |