GPT-5.3-Codex
Codex / OpenAI
34 scores
32 benchmarks
Cost (input / output): $1.75 / $14 per 1M tokens
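The listed prices imply a linear per-request cost: input tokens are billed at $1.75 per million and output tokens at $14 per million. The following is a minimal sketch in Python. It assumes every token is billed at these list rates, with no cached-input discounts, batch rates, or tiered pricing, since this page describes none of those. The function and constant names are hypothetical.

```python
# Hypothetical constants taken from the listed prices, in USD per 1M tokens.
INPUT_PRICE_PER_M = 1.75
OUTPUT_PRICE_PER_M = 14.00


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request at the listed per-token rates.

    Assumes plain linear pricing (no caching discounts or tiers).
    """
    return (
        input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    ) / 1_000_000


# Example: 200k input tokens and 20k output tokens.
print(f"${request_cost(200_000, 20_000):.4f}")  # prints "$0.6300"
```

Output tokens cost eight times as much as input tokens here. Long, generation-heavy agentic runs are therefore dominated by the output term, even when the prompts are large.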
Metadata
Codex Closed/API
Aliases: gpt-5.3-codex, gpt-5.3-codex-20260224, openai-gpt-5.3-codex, openai-gpt-5.3-codex-20260224, openai/gpt-5.3-codex, openai/gpt-5.3-codex-20260224
| Benchmark | Category | Rank | Score | Date sampled |
|---|---|---|---|---|
| APEX-Agents | Agentic | 7 | 46.90 | 2026-05-06 |
| Gert Labs Rankings | Agentic | 22 | 0.52 | 2026-05-11 |
| HiL-Bench | Agentic | 10 | 3.67% | 2026-05-05 |
| OSWorld-Verified | Agentic | 7 | 0.65 | 2026-05-06 |
| RuneBench | Agentic | 7 | 4.30 | 2026-05-05 |
| Tau2-Bench Telecom | Agentic | 81 | 86% | 2026-05-11 |
| Terminal-Bench Hard | Agentic | 8 | 53% | 2026-05-11 |
| Vending-Bench 2 | Agentic | 7 | 5940.12 | 2026-05-28 |
| ALE-Bench | Coding | 2 | 1655.22 | 2026-05-06 |
| Arena AI Code | Coding | 28 | 1406 | 2026-05-06 |
| IOI | Coding | 5 | 43.834% | 2026-05-26 |
| LiveCodeBench | Coding | 6 | 87.313% | 2026-05-28 |
| SciCode | Coding | 10 | 53.2% | 2026-05-11 |
| SWE Atlas - Codebase QnA | Coding | 1 | 32.60 | 2026-05-06 |
| SWE Atlas - Refactoring | Coding | 1 | 42.38 | 2026-05-06 |
| SWE Atlas - Test Writing | Coding | 1 | 38.98 | 2026-05-06 |
| SWE-bench Verified | Coding | 8 | 78% | 2026-05-28 |
| Terminal-Bench 2.0 | Coding | 6 | 64.045% | 2026-05-28 |
| Vibe Code Bench v1.1 | Coding | 5 | 61.767% | 2026-05-28 |
| Cybersecurity CTFs | Cybersecurity | 1 | 0.78 | 2026-05-06 |
| DAXBench | Data | 15 | 88.6% | 2026-05-28 |
| MageBench Season 1 | Game | 4 | 1717 rating / 10 games | 2026-05-28 |
| ALL Bench LLM | General Knowledge | 14 | 36.24 | 2026-05-06 |
| BenchLM | General Knowledge | 11 | 87 | 2026-05-06 |
| Artificial Analysis Intelligence Index | Intelligence | 9 | 53.56 | 2026-05-11 |
| Humanity's Last Exam | Intelligence | 6 | 39.9% | 2026-05-11 |
| LiveBench | Intelligence | 18 | 73.18 | 2026-05-05 |
| LiveBench | Intelligence | 25 | 71.97 | 2026-05-05 |
| ALL Bench Multimodal | Multimodal | 12 | 37.34 | 2026-05-06 |
| ALL Bench Multimodal | Multimodal | 8 | 16.79 | 2026-05-06 |
| Design Arena | Multimodal | 60 | 1203 | 2026-05-06 |
| GPQA Diamond | Reasoning | 6 | 91.5% | 2026-05-11 |
| CritPt | Science | 9 | 16.9% | 2026-05-11 |
| LiveSQLBench | Text to SQL | 9 | 33.33 | 2026-05-06 |