GPT-4.5
GPT / OpenAI
30 scores
30 benchmarks
Cost (input / output): $75 / $150 per 1M tokens
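To show what these prices mean for a single call, here is a minimal sketch. It assumes billing is strictly linear in token count, with no caching, batch, or volume discounts; this page does not say so. `request_cost` and the price constants are hypothetical names used only for illustration.

```python
# Hypothetical helper: estimate the USD cost of one GPT-4.5 API call from the
# listed per-million-token prices. Assumes purely linear billing (no discounts).
INPUT_PRICE_PER_M = 75.0    # USD per 1M input tokens (listed price)
OUTPUT_PRICE_PER_M = 150.0  # USD per 1M output tokens (listed price)


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD for a single request."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000


# Example: a 10,000-token prompt with a 2,000-token completion.
print(f"${request_cost(10_000, 2_000):.2f}")  # prints "$1.05"
```

At these rates output tokens cost twice as much as input tokens, so long completions dominate the bill faster than long prompts.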
Metadata
Tags: GPT, Closed/API
| Benchmark | Category | Rank | Score | Date sampled |
|---|---|---|---|---|
| ARC-AGI-1 | Agentic | 128 | 10.30 | 2026-05-05 |
| ARC-AGI-2 | Agentic | 124 | 0.80 | 2026-05-05 |
| TextClass Benchmark | Classification | 7 | 1767.86 | 2026-05-06 |
| AIRTBench | Cybersecurity | 2 | 36.89 | 2026-05-06 |
| Spider | Data | 5 | 85.30 | 2026-05-06 |
| Open FinLLM Leaderboard | Finance | 3 | 43.403043% | 2026-05-27 |
| Arena-Hard | Generalization | 12 | 50.0% | 2026-05-27 |
| HELM AIR-Bench | Generalization | 30 | 0.741482 | 2026-05-28 |
| HELM Safety | Generalization | 10 | 0.964672 | 2026-05-28 |
| MMLU Medical Genetics | Healthcare | 1 | 92.0% | 2026-05-27 |
| MMLU Professional Medicine | Healthcare | 1 | 93.75% | 2026-05-27 |
| MultiMedQA | Healthcare | 1 | 82.405833% | 2026-05-27 |
| Multi-IF | Instruction Following | 15 | 0.71 | 2026-05-06 |
| Artificial Analysis Intelligence Index | Intelligence | 243 | 19.96 | 2026-05-11 |
| MathVision | Intelligence | 65 | 47.30 | 2026-05-06 |
| SimpleQA | Intelligence | 1 | 62.5% | 2026-05-27 |
| HindiGen v1 | Language | 30 | 15.46 | 2026-05-06 |
| OpenAI-MRCR: 2 needle 128k | Long Context | 6 | 0.39 | 2026-05-06 |
| CharXiv-D | Multimodal | 3 | 0.90 | 2026-05-06 |
| CharXiv-R | Multimodal | 28 | 0.55 | 2026-05-06 |
| MMSI-Bench | Multimodal | 6 | 40.3% | 2026-05-28 |
| Video SimpleQA | Multimodal | 4 | 54.10 | 2026-05-06 |
| Visual-Language Understanding | Multimodal | 34 | 42.11 | 2026-05-06 |
| EnigmaEval | Reasoning | 23 | 3.18 | 2026-05-06 |
| Graphwalks BFS <128k | Reasoning | 6 | 0.72 | 2026-05-06 |
| Graphwalks parents <128k | Reasoning | 4 | 0.73 | 2026-05-06 |
| Humanity's Last Exam (Text Only) | Reasoning | 39 | 5.80 | 2026-05-06 |
| LingOly-TOO | Reasoning | 10 | 0.25 | 2026-05-06 |
| ComplexFuncBench | Tool Use | 3 | 0.63 | 2026-05-06 |
| COLLIE | Writing | 4 | 0.72 | 2026-05-06 |