GPT-3.5 Turbo
GPT / OpenAI
96 scores
64 benchmarks
$0.5 / $1.5 per 1M tokens (cost in/out)
Metadata
GPT Closed/API
Aliases: gpt-3.5-turbo, openai-gpt-3.5-turbo, openai/gpt-3.5-turbo
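
As a rough illustration of the pricing listed above, a request's cost can be estimated from its token counts. This is a minimal sketch that assumes the $0.5 (input) and $1.5 (output) per 1M tokens rates apply linearly with no minimums or discounts. The function name `estimate_cost_usd` and its parameters are hypothetical and not part of any OpenAI API.

```python
def estimate_cost_usd(input_tokens, output_tokens,
                      in_price_per_m=0.5, out_price_per_m=1.5):
    """Estimate request cost in USD from token counts.

    Default prices are the per-1M-token rates shown on this page;
    linear scaling is an assumption of this sketch.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    input_cost = input_tokens / 1_000_000 * in_price_per_m
    output_cost = output_tokens / 1_000_000 * out_price_per_m
    return input_cost + output_cost


if __name__ == "__main__":
    # Example: a 2,000-token prompt with a 500-token completion.
    print(f"${estimate_cost_usd(2_000, 500):.5f}")
```

Output tokens cost three times as much as input tokens at these rates, so long completions dominate the total even when prompts are larger.
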
| Benchmark | Category | Rank | Score | Sampled |
|---|---|---|---|---|
| OmniACT | Agentic | 6 | 7.89 | 2026-05-27 |
| ToolSandbox | Agentic | 3 | 65.6 | 2026-05-27 |
| RewardBench | Alignment | 83 | 75.78 | 2026-05-06 |
| RewardBench | Alignment | 142 | 65.34 | 2026-05-06 |
| TextClass Benchmark | Classification | 47 | 1560.51 | 2026-05-06 |
| BigCodeBench | Coding | 58 | 39.10 | 2026-05-06 |
| CodeEditorBench | Coding | 8 | 0.803 | 2026-05-27 |
| CodeEditorBench | Coding | 12 | 0.776 | 2026-05-27 |
| CodeEditorBench | Coding | 23 | 0.724 | 2026-05-27 |
| CodeEditorBench | Coding | 27 | 0.7 | 2026-05-27 |
| CodeEditorBench | Coding | 30 | 0.684 | 2026-05-27 |
| CodeEditorBench | Coding | 48 | 0.5 | 2026-05-27 |
| CRUXEval | Coding | 11 | 54.65 | 2026-05-05 |
| DS-1000 | Coding | 3 | 0.394 | 2026-05-27 |
| DS-1000 | Coding | 5 | 0.386 | 2026-05-27 |
| EvalPlus | Coding | 21 | 70.20 | 2026-05-05 |
| MBPP+ | Coding | 15 | 69.70 | 2026-05-05 |
| McEval | Coding | 4 | 52.6% | 2026-05-27 |
| GSMA Open Telco Leaderboard | Domain | 60 | 43.85 | 2026-05-06 |
| FinBen | Finance | 3 | 4.48% | 2026-05-27 |
| Open FinLLM Leaderboard | Finance | 12 | 23.164665% | 2026-05-27 |
| MixEval Chat | General Knowledge | 30 | 43 | 2026-05-06 |
| AgentHarm | Generalization | 26 | 62.0% | 2026-05-27 |
| AgentHarm | Generalization | 27 | 62.2% | 2026-05-27 |
| AgentHarm | Generalization | 29 | 63.2% | 2026-05-27 |
| AlpacaEval | Generalization | 29 | 81.73910844041163 | 2026-05-27 |
| AlpacaEval | Generalization | 37 | 79.17893267677465 | 2026-05-27 |
| AlpacaEval | Generalization | 46 | 75.55853548412969 | 2026-05-27 |
| AlpacaEval | Generalization | 61 | 66.88517803643602 | 2026-05-27 |
| CyberBench | Generalization | 2 | 62.6% | 2026-05-28 |
| CyberSecEval | Generalization | 6 | 39.13% | 2026-05-27 |
| EQ-Bench | Generalization | 39 | 71.74 | 2026-05-06 |
| EQ-Bench | Generalization | 43 | 70.67 | 2026-05-06 |
| EQ-Bench | Generalization | 46 | 69.64 | 2026-05-06 |
| EQ-Bench | Generalization | 47 | 69.51 | 2026-05-06 |
| HELM AIR-Bench | Generalization | 50 | 0.635494 | 2026-05-28 |
| HELM AIR-Bench | Generalization | 51 | 0.631279 | 2026-05-28 |
| HELM AIR-Bench | Generalization | 68 | 0.525378 | 2026-05-28 |
| HELM AIR-Bench | Generalization | 78 | 0.439673 | 2026-05-28 |
| HELM Safety | Generalization | 52 | 0.852869 | 2026-05-28 |
| HELM Safety | Generalization | 56 | 0.834979 | 2026-05-28 |
| HELM Safety | Generalization | 58 | 0.813594 | 2026-05-28 |
| MT-Bench | Generalization | 2 | 7.94375 | 2026-05-27 |
| NarrativeQA | Generalization | 31 | 66.304398% | 2026-05-27 |
| NarrativeQA | Generalization | 39 | 62.507836% | 2026-05-27 |
| WeirdML | Generalization | 30 | 3.48 | 2026-05-06 |
| WildBench | Generalization | 50 | 6.613880742913001 | 2026-05-27 |
| HealthBench | Healthcare | 5 | 0.1554 | 2026-05-27 |
| MedQA | Healthcare | 87 | 58.471% | 2026-04-16 |
| RubricEval | Instruction Following | 11 | 2.52 | 2026-05-06 |
| URIAL Bench | Instruction Following | 2 | 7.94 | 2026-05-06 |
| Artificial Analysis Intelligence Index | Intelligence | 448 | 8.99 | 2026-05-11 |
| BoolQ | Intelligence | 7 | 87% | 2026-05-27 |
| BoolQ | Intelligence | 37 | 74% | 2026-05-27 |
| Gaokao-Bench | Intelligence | 5 | 53.2% | 2026-05-27 |
| GPQA Diamond | Intelligence | 108 | 30.556% | 2026-05-28 |
| HELM | Intelligence | 10 | 78.296037% | 2026-05-27 |
| HELM | Intelligence | 13 | 76.025641% | 2026-05-27 |
| HELM Instruct | Intelligence | 1 | 0.688889 | 2026-05-28 |
| HELM Lite | Intelligence | 48 | 0.400283 | 2026-05-28 |
| HELM MMLU | Intelligence | 2 | 58.98 | 2026-05-06 |
| HELM MMLU | Intelligence | 28 | 39.09 | 2026-05-06 |
| MMLU-Pro | Intelligence | 306 | 46.2% | 2026-05-11 |
| Natural Questions | Intelligence | 11 | 67.477806% | 2026-05-27 |
| Natural Questions | Intelligence | 29 | 62.433188% | 2026-05-27 |
| TableBench | Intelligence | 25 | 37.15% | 2026-05-27 |
| ANLI | Language | 2 | 58.10 | 2026-05-06 |
| HindiGen v1 | Language | 26 | 49.10 | 2026-05-06 |
| WinoGrande | Language | 8 | 81.60 | 2026-05-06 |
| HELM LegalBench | Legal | 3 | 62.781186% | 2026-05-27 |
| HELM LegalBench | Legal | 63 | 46.830266% | 2026-05-27 |
| LawBench | Legal | 4 | 44.5226 | 2026-05-27 |
| LawBench | Legal | 5 | 42.1477 | 2026-05-27 |
| LegalBench | Legal | 98 | 64.372% | 2026-05-28 |
| HELM GSM8K | Math | 3 | 53.1% | 2026-05-27 |
| HELM GSM8K | Math | 6 | 46.9% | 2026-05-27 |
| MATH | Math | 1 | 48.83286% | 2026-05-27 |
| MATH | Math | 2 | 45.27817% | 2026-05-27 |
| BRIDGE Medical Leaderboard | Medical | 60 | 43.61 | 2026-05-27 |
| BRIDGE Medical Leaderboard | Medical | 172 | 35.3 | 2026-05-27 |
| BRIDGE Medical Leaderboard | Medical | 214 | 31.63 | 2026-05-27 |
| Open Medical-LLM Leaderboard | Medical | 76 | 67.69 | 2026-05-06 |
| BenchBench | Meta | 91 | 0.36 | 2026-05-06 |
| BBH | Reasoning | 9 | 61.59 | 2026-05-06 |
| ConvRe | Reasoning | 3 | 60.60 | 2026-05-06 |
| DROP | Reasoning | 20 | 0.70 | 2026-05-06 |
| GPQA Diamond | Reasoning | 449 | 29.7% | 2026-05-11 |
| NPHardEval | Reasoning | 3 | 0.26 | 2026-05-06 |
| ZebraLogic | Reasoning | 56 | 10.10 | 2026-05-06 |
| AI-Secure LLM Trustworthy Leaderboard | Safety | 7 | 0.72 | 2026-05-06 |
| ChemBench | Science | 34 | 0.47 | 2026-05-06 |
| SciKnowEval | Science | 12 | 12 | 2026-05-27 |
| DevBench | Software Engineering | 6 | 24.0286 | 2026-05-27 |
| SheetCopilot Benchmark | Spreadsheets | 1 | 87.3% | 2026-05-27 |
| SheetCopilot Benchmark | Spreadsheets | 2 | 85.0% | 2026-05-27 |
| VNTL Leaderboard | Translation | 12 | 69.98 | 2026-05-06 |