GPT-3.5 Turbo

GPT / OpenAI

96 scores
64 benchmarks
$0.50 / $1.50 per 1M tokens (input / output cost)
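You can turn the listed prices into a per-request cost estimate. The sketch below makes three assumptions: the figures are in USD, billing is linear in token count, and no discounts (caching, batch pricing) apply. `estimate_cost_usd` and the constant names are hypothetical illustrations, not part of any OpenAI SDK.

```python
# Hypothetical helper: estimate request cost from the listed per-1M-token prices.
# Assumes USD pricing and simple linear billing with no discounts.
INPUT_PRICE_PER_M = 0.50   # USD per 1M input tokens (from the listing)
OUTPUT_PRICE_PER_M = 1.50  # USD per 1M output tokens (from the listing)


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost for the given input/output token counts."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    # Multiply each count by its per-1M price, then scale down by 1M.
    return (input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000


if __name__ == "__main__":
    # Example: 2,000 prompt tokens and 500 completion tokens.
    print(f"${estimate_cost_usd(2_000, 500):.6f}")
```

For 2,000 input tokens and 500 output tokens this prints `$0.001750`, because 2,000 × $0.50/1M + 500 × $1.50/1M = $0.00175.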

Metadata

GPT, Closed/API

Aliases: gpt-3.5-turbo, openai-gpt-3.5-turbo, openai/gpt-3.5-turbo

Benchmark Results

| Benchmark | Category | Rank | Score | Date sampled |
|---|---|---|---|---|
| OmniACT | Agentic | 6 | 7.89 | 2026-05-27 |
| ToolSandbox | Agentic | 3 | 65.6 | 2026-05-27 |
| RewardBench | Alignment | 83 | 75.78 | 2026-05-06 |
| RewardBench | Alignment | 142 | 65.34 | 2026-05-06 |
| TextClass Benchmark | Classification | 47 | 1560.51 | 2026-05-06 |
| BigCodeBench | Coding | 58 | 39.10 | 2026-05-06 |
| CodeEditorBench | Coding | 8 | 0.803 | 2026-05-27 |
| CodeEditorBench | Coding | 12 | 0.776 | 2026-05-27 |
| CodeEditorBench | Coding | 23 | 0.724 | 2026-05-27 |
| CodeEditorBench | Coding | 27 | 0.7 | 2026-05-27 |
| CodeEditorBench | Coding | 30 | 0.684 | 2026-05-27 |
| CodeEditorBench | Coding | 48 | 0.5 | 2026-05-27 |
| CRUXEval | Coding | 11 | 54.65 | 2026-05-05 |
| DS-1000 | Coding | 3 | 0.394 | 2026-05-27 |
| DS-1000 | Coding | 5 | 0.386 | 2026-05-27 |
| EvalPlus | Coding | 21 | 70.20 | 2026-05-05 |
| MBPP+ | Coding | 15 | 69.70 | 2026-05-05 |
| McEval | Coding | 4 | 52.6% | 2026-05-27 |
| GSMA Open Telco Leaderboard | Domain | 60 | 43.85 | 2026-05-06 |
| FinBen | Finance | 3 | 4.48% | 2026-05-27 |
| Open FinLLM Leaderboard | Finance | 12 | 23.164665% | 2026-05-27 |
| MixEval Chat | General Knowledge | 30 | 43 | 2026-05-06 |
| AgentHarm | Generalization | 26 | 62.0% | 2026-05-27 |
| AgentHarm | Generalization | 27 | 62.2% | 2026-05-27 |
| AgentHarm | Generalization | 29 | 63.2% | 2026-05-27 |
| AlpacaEval | Generalization | 29 | 81.73910844041163 | 2026-05-27 |
| AlpacaEval | Generalization | 37 | 79.17893267677465 | 2026-05-27 |
| AlpacaEval | Generalization | 46 | 75.55853548412969 | 2026-05-27 |
| AlpacaEval | Generalization | 61 | 66.88517803643602 | 2026-05-27 |
| CyberBench | Generalization | 2 | 62.6% | 2026-05-28 |
| CyberSecEval | Generalization | 6 | 39.13% | 2026-05-27 |
| EQ-Bench | Generalization | 39 | 71.74 | 2026-05-06 |
| EQ-Bench | Generalization | 43 | 70.67 | 2026-05-06 |
| EQ-Bench | Generalization | 46 | 69.64 | 2026-05-06 |
| EQ-Bench | Generalization | 47 | 69.51 | 2026-05-06 |
| HELM AIR-Bench | Generalization | 50 | 0.635494 | 2026-05-28 |
| HELM AIR-Bench | Generalization | 51 | 0.631279 | 2026-05-28 |
| HELM AIR-Bench | Generalization | 68 | 0.525378 | 2026-05-28 |
| HELM AIR-Bench | Generalization | 78 | 0.439673 | 2026-05-28 |
| HELM Safety | Generalization | 52 | 0.852869 | 2026-05-28 |
| HELM Safety | Generalization | 56 | 0.834979 | 2026-05-28 |
| HELM Safety | Generalization | 58 | 0.813594 | 2026-05-28 |
| MT-Bench | Generalization | 2 | 7.94375 | 2026-05-27 |
| NarrativeQA | Generalization | 31 | 66.304398% | 2026-05-27 |
| NarrativeQA | Generalization | 39 | 62.507836% | 2026-05-27 |
| WeirdML | Generalization | 30 | 3.48 | 2026-05-06 |
| WildBench | Generalization | 50 | 6.613880742913001 | 2026-05-27 |
| HealthBench | Healthcare | 5 | 0.1554 | 2026-05-27 |
| MedQA | Healthcare | 87 | 58.471% | 2026-04-16 |
| RubricEval | Instruction Following | 11 | 2.52 | 2026-05-06 |
| URIAL Bench | Instruction Following | 2 | 7.94 | 2026-05-06 |
| Artificial Analysis Intelligence Index | Intelligence | 448 | 8.99 | 2026-05-11 |
| BoolQ | Intelligence | 7 | 87% | 2026-05-27 |
| BoolQ | Intelligence | 37 | 74% | 2026-05-27 |
| Gaokao-Bench | Intelligence | 5 | 53.2% | 2026-05-27 |
| GPQA Diamond | Intelligence | 108 | 30.556% | 2026-05-28 |
| HELM | Intelligence | 10 | 78.296037% | 2026-05-27 |
| HELM | Intelligence | 13 | 76.025641% | 2026-05-27 |
| HELM Instruct | Intelligence | 1 | 0.688889 | 2026-05-28 |
| HELM Lite | Intelligence | 48 | 0.400283 | 2026-05-28 |
| HELM MMLU | Intelligence | 2 | 58.98 | 2026-05-06 |
| HELM MMLU | Intelligence | 28 | 39.09 | 2026-05-06 |
| MMLU-Pro | Intelligence | 306 | 46.2% | 2026-05-11 |
| Natural Questions | Intelligence | 11 | 67.477806% | 2026-05-27 |
| Natural Questions | Intelligence | 29 | 62.433188% | 2026-05-27 |
| TableBench | Intelligence | 25 | 37.15% | 2026-05-27 |
| ANLI | Language | 2 | 58.10 | 2026-05-06 |
| HindiGen v1 | Language | 26 | 49.10 | 2026-05-06 |
| WinoGrande | Language | 8 | 81.60 | 2026-05-06 |
| HELM LegalBench | Legal | 3 | 62.781186% | 2026-05-27 |
| HELM LegalBench | Legal | 63 | 46.830266% | 2026-05-27 |
| LawBench | Legal | 4 | 44.5226 | 2026-05-27 |
| LawBench | Legal | 5 | 42.1477 | 2026-05-27 |
| LegalBench | Legal | 98 | 64.372% | 2026-05-28 |
| HELM GSM8K | Math | 3 | 53.1% | 2026-05-27 |
| HELM GSM8K | Math | 6 | 46.9% | 2026-05-27 |
| MATH | Math | 1 | 48.83286% | 2026-05-27 |
| MATH | Math | 2 | 45.27817% | 2026-05-27 |
| BRIDGE Medical Leaderboard | Medical | 60 | 43.61 | 2026-05-27 |
| BRIDGE Medical Leaderboard | Medical | 172 | 35.3 | 2026-05-27 |
| BRIDGE Medical Leaderboard | Medical | 214 | 31.63 | 2026-05-27 |
| Open Medical-LLM Leaderboard | Medical | 76 | 67.69 | 2026-05-06 |
| BenchBench | Meta | 91 | 0.36 | 2026-05-06 |
| BBH | Reasoning | 9 | 61.59 | 2026-05-06 |
| ConvRe | Reasoning | 3 | 60.60 | 2026-05-06 |
| DROP | Reasoning | 20 | 0.70 | 2026-05-06 |
| GPQA Diamond | Reasoning | 449 | 29.7% | 2026-05-11 |
| NPHardEval | Reasoning | 3 | 0.26 | 2026-05-06 |
| ZebraLogic | Reasoning | 56 | 10.10 | 2026-05-06 |
| AI-Secure LLM Trustworthy Leaderboard | Safety | 7 | 0.72 | 2026-05-06 |
| ChemBench | Science | 34 | 0.47 | 2026-05-06 |
| SciKnowEval | Science | 12 | 12 | 2026-05-27 |
| DevBench | Software Engineering | 6 | 24.0286 | 2026-05-27 |
| SheetCopilot Benchmark | Spreadsheets | 1 | 87.3% | 2026-05-27 |
| SheetCopilot Benchmark | Spreadsheets | 2 | 85.0% | 2026-05-27 |
| VNTL Leaderboard | Translation | 12 | 69.98 | 2026-05-06 |