gpt-oss-120b

GPT / OpenAI

104 scores
90 benchmarks
$0 / $0 per 1M tokens (cost in/out)

Metadata

GPT Closed/API

Aliases: gpt-oss-120b, gpt-oss-120b:free, openai-gpt-oss-120b, openai/gpt-oss-120b, openai/gpt-oss-120b:free

Benchmark Results

| Benchmark | Category | Rank | Score | Sampled |
| --- | --- | --- | --- | --- |
| APEX-Agents | Agentic | 33 | 14.50 | 2026-05-06 |
| APEX-Agents-AA | Agentic | 16 | 3.1% | 2026-05-11 |
| AutoBench | Agentic | 24 | 2.76 | 2026-05-06 |
| CAR-bench | Agentic | 11 | 0.28 | 2026-05-06 |
| EnterpriseOps-Gym | Agentic | 14 | 23% | 2026-05-05 |
| Gert Labs Rankings | Agentic | 54 | 0.34 | 2026-05-11 |
| MCPMark | Agentic | 36 | 0.05 | 2026-05-06 |
| MultiChallenge | Agentic | 24 | 45.34 | 2026-05-06 |
| PinchBench | Agentic | 59 | 0.67 | 2026-05-06 |
| Poker Agent | Agentic | 13 | 1015.331% | 2025-12-23 |
| Tau2-Bench Telecom | Agentic | 148 | 65.8% | 2026-05-11 |
| Tau2-Bench Telecom | Agentic | 193 | 45% | 2026-05-11 |
| Terminal-Bench Hard | Agentic | 130 | 23.5% | 2026-05-11 |
| Terminal-Bench Hard | Agentic | 262 | 5.3% | 2026-05-11 |
| Vending-Bench 2 | Agentic | 39 | -21.53 | 2026-05-28 |
| WildAgtEval | Agentic | 4 | 62.5% | 2026-05-28 |
| OpenUGI | Alignment | 1083 | 19.65 | 2026-05-06 |
| ALE-Bench | Coding | 64 | 575.63 | 2026-05-06 |
| ArtifactsBench | Coding | 4 | 57.69 | 2026-05-06 |
| Codeforces | Coding | 6 | 0.821 | 2026-05-28 |
| LiveCodeBench | Coding | 32 | 83.234% | 2026-05-28 |
| SciCode | Coding | 122 | 38.9% | 2026-05-11 |
| SciCode | Coding | 175 | 36% | 2026-05-11 |
| SWE-bench Verified | Coding | 49 | 33.6% | 2026-05-28 |
| Terminal-Bench 2.0 | Coding | 56 | 19.101% | 2026-05-28 |
| TuRTLe Code Completion (Icarus Verilog) | Coding | 5 | 77.82 | 2026-05-06 |
| TuRTLe Code Completion (Verilator) | Coding | 7 | 74.91 | 2026-05-06 |
| TuRTLe Module Completion (NotSoTiny) | Coding | 4 | 20.90 | 2026-05-06 |
| TuRTLe Spec-to-RTL (Icarus Verilog) | Coding | 7 | 70.52 | 2026-05-06 |
| TuRTLe Spec-to-RTL (Verilator) | Coding | 7 | 70.18 | 2026-05-06 |
| MMTU | Data | 10 | 0.54 | 2026-05-06 |
| GSMA Open Telco Leaderboard | Domain | 36 | 58.27 | 2026-05-06 |
| IslamicLegalBench | Domain | 11 | 32.72 | 2026-05-06 |
| AA-Omniscience | Factuality | 26 | -50.05 | 2026-05-11 |
| Vectara HHEM Hallucination Leaderboard | Factuality | 85 | 85.80 | 2026-05-06 |
| CorpFin v2 | Finance | 62 | 58.236% | 2026-05-28 |
| Finance Agent v1.1 | Finance | 45 | 21.541% | 2026-05-04 |
| PRBench Finance | Finance | 10 | 43.80 | 2026-05-06 |
| TaxEval v2 | Finance | 51 | 71.586% | 2026-05-28 |
| React Native Evals | Frontend Development | 14 | 71.6289% overall | 2026-05-28 |
| InfiniteBM Chess | Game | 2 | 1660.89 Elo / 6 games | 2026-05-28 |
| InfiniteBM Coup | Game | 7 | 1375.93 Elo / 19 games | 2026-05-28 |
| InfiniteBM Heads-Up No-Limit Hold'em | Game | 24 | 1046.1 Elo / 132 games | 2026-05-28 |
| InfiniteBM Liar's Dice | Game | 25 | 1135.48 Elo / 138 games | 2026-05-28 |
| InfiniteBM Settlers of Catan | Game | 1 | 1958.76 Elo / 5 games | 2026-05-28 |
| InfiniteBM Werewolf | Game | 4 | 1202.92 Elo / 7 games | 2026-05-28 |
| MageBench Season 1 | Game | 31 | 1516 rating / 9 games | 2026-05-28 |
| ALL Bench LLM | General Knowledge | 15 | 35.74 | 2026-05-06 |
| BenchLM | General Knowledge | 84 | 35 | 2026-05-06 |
| HELM AIR-Bench | Generalization | 5 | 0.880049 | 2026-05-28 |
| WeirdML | Generalization | 7 | 48.17 | 2026-05-06 |
| HealthBench Hard | Healthcare | 1 | 0.6 | 2026-05-27 |
| MedQA | Healthcare | 39 | 91.36% | 2026-04-16 |
| Artificial Analysis Intelligence Index | Intelligence | 120 | 33.27 | 2026-05-11 |
| Artificial Analysis Intelligence Index | Intelligence | 198 | 24.47 | 2026-05-11 |
| GPQA Diamond | Intelligence | 45 | 78.536% | 2026-05-28 |
| Humanity's Last Exam | Intelligence | 78 | 18.5% | 2026-05-11 |
| Humanity's Last Exam | Intelligence | 279 | 5.2% | 2026-05-11 |
| MMLU Pro | Intelligence | 70 | 79.166% | 2026-05-28 |
| MMLU-Pro | Intelligence | 100 | 80.8% | 2026-05-11 |
| MMLU-Pro | Intelligence | 149 | 77.5% | 2026-05-11 |
| AraGen v3 | Language | 31 | 43.23 | 2026-05-06 |
| HellaSwag | Language | 16 | 70.50 | 2026-05-06 |
| PIQA | Language | 16 | 76.70 | 2026-05-06 |
| WinoGrande | Language | 21 | 66.10 | 2026-05-06 |
| CaseLaw v2 | Legal | 50 | 48.767% | 2026-05-04 |
| LegalBench | Legal | 78 | 75.938% | 2026-05-28 |
| LEXam | Legal | 15 | 51.74% open / 47.71% MCQ | 2026-05-28 |
| Professional Reasoning Bench - Legal | Legal | 13 | 40.21 | 2026-05-06 |
| AIME | Math | 18 | 92.598% | 2026-04-16 |
| AIME 2025 | Math | 15 | 93.4% | 2026-05-11 |
| AIME 2025 | Math | 103 | 66.7% | 2026-05-11 |
| IneqMath | Math | 10 | 23.50 | 2026-05-06 |
| LiveMathematicianBench | Math | 6 | 28.8% | 2026-05-28 |
| MATH 500 | Math | 6 | 94.8% | 2026-01-09 |
| MGSM | Math | 24 | 92.036% | 2026-01-09 |
| OTIS Mock AIME 2024-2025 | Mathematics | 3 | 88.89 | 2026-05-06 |
| BRIDGE Medical Leaderboard | Medical | 120 | 39.04 | 2026-05-27 |
| BRIDGE Medical Leaderboard | Medical | 146 | 37.24 | 2026-05-27 |
| BRIDGE Medical Leaderboard | Medical | 210 | 32.11 | 2026-05-27 |
| LiveMedBench | Medical | 6 | 0.2503 | 2026-05-27 |
| MEDIC Benchmark | Medical | 53 | 61.39 average normalized public table score | 2026-05-27 |
| Medmarks | Medical | 4 | 0.5507240209717496 | 2026-05-27 |
| Medmarks | Medical | 11 | 0.5864776402646992 | 2026-05-27 |
| Medmarks | Medical | 12 | 0.5771191625196621 | 2026-05-27 |
| Medmarks | Medical | 19 | 0.552403723762488 | 2026-05-27 |
| MedSafe-Dx | Medical | 8 | 85.2 | 2026-05-27 |
| ALL Bench Multimodal | Multimodal | 18 | 30.67 | 2026-05-06 |
| Design Arena | Multimodal | 112 | 1021 | 2026-05-06 |
| Artificial Analysis Openness Index | Openness | 118 | 38.89 | 2026-05-11 |
| FINAL Bench Metacognitive | Reasoning | 7 | 73.33 | 2026-05-06 |
| GPQA Diamond | Reasoning | 121 | 78.2% | 2026-05-11 |
| GPQA Diamond | Reasoning | 229 | 67.2% | 2026-05-11 |
| Humanity's Last Exam (Text Only) | Reasoning | 18 | 15.48 | 2026-05-06 |
| MultiNRC | Reasoning | 33 | 15.17 | 2026-05-06 |
| SimpleBench | Reasoning | 23 | 22.10 | 2026-05-06 |
| InvisibleBench | Safety | 8 | 0.05 | 2026-05-06 |
| LiveSecBench | Safety | 11 | 66.63 | 2026-05-27 |
| ChemBench | Science | 9 | 0.63 | 2026-05-06 |
| CritPt | Science | 83 | 1.1% | 2026-05-11 |
| CritPt | Science | 229 | 0% | 2026-05-11 |
| SWE-bench Pro | Software Engineering | 8 | 16.20 | 2026-05-06 |
| K-MetBench | Weather | 10 | 77.3% accuracy | 2026-05-28 |
| Lech Mazur Writing | Writing | 16 | 7.73 | 2026-05-06 |