gpt-oss-120b | BenchmarkList

Metadata

GPT Closed/API

Aliases: gpt-oss-120b, gpt-oss-120b:free, openai-gpt-oss-120b, openai/gpt-oss-120b, openai/gpt-oss-120b:free

Benchmark	Category	Rank	Score	Date sampled
APEX-Agents	Agentic	33	14.50	2026-05-06
APEX-Agents-AA	Agentic	16	3.1%	2026-05-11
AutoBench	Agentic	24	2.76	2026-05-06
CAR-bench	Agentic	11	0.28	2026-05-06
EnterpriseOps-Gym	Agentic	14	23%	2026-05-05
Gert Labs Rankings	Agentic	54	0.34	2026-05-11
MCPMark	Agentic	36	0.05	2026-05-06
MultiChallenge	Agentic	24	45.34	2026-05-06
PinchBench	Agentic	59	0.67	2026-05-06
Poker Agent	Agentic	13	1015.331%	2025-12-23
Tau2-Bench Telecom	Agentic	148	65.8%	2026-05-11
Tau2-Bench Telecom	Agentic	193	45%	2026-05-11
Terminal-Bench Hard	Agentic	130	23.5%	2026-05-11
Terminal-Bench Hard	Agentic	262	5.3%	2026-05-11
Vending-Bench 2	Agentic	39	-21.53	2026-05-28
WildAgtEval	Agentic	4	62.5%	2026-05-28
OpenUGI	Alignment	1083	19.65	2026-05-06
ALE-Bench	Coding	64	575.63	2026-05-06
ArtifactsBench	Coding	4	57.69	2026-05-06
Codeforces	Coding	6	0.821	2026-05-28
LiveCodeBench	Coding	32	83.234%	2026-05-28
SciCode	Coding	122	38.9%	2026-05-11
SciCode	Coding	175	36%	2026-05-11
SWE-bench Verified	Coding	49	33.6%	2026-05-28
Terminal-Bench 2.0	Coding	56	19.101%	2026-05-28
TuRTLe Code Completion (Icarus Verilog)	Coding	5	77.82	2026-05-06
TuRTLe Code Completion (Verilator)	Coding	7	74.91	2026-05-06
TuRTLe Module Completion (NotSoTiny)	Coding	4	20.90	2026-05-06
TuRTLe Spec-to-RTL (Icarus Verilog)	Coding	7	70.52	2026-05-06
TuRTLe Spec-to-RTL (Verilator)	Coding	7	70.18	2026-05-06
MMTU	Data	10	0.54	2026-05-06
GSMA Open Telco Leaderboard	Domain	36	58.27	2026-05-06
IslamicLegalBench	Domain	11	32.72	2026-05-06
AA-Omniscience	Factuality	26	-50.05	2026-05-11
Vectara HHEM Hallucination Leaderboard	Factuality	85	85.80	2026-05-06
CorpFin v2	Finance	62	58.236%	2026-05-28
Finance Agent v1.1	Finance	45	21.541%	2026-05-04
PRBench Finance	Finance	10	43.80	2026-05-06
TaxEval v2	Finance	51	71.586%	2026-05-28
React Native Evals	Frontend Development	14	71.6289% overall	2026-05-28
InfiniteBM Chess	Game	2	1660.89 Elo / 6 games	2026-05-28
InfiniteBM Coup	Game	7	1375.93 Elo / 19 games	2026-05-28
InfiniteBM Heads-Up No-Limit Hold'em	Game	24	1046.1 Elo / 132 games	2026-05-28
InfiniteBM Liar's Dice	Game	25	1135.48 Elo / 138 games	2026-05-28
InfiniteBM Settlers of Catan	Game	1	1958.76 Elo / 5 games	2026-05-28
InfiniteBM Werewolf	Game	4	1202.92 Elo / 7 games	2026-05-28
MageBench Season 1	Game	31	1516 rating / 9 games	2026-05-28
ALL Bench LLM	General Knowledge	15	35.74	2026-05-06
BenchLM	General Knowledge	84	35	2026-05-06
HELM AIR-Bench	Generalization	5	0.880049	2026-05-28
WeirdML	Generalization	7	48.17	2026-05-06
HealthBench Hard	Healthcare	1	0.6	2026-05-27
MedQA	Healthcare	39	91.36%	2026-04-16
Artificial Analysis Intelligence Index	Intelligence	120	33.27	2026-05-11
Artificial Analysis Intelligence Index	Intelligence	198	24.47	2026-05-11
GPQA Diamond	Intelligence	45	78.536%	2026-05-28
Humanity's Last Exam	Intelligence	78	18.5%	2026-05-11
Humanity's Last Exam	Intelligence	279	5.2%	2026-05-11
MMLU Pro	Intelligence	70	79.166%	2026-05-28
MMLU-Pro	Intelligence	100	80.8%	2026-05-11
MMLU-Pro	Intelligence	149	77.5%	2026-05-11
AraGen v3	Language	31	43.23	2026-05-06
HellaSwag	Language	16	70.50	2026-05-06
PIQA	Language	16	76.70	2026-05-06
WinoGrande	Language	21	66.10	2026-05-06
CaseLaw v2	Legal	50	48.767%	2026-05-04
LegalBench	Legal	78	75.938%	2026-05-28
LEXam	Legal	15	51.74% open / 47.71% MCQ	2026-05-28
Professional Reasoning Bench - Legal	Legal	13	40.21	2026-05-06
AIME	Math	18	92.598%	2026-04-16
AIME 2025	Math	15	93.4%	2026-05-11
AIME 2025	Math	103	66.7%	2026-05-11
IneqMath	Math	10	23.50	2026-05-06
LiveMathematicianBench	Math	6	28.8%	2026-05-28
MATH 500	Math	6	94.8%	2026-01-09
MGSM	Math	24	92.036%	2026-01-09
OTIS Mock AIME 2024-2025	Mathematics	3	88.89	2026-05-06
BRIDGE Medical Leaderboard	Medical	120	39.04	2026-05-27
BRIDGE Medical Leaderboard	Medical	146	37.24	2026-05-27
BRIDGE Medical Leaderboard	Medical	210	32.11	2026-05-27
LiveMedBench	Medical	6	0.2503	2026-05-27
MEDIC Benchmark	Medical	53	61.39 average normalized public table score	2026-05-27
Medmarks	Medical	4	0.5507	2026-05-27
Medmarks	Medical	11	0.5865	2026-05-27
Medmarks	Medical	12	0.5771	2026-05-27
Medmarks	Medical	19	0.5524	2026-05-27
MedSafe-Dx	Medical	8	85.2	2026-05-27
ALL Bench Multimodal	Multimodal	18	30.67	2026-05-06
Design Arena	Multimodal	112	1021	2026-05-06
Artificial Analysis Openness Index	Openness	118	38.89	2026-05-11
FINAL Bench Metacognitive	Reasoning	7	73.33	2026-05-06
GPQA Diamond	Reasoning	121	78.2%	2026-05-11
GPQA Diamond	Reasoning	229	67.2%	2026-05-11
Humanity's Last Exam (Text Only)	Reasoning	18	15.48	2026-05-06
MultiNRC	Reasoning	33	15.17	2026-05-06
SimpleBench	Reasoning	23	22.10	2026-05-06
InvisibleBench	Safety	8	0.05	2026-05-06
LiveSecBench	Safety	11	66.63	2026-05-27
ChemBench	Science	9	0.63	2026-05-06
CritPt	Science	83	1.1%	2026-05-11
CritPt	Science	229	0%	2026-05-11
SWE-bench Pro	Software Engineering	8	16.20	2026-05-06
K-MetBench	Weather	10	77.3% accuracy	2026-05-28
Lech Mazur Writing	Writing	16	7.73	2026-05-06