gpt-oss-20b | BenchmarkList

Metadata

GPT Closed/API

Aliases: gpt-oss-20b, gpt-oss-20b:free, openai-gpt-oss-20b, openai/gpt-oss-20b, openai/gpt-oss-20b:free

Benchmark	Category	Rank	Score	Sampled
APEX-Agents-AA	Agentic	18	0.7%	2026-05-11
AutoBench	Agentic	28	2.65	2026-05-06
PinchBench	Agentic	60	0.66	2026-05-06
Tau2-Bench Telecom	Agentic	160	60.2%	2026-05-11
Tau2-Bench Telecom	Agentic	176	50.3%	2026-05-11
Terminal-Bench Hard	Agentic	204	10.6%	2026-05-11
Terminal-Bench Hard	Agentic	269	4.5%	2026-05-11
OpenUGI	Alignment	1174	12.40	2026-05-06
OpenUGI	Alignment	1194	8.96	2026-05-06
OpenUGI	Alignment	1199	7.85	2026-05-06
ALE-Bench	Coding	65	566.05	2026-05-06
Codeforces	Coding	10	0.7433	2026-05-28
LiveCodeBench	Coding	43	80.387%	2026-05-28
SciCode	Coding	203	34.4%	2026-05-11
SciCode	Coding	207	34%	2026-05-11
TuRTLe Code Completion (Icarus Verilog)	Coding	12	66.48	2026-05-06
TuRTLe Code Completion (Verilator)	Coding	12	65.92	2026-05-06
TuRTLe Spec-to-RTL (Icarus Verilog)	Coding	11	63.70	2026-05-06
TuRTLe Spec-to-RTL (Verilator)	Coding	13	63.20	2026-05-06
MMTU	Data	16	0.48	2026-05-06
AA-Omniscience	Factuality	28	-63.92	2026-05-11
CorpFin v2	Finance	75	53.147%	2026-05-28
Fin-RATE	Finance	6	18.69%	2026-05-28
TaxEval v2	Finance	92	63.696%	2026-05-28
React Native Evals	Frontend Development	17	71.0222% overall	2026-05-28
ALL Bench LLM	General Knowledge	25	26.25	2026-05-06
BenchLM	General Knowledge	109	18	2026-05-06
HELM AIR-Bench	Generalization	10	0.859677	2026-05-28
HealthBench Hard	Healthcare	12	0.48	2026-05-27
MedQA	Healthcare	65	82.875%	2026-04-16
Artificial Analysis Intelligence Index	Intelligence	199	24.47	2026-05-11
Artificial Analysis Intelligence Index	Intelligence	234	20.79	2026-05-11
GPQA Diamond	Intelligence	71	68.94%	2026-05-28
Humanity's Last Exam	Intelligence	157	9.8%	2026-05-11
Humanity's Last Exam	Intelligence	288	5.1%	2026-05-11
MMLU Pro	Intelligence	89	71.636%	2026-05-28
MMLU-Pro	Intelligence	182	74.8%	2026-05-11
MMLU-Pro	Intelligence	208	71.8%	2026-05-11
AraGen v3	Language	41	30.61	2026-05-06
CaseLaw v2	Legal	54	43.837%	2026-05-04
LegalBench	Legal	85	70.849%	2026-05-28
LEXam	Legal	29	32.12% open / 40.78% MCQ	2026-05-28
AIME	Math	34	86.042%	2026-04-16
AIME 2025	Math	30	89.3%	2026-05-11
AIME 2025	Math	113	62.3%	2026-05-11
MATH 500	Math	10	94.2%	2026-01-09
MGSM	Math	51	89.018%	2026-01-09
BRIDGE Medical Leaderboard	Medical	236	29.05	2026-05-27
BRIDGE Medical Leaderboard	Medical	262	25.14	2026-05-27
BRIDGE Medical Leaderboard	Medical	264	24.86	2026-05-27
MEDIC Benchmark	Medical	73	55.49 average normalized public table score	2026-05-27
Medmarks	Medical	10	0.4266482358575701	2026-05-27
Medmarks	Medical	27	0.5361352530208202	2026-05-27
Medmarks	Medical	32	0.5197952213409743	2026-05-27
Medmarks	Medical	44	0.4820454322540748	2026-05-27
LatamBoard	Multilingual	39	38.26	2026-05-06
ALL Bench Multimodal	Multimodal	29	23.61	2026-05-06
Artificial Analysis Openness Index	Openness	119	38.89	2026-05-11
GPQA Diamond	Reasoning	214	68.8%	2026-05-11
GPQA Diamond	Reasoning	272	61.1%	2026-05-11
Humanity's Last Exam (Text Only)	Reasoning	29	9.73	2026-05-06
MultiNRC	Reasoning	39	10.43	2026-05-06
ChemBench	Science	15	0.61	2026-05-06
CritPt	Science	74	1.4%	2026-05-11
CritPt	Science	230	0%	2026-05-11
Structured Output Benchmark	Structured Output	28	73.20	2026-05-06
K-MetBench	Weather	20	71.5% accuracy	2026-05-28
