gpt-oss-20b

GPT / OpenAI

67 scores
51 benchmarks
$0 / $0 per 1M tokens (cost in/out)

Metadata

GPT Closed/API

Aliases: gpt-oss-20b, gpt-oss-20b:free, openai-gpt-oss-20b, openai/gpt-oss-20b, openai/gpt-oss-20b:free

Benchmark Results

| Benchmark | Category | Rank | Score | Sampled |
|---|---|---|---|---|
| APEX-Agents-AA | Agentic | 18 | 0.7% | 2026-05-11 |
| AutoBench | Agentic | 28 | 2.65 | 2026-05-06 |
| PinchBench | Agentic | 60 | 0.66 | 2026-05-06 |
| Tau2-Bench Telecom | Agentic | 160 | 60.2% | 2026-05-11 |
| Tau2-Bench Telecom | Agentic | 176 | 50.3% | 2026-05-11 |
| Terminal-Bench Hard | Agentic | 204 | 10.6% | 2026-05-11 |
| Terminal-Bench Hard | Agentic | 269 | 4.5% | 2026-05-11 |
| OpenUGI | Alignment | 1174 | 12.40 | 2026-05-06 |
| OpenUGI | Alignment | 1194 | 8.96 | 2026-05-06 |
| OpenUGI | Alignment | 1199 | 7.85 | 2026-05-06 |
| ALE-Bench | Coding | 65 | 566.05 | 2026-05-06 |
| Codeforces | Coding | 10 | 0.7433 | 2026-05-28 |
| LiveCodeBench | Coding | 43 | 80.387% | 2026-05-28 |
| SciCode | Coding | 203 | 34.4% | 2026-05-11 |
| SciCode | Coding | 207 | 34% | 2026-05-11 |
| TuRTLe Code Completion (Icarus Verilog) | Coding | 12 | 66.48 | 2026-05-06 |
| TuRTLe Code Completion (Verilator) | Coding | 12 | 65.92 | 2026-05-06 |
| TuRTLe Spec-to-RTL (Icarus Verilog) | Coding | 11 | 63.70 | 2026-05-06 |
| TuRTLe Spec-to-RTL (Verilator) | Coding | 13 | 63.20 | 2026-05-06 |
| MMTU | Data | 16 | 0.48 | 2026-05-06 |
| AA-Omniscience | Factuality | 28 | -63.92 | 2026-05-11 |
| CorpFin v2 | Finance | 75 | 53.147% | 2026-05-28 |
| Fin-RATE | Finance | 6 | 18.69% | 2026-05-28 |
| TaxEval v2 | Finance | 92 | 63.696% | 2026-05-28 |
| React Native Evals | Frontend Development | 17 | 71.0222% overall | 2026-05-28 |
| ALL Bench LLM | General Knowledge | 25 | 26.25 | 2026-05-06 |
| BenchLM | General Knowledge | 109 | 18 | 2026-05-06 |
| HELM AIR-Bench | Generalization | 10 | 0.859677 | 2026-05-28 |
| HealthBench Hard | Healthcare | 12 | 0.48 | 2026-05-27 |
| MedQA | Healthcare | 65 | 82.875% | 2026-04-16 |
| Artificial Analysis Intelligence Index | Intelligence | 199 | 24.47 | 2026-05-11 |
| Artificial Analysis Intelligence Index | Intelligence | 234 | 20.79 | 2026-05-11 |
| GPQA Diamond | Intelligence | 71 | 68.94% | 2026-05-28 |
| Humanity's Last Exam | Intelligence | 157 | 9.8% | 2026-05-11 |
| Humanity's Last Exam | Intelligence | 288 | 5.1% | 2026-05-11 |
| MMLU Pro | Intelligence | 89 | 71.636% | 2026-05-28 |
| MMLU-Pro | Intelligence | 182 | 74.8% | 2026-05-11 |
| MMLU-Pro | Intelligence | 208 | 71.8% | 2026-05-11 |
| AraGen v3 | Language | 41 | 30.61 | 2026-05-06 |
| CaseLaw v2 | Legal | 54 | 43.837% | 2026-05-04 |
| LegalBench | Legal | 85 | 70.849% | 2026-05-28 |
| LEXam | Legal | 29 | 32.12% open / 40.78% MCQ | 2026-05-28 |
| AIME | Math | 34 | 86.042% | 2026-04-16 |
| AIME 2025 | Math | 30 | 89.3% | 2026-05-11 |
| AIME 2025 | Math | 113 | 62.3% | 2026-05-11 |
| MATH 500 | Math | 10 | 94.2% | 2026-01-09 |
| MGSM | Math | 51 | 89.018% | 2026-01-09 |
| BRIDGE Medical Leaderboard | Medical | 236 | 29.05 | 2026-05-27 |
| BRIDGE Medical Leaderboard | Medical | 262 | 25.14 | 2026-05-27 |
| BRIDGE Medical Leaderboard | Medical | 264 | 24.86 | 2026-05-27 |
| MEDIC Benchmark | Medical | 73 | 55.49 average normalized public table score | 2026-05-27 |
| Medmarks | Medical | 10 | 0.4266482358575701 | 2026-05-27 |
| Medmarks | Medical | 27 | 0.5361352530208202 | 2026-05-27 |
| Medmarks | Medical | 32 | 0.5197952213409743 | 2026-05-27 |
| Medmarks | Medical | 44 | 0.4820454322540748 | 2026-05-27 |
| LatamBoard | Multilingual | 39 | 38.26 | 2026-05-06 |
| ALL Bench Multimodal | Multimodal | 29 | 23.61 | 2026-05-06 |
| Artificial Analysis Openness Index | Openness | 119 | 38.89 | 2026-05-11 |
| GPQA Diamond | Reasoning | 214 | 68.8% | 2026-05-11 |
| GPQA Diamond | Reasoning | 272 | 61.1% | 2026-05-11 |
| Humanity's Last Exam (Text Only) | Reasoning | 29 | 9.73 | 2026-05-06 |
| MultiNRC | Reasoning | 39 | 10.43 | 2026-05-06 |
| ChemBench | Science | 15 | 0.61 | 2026-05-06 |
| CritPt | Science | 74 | 1.4% | 2026-05-11 |
| CritPt | Science | 230 | 0% | 2026-05-11 |
| Structured Output Benchmark | Structured Output | 28 | 73.20 | 2026-05-06 |
| K-MetBench | Weather | 20 | 71.5% accuracy | 2026-05-28 |