gpt-oss-20b
GPT / OpenAI
67 scores
51 benchmarks
$0 / $0 per 1M tokens (cost in/out)
Metadata
GPT Closed/API
Aliases: gpt-oss-20b, gpt-oss-20b:free, openai-gpt-oss-20b, openai/gpt-oss-20b, openai/gpt-oss-20b:free
| Benchmark | Category | Rank | Score | Date sampled |
|---|---|---|---|---|
| APEX-Agents-AA | Agentic | 18 | 0.7% | 2026-05-11 |
| AutoBench | Agentic | 28 | 2.65 | 2026-05-06 |
| PinchBench | Agentic | 60 | 0.66 | 2026-05-06 |
| Tau2-Bench Telecom | Agentic | 160 | 60.2% | 2026-05-11 |
| Tau2-Bench Telecom | Agentic | 176 | 50.3% | 2026-05-11 |
| Terminal-Bench Hard | Agentic | 204 | 10.6% | 2026-05-11 |
| Terminal-Bench Hard | Agentic | 269 | 4.5% | 2026-05-11 |
| OpenUGI | Alignment | 1174 | 12.40 | 2026-05-06 |
| OpenUGI | Alignment | 1194 | 8.96 | 2026-05-06 |
| OpenUGI | Alignment | 1199 | 7.85 | 2026-05-06 |
| ALE-Bench | Coding | 65 | 566.05 | 2026-05-06 |
| Codeforces | Coding | 10 | 0.7433 | 2026-05-28 |
| LiveCodeBench | Coding | 43 | 80.387% | 2026-05-28 |
| SciCode | Coding | 203 | 34.4% | 2026-05-11 |
| SciCode | Coding | 207 | 34% | 2026-05-11 |
| TuRTLe Code Completion (Icarus Verilog) | Coding | 12 | 66.48 | 2026-05-06 |
| TuRTLe Code Completion (Verilator) | Coding | 12 | 65.92 | 2026-05-06 |
| TuRTLe Spec-to-RTL (Icarus Verilog) | Coding | 11 | 63.70 | 2026-05-06 |
| TuRTLe Spec-to-RTL (Verilator) | Coding | 13 | 63.20 | 2026-05-06 |
| MMTU | Data | 16 | 0.48 | 2026-05-06 |
| AA-Omniscience | Factuality | 28 | -63.92 | 2026-05-11 |
| CorpFin v2 | Finance | 75 | 53.147% | 2026-05-28 |
| Fin-RATE | Finance | 6 | 18.69% | 2026-05-28 |
| TaxEval v2 | Finance | 92 | 63.696% | 2026-05-28 |
| React Native Evals | Frontend Development | 17 | 71.0222% overall | 2026-05-28 |
| ALL Bench LLM | General Knowledge | 25 | 26.25 | 2026-05-06 |
| BenchLM | General Knowledge | 109 | 18 | 2026-05-06 |
| HELM AIR-Bench | Generalization | 10 | 0.859677 | 2026-05-28 |
| HealthBench Hard | Healthcare | 12 | 0.48 | 2026-05-27 |
| MedQA | Healthcare | 65 | 82.875% | 2026-04-16 |
| Artificial Analysis Intelligence Index | Intelligence | 199 | 24.47 | 2026-05-11 |
| Artificial Analysis Intelligence Index | Intelligence | 234 | 20.79 | 2026-05-11 |
| GPQA Diamond | Intelligence | 71 | 68.94% | 2026-05-28 |
| Humanity's Last Exam | Intelligence | 157 | 9.8% | 2026-05-11 |
| Humanity's Last Exam | Intelligence | 288 | 5.1% | 2026-05-11 |
| MMLU Pro | Intelligence | 89 | 71.636% | 2026-05-28 |
| MMLU-Pro | Intelligence | 182 | 74.8% | 2026-05-11 |
| MMLU-Pro | Intelligence | 208 | 71.8% | 2026-05-11 |
| AraGen v3 | Language | 41 | 30.61 | 2026-05-06 |
| CaseLaw v2 | Legal | 54 | 43.837% | 2026-05-04 |
| LegalBench | Legal | 85 | 70.849% | 2026-05-28 |
| LEXam | Legal | 29 | 32.12% open / 40.78% MCQ | 2026-05-28 |
| AIME | Math | 34 | 86.042% | 2026-04-16 |
| AIME 2025 | Math | 30 | 89.3% | 2026-05-11 |
| AIME 2025 | Math | 113 | 62.3% | 2026-05-11 |
| MATH 500 | Math | 10 | 94.2% | 2026-01-09 |
| MGSM | Math | 51 | 89.018% | 2026-01-09 |
| BRIDGE Medical Leaderboard | Medical | 236 | 29.05 | 2026-05-27 |
| BRIDGE Medical Leaderboard | Medical | 262 | 25.14 | 2026-05-27 |
| BRIDGE Medical Leaderboard | Medical | 264 | 24.86 | 2026-05-27 |
| MEDIC Benchmark | Medical | 73 | 55.49 (average normalized score, public table) | 2026-05-27 |
| Medmarks | Medical | 10 | 0.4266 | 2026-05-27 |
| Medmarks | Medical | 27 | 0.5361 | 2026-05-27 |
| Medmarks | Medical | 32 | 0.5198 | 2026-05-27 |
| Medmarks | Medical | 44 | 0.4820 | 2026-05-27 |
| LatamBoard | Multilingual | 39 | 38.26 | 2026-05-06 |
| ALL Bench Multimodal | Multimodal | 29 | 23.61 | 2026-05-06 |
| Artificial Analysis Openness Index | Openness | 119 | 38.89 | 2026-05-11 |
| GPQA Diamond | Reasoning | 214 | 68.8% | 2026-05-11 |
| GPQA Diamond | Reasoning | 272 | 61.1% | 2026-05-11 |
| Humanity's Last Exam (Text Only) | Reasoning | 29 | 9.73 | 2026-05-06 |
| MultiNRC | Reasoning | 39 | 10.43 | 2026-05-06 |
| ChemBench | Science | 15 | 0.61 | 2026-05-06 |
| CritPt | Science | 74 | 1.4% | 2026-05-11 |
| CritPt | Science | 230 | 0% | 2026-05-11 |
| Structured Output Benchmark | Structured Output | 28 | 73.20 | 2026-05-06 |
| K-MetBench | Weather | 20 | 71.5% accuracy | 2026-05-28 |