gpt-oss-120b
GPT / OpenAI
104 scores
90 benchmarks
Cost (input / output): $0 / $0 per 1M tokens
Metadata
GPT Closed/API
Aliases: gpt-oss-120b, gpt-oss-120b:free, openai-gpt-oss-120b, openai/gpt-oss-120b, openai/gpt-oss-120b:free
| Benchmark | Category | Rank | Score | Date sampled |
|---|---|---|---|---|
| APEX-Agents | Agentic | 33 | 14.50 | 2026-05-06 |
| APEX-Agents-AA | Agentic | 16 | 3.1% | 2026-05-11 |
| AutoBench | Agentic | 24 | 2.76 | 2026-05-06 |
| CAR-bench | Agentic | 11 | 0.28 | 2026-05-06 |
| EnterpriseOps-Gym | Agentic | 14 | 23% | 2026-05-05 |
| Gert Labs Rankings | Agentic | 54 | 0.34 | 2026-05-11 |
| MCPMark | Agentic | 36 | 0.05 | 2026-05-06 |
| MultiChallenge | Agentic | 24 | 45.34 | 2026-05-06 |
| PinchBench | Agentic | 59 | 0.67 | 2026-05-06 |
| Poker Agent | Agentic | 13 | 1015.331% | 2025-12-23 |
| Tau2-Bench Telecom | Agentic | 148 | 65.8% | 2026-05-11 |
| Tau2-Bench Telecom | Agentic | 193 | 45% | 2026-05-11 |
| Terminal-Bench Hard | Agentic | 130 | 23.5% | 2026-05-11 |
| Terminal-Bench Hard | Agentic | 262 | 5.3% | 2026-05-11 |
| Vending-Bench 2 | Agentic | 39 | -21.53 | 2026-05-28 |
| WildAgtEval | Agentic | 4 | 62.5% | 2026-05-28 |
| OpenUGI | Alignment | 1083 | 19.65 | 2026-05-06 |
| ALE-Bench | Coding | 64 | 575.63 | 2026-05-06 |
| ArtifactsBench | Coding | 4 | 57.69 | 2026-05-06 |
| Codeforces | Coding | 6 | 0.821 | 2026-05-28 |
| LiveCodeBench | Coding | 32 | 83.234% | 2026-05-28 |
| SciCode | Coding | 122 | 38.9% | 2026-05-11 |
| SciCode | Coding | 175 | 36% | 2026-05-11 |
| SWE-bench Verified | Coding | 49 | 33.6% | 2026-05-28 |
| Terminal-Bench 2.0 | Coding | 56 | 19.101% | 2026-05-28 |
| TuRTLe Code Completion (Icarus Verilog) | Coding | 5 | 77.82 | 2026-05-06 |
| TuRTLe Code Completion (Verilator) | Coding | 7 | 74.91 | 2026-05-06 |
| TuRTLe Module Completion (NotSoTiny) | Coding | 4 | 20.90 | 2026-05-06 |
| TuRTLe Spec-to-RTL (Icarus Verilog) | Coding | 7 | 70.52 | 2026-05-06 |
| TuRTLe Spec-to-RTL (Verilator) | Coding | 7 | 70.18 | 2026-05-06 |
| MMTU | Data | 10 | 0.54 | 2026-05-06 |
| GSMA Open Telco Leaderboard | Domain | 36 | 58.27 | 2026-05-06 |
| IslamicLegalBench | Domain | 11 | 32.72 | 2026-05-06 |
| AA-Omniscience | Factuality | 26 | -50.05 | 2026-05-11 |
| Vectara HHEM Hallucination Leaderboard | Factuality | 85 | 85.80 | 2026-05-06 |
| CorpFin v2 | Finance | 62 | 58.236% | 2026-05-28 |
| Finance Agent v1.1 | Finance | 45 | 21.541% | 2026-05-04 |
| PRBench Finance | Finance | 10 | 43.80 | 2026-05-06 |
| TaxEval v2 | Finance | 51 | 71.586% | 2026-05-28 |
| React Native Evals | Frontend Development | 14 | 71.6289% overall | 2026-05-28 |
| InfiniteBM Chess | Game | 2 | 1660.89 Elo / 6 games | 2026-05-28 |
| InfiniteBM Coup | Game | 7 | 1375.93 Elo / 19 games | 2026-05-28 |
| InfiniteBM Heads-Up No-Limit Hold'em | Game | 24 | 1046.1 Elo / 132 games | 2026-05-28 |
| InfiniteBM Liar's Dice | Game | 25 | 1135.48 Elo / 138 games | 2026-05-28 |
| InfiniteBM Settlers of Catan | Game | 1 | 1958.76 Elo / 5 games | 2026-05-28 |
| InfiniteBM Werewolf | Game | 4 | 1202.92 Elo / 7 games | 2026-05-28 |
| MageBench Season 1 | Game | 31 | 1516 rating / 9 games | 2026-05-28 |
| ALL Bench LLM | General Knowledge | 15 | 35.74 | 2026-05-06 |
| BenchLM | General Knowledge | 84 | 35 | 2026-05-06 |
| HELM AIR-Bench | Generalization | 5 | 0.880049 | 2026-05-28 |
| WeirdML | Generalization | 7 | 48.17 | 2026-05-06 |
| HealthBench Hard | Healthcare | 1 | 0.6 | 2026-05-27 |
| MedQA | Healthcare | 39 | 91.36% | 2026-04-16 |
| Artificial Analysis Intelligence Index | Intelligence | 120 | 33.27 | 2026-05-11 |
| Artificial Analysis Intelligence Index | Intelligence | 198 | 24.47 | 2026-05-11 |
| GPQA Diamond | Intelligence | 45 | 78.536% | 2026-05-28 |
| Humanity's Last Exam | Intelligence | 78 | 18.5% | 2026-05-11 |
| Humanity's Last Exam | Intelligence | 279 | 5.2% | 2026-05-11 |
| MMLU Pro | Intelligence | 70 | 79.166% | 2026-05-28 |
| MMLU-Pro | Intelligence | 100 | 80.8% | 2026-05-11 |
| MMLU-Pro | Intelligence | 149 | 77.5% | 2026-05-11 |
| AraGen v3 | Language | 31 | 43.23 | 2026-05-06 |
| HellaSwag | Language | 16 | 70.50 | 2026-05-06 |
| PIQA | Language | 16 | 76.70 | 2026-05-06 |
| WinoGrande | Language | 21 | 66.10 | 2026-05-06 |
| CaseLaw v2 | Legal | 50 | 48.767% | 2026-05-04 |
| LegalBench | Legal | 78 | 75.938% | 2026-05-28 |
| LEXam | Legal | 15 | 51.74% open / 47.71% MCQ | 2026-05-28 |
| Professional Reasoning Bench - Legal | Legal | 13 | 40.21 | 2026-05-06 |
| AIME | Math | 18 | 92.598% | 2026-04-16 |
| AIME 2025 | Math | 15 | 93.4% | 2026-05-11 |
| AIME 2025 | Math | 103 | 66.7% | 2026-05-11 |
| IneqMath | Math | 10 | 23.50 | 2026-05-06 |
| LiveMathematicianBench | Math | 6 | 28.8% | 2026-05-28 |
| MATH 500 | Math | 6 | 94.8% | 2026-01-09 |
| MGSM | Math | 24 | 92.036% | 2026-01-09 |
| OTIS Mock AIME 2024-2025 | Mathematics | 3 | 88.89 | 2026-05-06 |
| BRIDGE Medical Leaderboard | Medical | 120 | 39.04 | 2026-05-27 |
| BRIDGE Medical Leaderboard | Medical | 146 | 37.24 | 2026-05-27 |
| BRIDGE Medical Leaderboard | Medical | 210 | 32.11 | 2026-05-27 |
| LiveMedBench | Medical | 6 | 0.2503 | 2026-05-27 |
| MEDIC Benchmark | Medical | 53 | 61.39 average normalized public table score | 2026-05-27 |
| Medmarks | Medical | 4 | 0.5507240209717496 | 2026-05-27 |
| Medmarks | Medical | 11 | 0.5864776402646992 | 2026-05-27 |
| Medmarks | Medical | 12 | 0.5771191625196621 | 2026-05-27 |
| Medmarks | Medical | 19 | 0.552403723762488 | 2026-05-27 |
| MedSafe-Dx | Medical | 8 | 85.2 | 2026-05-27 |
| ALL Bench Multimodal | Multimodal | 18 | 30.67 | 2026-05-06 |
| Design Arena | Multimodal | 112 | 1021 | 2026-05-06 |
| Artificial Analysis Openness Index | Openness | 118 | 38.89 | 2026-05-11 |
| FINAL Bench Metacognitive | Reasoning | 7 | 73.33 | 2026-05-06 |
| GPQA Diamond | Reasoning | 121 | 78.2% | 2026-05-11 |
| GPQA Diamond | Reasoning | 229 | 67.2% | 2026-05-11 |
| Humanity's Last Exam (Text Only) | Reasoning | 18 | 15.48 | 2026-05-06 |
| MultiNRC | Reasoning | 33 | 15.17 | 2026-05-06 |
| SimpleBench | Reasoning | 23 | 22.10 | 2026-05-06 |
| InvisibleBench | Safety | 8 | 0.05 | 2026-05-06 |
| LiveSecBench | Safety | 11 | 66.63 | 2026-05-27 |
| ChemBench | Science | 9 | 0.63 | 2026-05-06 |
| CritPt | Science | 83 | 1.1% | 2026-05-11 |
| CritPt | Science | 229 | 0% | 2026-05-11 |
| SWE-bench Pro | Software Engineering | 8 | 16.20 | 2026-05-06 |
| K-MetBench | Weather | 10 | 77.3% accuracy | 2026-05-28 |
| Lech Mazur Writing | Writing | 16 | 7.73 | 2026-05-06 |