Gemma 4 31B
Gemma / Google
52 scores
42 benchmarks
Cost (in/out): $0.13 / $0.38 per 1M tokens
Metadata
Gemma Closed/API
Aliases: gemma-4-31b-it, gemma-4-31b-it-20260402, google-gemma-4-31b-it, google-gemma-4-31b-it-20260402, google/gemma-4-31b-it, google/gemma-4-31b-it-20260402
| Benchmark | Category | Rank | Score | Date sampled |
|---|---|---|---|---|
| AutoBench | Agentic | 21 | 2.79 | 2026-05-06 |
| Gert Labs Rankings | Agentic | 44 | 0.40 | 2026-05-11 |
| ITBench-AA | Agentic | 9 | 37.3% | 2026-05-28 |
| PinchBench | Agentic | 47 | 0.76 | 2026-05-06 |
| t2-bench | Agentic | 5 | 0.86 | 2026-05-06 |
| Tau2-Bench Telecom | Agentic | 151 | 65.5% | 2026-05-11 |
| Tau2-Bench Telecom | Agentic | 161 | 59.9% | 2026-05-11 |
| Terminal-Bench Hard | Agentic | 53 | 36.4% | 2026-05-11 |
| Terminal-Bench Hard | Agentic | 97 | 30.3% | 2026-05-11 |
| TERMS-Bench | Agentic | 4 | 64.0% SE+ | 2026-05-28 |
| OpenUGI | Alignment | 1018 | 23.05 | 2026-05-06 |
| OpenUGI | Alignment | 1044 | 21.54 | 2026-05-06 |
| ALE-Bench | Coding | 33 | 925.50 | 2026-05-06 |
| Arena AI Code | Coding | 39 | 1387 | 2026-05-06 |
| SciCode | Coding | 53 | 43.4% | 2026-05-11 |
| SciCode | Coding | 74 | 41.1% | 2026-05-11 |
| Terminal-Bench 2.0 | Coding | 34 | 39.326% | 2026-05-28 |
| TuRTLe Code Completion (Icarus Verilog) | Coding | 3 | 82.57 | 2026-05-06 |
| TuRTLe Code Completion (Verilator) | Coding | 3 | 80.51 | 2026-05-06 |
| TuRTLe Module Completion (NotSoTiny) | Coding | 2 | 29.54 | 2026-05-06 |
| TuRTLe Spec-to-RTL (Icarus Verilog) | Coding | 1 | 81.51 | 2026-05-06 |
| TuRTLe Spec-to-RTL (Verilator) | Coding | 1 | 79.83 | 2026-05-06 |
| Arena AI Document | Document AI | 16 | 1432 | 2026-05-06 |
| SAGE | Education | 2 | 55.034% | 2026-05-28 |
| AA-Omniscience | Factuality | 24 | -45.42 | 2026-05-11 |
| Vectara HHEM Hallucination Leaderboard | Factuality | 35 | 92.60 | 2026-05-06 |
| Finance Agent v1.1 | Finance | 27 | 50.788% | 2026-05-04 |
| MortgageTax | Finance | 38 | 61.368% | 2026-05-28 |
| React Native Evals | Frontend Development | 11 | 75.2381% overall | 2026-05-28 |
| MedXpertQA | Healthcare | 5 | 0.61 | 2026-05-06 |
| Artificial Analysis Intelligence Index | Intelligence | 81 | 39.18 | 2026-05-11 |
| Artificial Analysis Intelligence Index | Intelligence | 129 | 32.29 | 2026-05-11 |
| Humanity's Last Exam | Intelligence | 56 | 22.7% | 2026-05-11 |
| Humanity's Last Exam | Intelligence | 133 | 11.5% | 2026-05-11 |
| LiveBench | Intelligence | 50 | 62.38 | 2026-05-05 |
| MathVision | Intelligence | 11 | 85.60 | 2026-05-06 |
| AraGen v3 | Language | 10 | 72.71 | 2026-05-06 |
| CaseLaw v2 | Legal | 43 | 52.626% | 2026-05-04 |
| MRCR v2 | Long Context | 1 | 0.66 | 2026-05-06 |
| BRIDGE Medical Leaderboard | Medical | 2 | 54.88 | 2026-05-27 |
| BRIDGE Medical Leaderboard | Medical | 39 | 46.74 | 2026-05-27 |
| BRIDGE Medical Leaderboard | Medical | 42 | 45.88 | 2026-05-27 |
| MMMU-Pro | Multimodal | 13 | 76.90 | 2026-05-06 |
| ParseBench | Multimodal | 2 | 62.40 | 2026-05-06 |
| Artificial Analysis Openness Index | Openness | 115 | 38.89 | 2026-05-11 |
| Altered Riddles | Reasoning | 15 | 0.4615 | 2026-05-27 |
| BIG-Bench Extra Hard | Reasoning | 1 | 0.74 | 2026-05-06 |
| GPQA Diamond | Reasoning | 47 | 85.7% | 2026-05-11 |
| GPQA Diamond | Reasoning | 146 | 76.3% | 2026-05-11 |
| CritPt | Science | 71 | 1.4% | 2026-05-11 |
| CritPt | Science | 199 | 0% | 2026-05-11 |
| Structured Output Benchmark | Structured Output | 21 | 83.30 | 2026-05-06 |