Gemma 4 31B

Gemma / Google

52 scores
42 benchmarks
$0.13 / $0.38 per 1M tokens (cost in/out)
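
As a rough illustration of how the per-token pricing above translates into a per-request cost, here is a minimal sketch. It assumes the two listed figures are USD per 1M input tokens and per 1M output tokens respectively, with no caching discounts or tiered rates. The function name `estimate_cost` and the example workload are hypothetical and not part of the listing.

```python
# Prices copied from the listing (assumed to be USD per 1M tokens).
INPUT_PRICE_PER_M = 0.13   # cost in
OUTPUT_PRICE_PER_M = 0.38  # cost out


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one request (hypothetical helper)."""
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


# Hypothetical workload: 200,000 prompt tokens and 50,000 completion tokens.
cost = estimate_cost(200_000, 50_000)
print(f"${cost:.4f}")
```

Because output tokens are priced at roughly three times the input rate, output-heavy workloads would dominate the bill under these assumptions.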

Metadata

Gemma Closed/API

Aliases: gemma-4-31b-it, gemma-4-31b-it-20260402, google-gemma-4-31b-it, google-gemma-4-31b-it-20260402, google/gemma-4-31b-it, google/gemma-4-31b-it-20260402

Benchmark Results

Benchmark | Category | Rank | Score | Sampled
AutoBench | Agentic | 21 | 2.79 | 2026-05-06
Gert Labs Rankings | Agentic | 44 | 0.40 | 2026-05-11
ITBench-AA | Agentic | 9 | 37.3% | 2026-05-28
PinchBench | Agentic | 47 | 0.76 | 2026-05-06
t2-bench | Agentic | 5 | 0.86 | 2026-05-06
Tau2-Bench Telecom | Agentic | 151 | 65.5% | 2026-05-11
Tau2-Bench Telecom | Agentic | 161 | 59.9% | 2026-05-11
Terminal-Bench Hard | Agentic | 53 | 36.4% | 2026-05-11
Terminal-Bench Hard | Agentic | 97 | 30.3% | 2026-05-11
TERMS-Bench | Agentic | 4 | 64.0% SE+ | 2026-05-28
OpenUGI | Alignment | 1018 | 23.05 | 2026-05-06
OpenUGI | Alignment | 1044 | 21.54 | 2026-05-06
ALE-Bench | Coding | 33 | 925.50 | 2026-05-06
Arena AI Code | Coding | 39 | 1387 | 2026-05-06
SciCode | Coding | 53 | 43.4% | 2026-05-11
SciCode | Coding | 74 | 41.1% | 2026-05-11
Terminal-Bench 2.0 | Coding | 34 | 39.326% | 2026-05-28
TuRTLe Code Completion (Icarus Verilog) | Coding | 3 | 82.57 | 2026-05-06
TuRTLe Code Completion (Verilator) | Coding | 3 | 80.51 | 2026-05-06
TuRTLe Module Completion (NotSoTiny) | Coding | 2 | 29.54 | 2026-05-06
TuRTLe Spec-to-RTL (Icarus Verilog) | Coding | 1 | 81.51 | 2026-05-06
TuRTLe Spec-to-RTL (Verilator) | Coding | 1 | 79.83 | 2026-05-06
Arena AI Document | Document AI | 16 | 1432 | 2026-05-06
SAGE | Education | 2 | 55.034% | 2026-05-28
AA-Omniscience | Factuality | 24 | -45.42 | 2026-05-11
Vectara HHEM Hallucination Leaderboard | Factuality | 35 | 92.60 | 2026-05-06
Finance Agent v1.1 | Finance | 27 | 50.788% | 2026-05-04
MortgageTax | Finance | 38 | 61.368% | 2026-05-28
React Native Evals | Frontend Development | 11 | 75.2381% overall | 2026-05-28
MedXpertQA | Healthcare | 5 | 0.61 | 2026-05-06
Artificial Analysis Intelligence Index | Intelligence | 81 | 39.18 | 2026-05-11
Artificial Analysis Intelligence Index | Intelligence | 129 | 32.29 | 2026-05-11
Humanity's Last Exam | Intelligence | 56 | 22.7% | 2026-05-11
Humanity's Last Exam | Intelligence | 133 | 11.5% | 2026-05-11
LiveBench | Intelligence | 50 | 62.38 | 2026-05-05
MathVision | Intelligence | 11 | 85.60 | 2026-05-06
AraGen v3 | Language | 10 | 72.71 | 2026-05-06
CaseLaw v2 | Legal | 43 | 52.626% | 2026-05-04
MRCR v2 | Long Context | 1 | 0.66 | 2026-05-06
BRIDGE Medical Leaderboard | Medical | 2 | 54.88 | 2026-05-27
BRIDGE Medical Leaderboard | Medical | 39 | 46.74 | 2026-05-27
BRIDGE Medical Leaderboard | Medical | 42 | 45.88 | 2026-05-27
MMMU-Pro | Multimodal | 13 | 76.90 | 2026-05-06
ParseBench | Multimodal | 2 | 62.40 | 2026-05-06
Artificial Analysis Openness Index | Openness | 115 | 38.89 | 2026-05-11
Altered Riddles | Reasoning | 15 | 0.4615 | 2026-05-27
BIG-Bench Extra Hard | Reasoning | 1 | 0.74 | 2026-05-06
GPQA Diamond | Reasoning | 47 | 85.7% | 2026-05-11
GPQA Diamond | Reasoning | 146 | 76.3% | 2026-05-11
CritPt | Science | 71 | 1.4% | 2026-05-11
CritPt | Science | 199 | 0% | 2026-05-11
Structured Output Benchmark | Structured Output | 21 | 83.30 | 2026-05-06