Gemini 3.1 Pro Preview

Gemini / Google

158 scores
122 benchmarks
Cost (input / output): $2 / $12 per 1M tokens

Metadata

Gemini Closed/API

Aliases: gemini-3.1-pro-preview, gemini-3.1-pro-preview-20260219, google-gemini-3.1-pro-preview, google-gemini-3.1-pro-preview-20260219, google/gemini-3.1-pro-preview, google/gemini-3.1-pro-preview-20260219

Benchmark Results

Benchmark Category Rank Score Sampled
APEX-Agents Agentic 6 48.20 2026-05-06
APEX-Agents-AA Agentic 4 32% 2026-05-11
ARC-AGI-1 Agentic 3 98 2026-05-05
ARC-AGI-1 Agentic 1 98% 2026-04-23
ARC-AGI-2 Agentic 8 77.08 2026-05-05
ARC-AGI-2 Agentic 3 77.1% 2026-04-23
ARC-AGI-3 Agentic 3 0.42 2026-05-05
AutoBench Agentic 3 3.21 2026-05-06
AutoLab Agentic 2 0.71 2026-05-06
AutomationBench Agentic 4 9.6% 2026-05-28
AutomationBench Agentic 7 9.60 2026-05-21
BrowseComp Agentic 1 85.9% 2026-05-28
BrowseComp Agentic 3 85.9% 2026-04-23
BrowseComp Agentic 3 85.9% 2026-04-16
Claw-Eval-Live Agentic 8 53.3 2026-05-27
EnterpriseOps-Gym Agentic 4 36.6% 2026-05-05
GDPval-AA Agentic 4 1314 Elo 2026-05-28
GDPval-AA Agentic 9 1317 2026-05-06
Gert Labs Rankings Agentic 24 0.51 2026-05-11
HiL-Bench Agentic 5 20.33% 2026-05-05
ITBench-AA Agentic 17 30.3% 2026-05-28
MCP Atlas Agentic 3 78.2% 2026-05-28
MCP Atlas Agentic 1 78.20 2026-05-06
MCP Atlas Agentic 2 78.2% 2026-04-23
MCP Atlas Agentic 3 73.9% 2026-04-16
MultiChallenge Agentic 1 71.37 2026-05-06
OSWorld-Verified Agentic 4 76.2% 2026-05-28
PinchBench Agentic 18 0.87 2026-05-06
RuneBench Agentic 5 4.50 2026-05-05
t2-bench Agentic 1 0.99 2026-05-06
Tau2-Bench Telecom Agentic 17 95.6% 2026-05-11
Terminal-Bench Hard Agentic 6 53.8% 2026-05-11
TERMS-Bench Agentic 5 63.9% SE+ 2026-05-28
Toolathlon Agentic 3 48.8% 2026-04-23
Vending-Bench 2 Agentic 18 3774.25 2026-05-28
Vending-Bench 2 Agentic 28 911.21 2026-05-28
WildClawBench Agentic 4 40.80 2026-05-06
OpenUGI Alignment 99 50.67 2026-05-06
OpenUGI Alignment 116 49.68 2026-05-06
OpenUGI Alignment 439 38.44 2026-05-06
scBench Biology 6 53.85% 2026-05-27
SpatialBench Biology 6 51.57% 2026-05-27
ALE-Bench Coding 19 1160.60 2026-05-06
ALE-Bench Coding 24 1054.78 2026-05-06
Arena AI Code Coding 16 1454 2026-05-06
BLXBench Coding 24 3.70 2026-05-06
DeepSWE Coding 11 9.88 2026-05-26
LiveCodeBench Coding 1 88.485% 2026-05-28
LMArena WebDev Arena Coding 15 1454.71 2026-05-06
SciCode Coding 1 58.9% 2026-05-11
SWE Atlas - Codebase QnA Coding 8 13.50 2026-05-06
SWE Atlas - Refactoring Coding 6 33.81 2026-05-06
SWE Atlas - Test Writing Coding 2 29.84 2026-05-06
SWE-bench Verified Coding 4 78.8% 2026-05-28
Terminal-Bench 2.0 Coding 4 67.416% 2026-05-28
Terminal-Bench 2.0 Coding 4 68.5% 2026-04-23
Terminal-Bench 2.0 Coding 4 68.5% 2026-04-16
Terminal-Bench 2.1 Coding 4 70.787% 2026-05-28
Terminal-Bench 2.1 Coding 3 70.3% 2026-05-28
Vibe Code Bench v1.1 Coding 15 32.034% 2026-05-28
ExploitBench v8-bench Cybersecurity 8 3.67 points 2026-05-15
ExploitBench v8-bench Cybersecurity 16 3.17 points 2026-05-15
SecCodeBench Cybersecurity 17 55.21% 2026-05-28
Arena AI Document Document AI 13 1449 2026-05-06
OfficeQA Pro Document AI 4 18.1% 2026-04-23
GSMA Open Telco Leaderboard Domain 2 75.55 2026-05-06
SAGE Education 14 48.677% 2026-05-28
TutorBench Education 4 52.99 2026-05-06
AA-Omniscience Factuality 1 32.93 2026-05-11
Vectara HHEM Hallucination Leaderboard Factuality 57 89.60 2026-05-06
CorpFin v2 Finance 21 64.491% 2026-05-28
Finance Agent v1.1 Finance 7 59.717% 2026-05-04
Finance Agent v1.1 Finance 4 59.7% 2026-04-23
Finance Agent v1.1 Finance 4 59.7% 2026-04-16
Finance Agent v2 Finance 11 42.982% 2026-05-28
Finance Agent v2 Finance 4 43% 2026-05-28
MortgageTax Finance 3 69.396% 2026-05-28
PRBench Finance Finance 14 41.87 2026-05-06
QuantSightBench Finance 1 0.7910 coverage 2026-05-28
Rogo Big Finance Bench Finance 8 41% rubric / 35% final 2026-05-28
TaxBench Finance 5 20.10% mean pass^5 2026-05-27
TaxEval v2 Finance 37 72.882% 2026-05-28
React Native Evals Frontend Development 9 78.9011% overall 2026-05-28
InfiniteBM Heads-Up No-Limit Hold'em Game 15 1209.82 Elo / 13 games 2026-05-28
InfiniteBM Heads-Up No-Limit Hold'em Game 25 1041.51 Elo / 90 games 2026-05-28
InfiniteBM Liar's Dice Game 2 1566.69 Elo / 27 games 2026-05-28
InfiniteBM Liar's Dice Game 3 1401.19 Elo / 91 games 2026-05-28
MageBench Season 1 Game 15 1602 rating / 10 games 2026-05-28
ALL Bench LLM General Knowledge 4 58.96 2026-05-06
BenchLM General Knowledge 2 92 2026-05-06
GDPval Generalization 6 67.3% 2026-04-23
LMArena Text Arena Generalization 3 1487.43 2026-05-06
MedCode Healthcare 1 59.062% 2026-05-28
MedQA Healthcare 3 96.367% 2026-04-16
MedScribe Healthcare 36 76.114% 2026-05-28
HUMAINE Human Preference 4 3.73 2026-05-06
AIIQ Composite IQ Intelligence 3 132 2026-05-12
Artificial Analysis Intelligence Index Intelligence 4 57.18 2026-05-11
GPQA Diamond Intelligence 1 95.454% 2026-05-28
Humanity's Last Exam Intelligence 4 51.4% 2026-05-28
Humanity's Last Exam Intelligence 1 44.7% 2026-05-11
Humanity's Last Exam Intelligence 6 51.4% 2026-04-23
Humanity's Last Exam Intelligence 5 51.4% 2026-04-16
LiveBench Intelligence 3 80.71 2026-05-05
MathVision Intelligence 2 95.70 2026-05-06
MathVision Intelligence 5 89.80 2026-05-06
MMLU Pro Intelligence 1 90.987% 2026-05-28
MMMU Pro Intelligence 3 88.208% 2026-05-28
Vals Index Intelligence 9 53.423% 2026-05-28
Vals Multimodal Index Intelligence 7 55.749% 2026-05-28
CaseLaw v2 Legal 12 64.845% 2026-05-04
LegalBench Legal 1 87.398% 2026-05-28
Professional Reasoning Bench - Legal Legal 9 44.02 2026-05-06
Realm Warren Legal 3 0.22 2026-05-07
MRCR v2 (8-needle) Long Context 6 0.26 2026-05-06
AIME Math 1 98.125% 2026-04-16
LiveMathematicianBench Math 1 43.5% 2026-05-28
ProofBench Math 11 26% 2026-05-28
ArxivMath Mathematics 3 64.8% 2026-05-28
FrontierMath 2025-02-28 Private Mathematics 6 36.9% 2026-04-23
FrontierMath Tier 4 2025-07-01 Private Mathematics 6 16.7% 2026-04-23
Medical Chronology LLM Benchmark Medical 10 0.88 2026-05-06
Global MMLU Multilingual 1 92.2% 2026-05-28
MMMLU Multilingual 1 92.6% 2026-04-16
ALL Bench Multimodal Multimodal 1 63.96 2026-05-06
ALL Bench Multimodal Multimodal 9 8.20 2026-05-06
ALL Bench Multimodal Multimodal 2 37.66 2026-05-06
Blueprint-Bench 2 Multimodal 5 0.661 +/- 0.011 2026-05-28
Design Arena Multimodal 20 1287 2026-05-06
IDP Leaderboard Multimodal 6 81.58 2026-05-06
LMArena Vision Arena Multimodal 8 1294.62 2026-05-06
MMMU-Pro Multimodal 6 80.50 2026-05-06
MMMU-Pro Multimodal 3 80.5% 2026-04-23
VTB Multimodal 1 28.97 2026-05-06
ARC-AGI v2 Reasoning 2 0.77 2026-05-06
CAIS Text Capabilities Index Reasoning 2 52.9 2026-05-27
Context Arena Reasoning 19 53.84 2026-05-06
Context Arena Reasoning 25 48.69 2026-05-06
EnigmaEval Reasoning 1 19.76 2026-05-06
GPQA Diamond Reasoning 1 94.3% 2026-05-28
GPQA Diamond Reasoning 1 94.1% 2026-05-11
GPQA Diamond Reasoning 2 94.3% 2026-04-23
GPQA Diamond Reasoning 3 94.3% 2026-04-16
Humanity's Last Exam (Text Only) Reasoning 1 47.31 2026-05-06
MultiNRC Reasoning 1 64.74 2026-05-06
CAIS Risk Index Safety 23 55.6 2026-05-27
LiveSecBench Safety 15 58.16 2026-05-27
CritPt Science 8 17.7% 2026-05-11
ProgramBench Software Engineering 5 0% 2026-05-05
SWE-bench Pro Software Engineering 4 54.2% 2026-05-28
SWE-bench Pro Software Engineering 4 54.2% 2026-04-23
SWE-bench Pro Software Engineering 4 54.2% 2026-04-16
SWE-bench Verified Software Engineering 3 80.6% 2026-05-28
SWE-bench Verified Software Engineering 4 80.6% 2026-04-16
Structured Output Benchmark Structured Output 2 86.90 2026-05-06
LiveSQLBench Text to SQL 1 43.10 2026-05-06
CAIS Vision Capabilities Index Vision 3 63.1 2026-05-27
Roboflow Vision Evals - Visual Understanding Vision 4 77.61% 2026-05-22