Gemini 3.1 Pro Preview
Gemini / Google
158 scores
122 benchmarks
$2 / $12 per 1M tokens (cost in/out)
Metadata
Gemini Closed/API
Aliases: gemini-3.1-pro-preview, gemini-3.1-pro-preview-20260219, google-gemini-3.1-pro-preview, google-gemini-3.1-pro-preview-20260219, google/gemini-3.1-pro-preview, google/gemini-3.1-pro-preview-20260219
| Benchmark | Category | Rank | Score | Sampled |
|---|---|---|---|---|
| APEX-Agents | Agentic | 6 | 48.20 | 2026-05-06 |
| APEX-Agents-AA | Agentic | 4 | 32% | 2026-05-11 |
| ARC-AGI-1 | Agentic | 3 | 98 | 2026-05-05 |
| ARC-AGI-1 | Agentic | 1 | 98% | 2026-04-23 |
| ARC-AGI-2 | Agentic | 8 | 77.08 | 2026-05-05 |
| ARC-AGI-2 | Agentic | 3 | 77.1% | 2026-04-23 |
| ARC-AGI-3 | Agentic | 3 | 0.42 | 2026-05-05 |
| AutoBench | Agentic | 3 | 3.21 | 2026-05-06 |
| AutoLab | Agentic | 2 | 0.71 | 2026-05-06 |
| AutomationBench | Agentic | 4 | 9.6% | 2026-05-28 |
| AutomationBench | Agentic | 7 | 9.60 | 2026-05-21 |
| BrowseComp | Agentic | 1 | 85.9% | 2026-05-28 |
| BrowseComp | Agentic | 3 | 85.9% | 2026-04-23 |
| BrowseComp | Agentic | 3 | 85.9% | 2026-04-16 |
| Claw-Eval-Live | Agentic | 8 | 53.3 | 2026-05-27 |
| EnterpriseOps-Gym | Agentic | 4 | 36.6% | 2026-05-05 |
| GDPval-AA | Agentic | 4 | 1314 Elo | 2026-05-28 |
| GDPval-AA | Agentic | 9 | 1317 | 2026-05-06 |
| Gert Labs Rankings | Agentic | 24 | 0.51 | 2026-05-11 |
| HiL-Bench | Agentic | 5 | 20.33% | 2026-05-05 |
| ITBench-AA | Agentic | 17 | 30.3% | 2026-05-28 |
| MCP Atlas | Agentic | 3 | 78.2% | 2026-05-28 |
| MCP Atlas | Agentic | 1 | 78.20 | 2026-05-06 |
| MCP Atlas | Agentic | 2 | 78.2% | 2026-04-23 |
| MCP Atlas | Agentic | 3 | 73.9% | 2026-04-16 |
| MultiChallenge | Agentic | 1 | 71.37 | 2026-05-06 |
| OSWorld-Verified | Agentic | 4 | 76.2% | 2026-05-28 |
| PinchBench | Agentic | 18 | 0.87 | 2026-05-06 |
| RuneBench | Agentic | 5 | 4.50 | 2026-05-05 |
| t2-bench | Agentic | 1 | 0.99 | 2026-05-06 |
| Tau2-Bench Telecom | Agentic | 17 | 95.6% | 2026-05-11 |
| Terminal-Bench Hard | Agentic | 6 | 53.8% | 2026-05-11 |
| TERMS-Bench | Agentic | 5 | 63.9% SE+ | 2026-05-28 |
| Toolathlon | Agentic | 3 | 48.8% | 2026-04-23 |
| Vending-Bench 2 | Agentic | 18 | 3774.25 | 2026-05-28 |
| Vending-Bench 2 | Agentic | 28 | 911.21 | 2026-05-28 |
| WildClawBench | Agentic | 4 | 40.80 | 2026-05-06 |
| OpenUGI | Alignment | 99 | 50.67 | 2026-05-06 |
| OpenUGI | Alignment | 116 | 49.68 | 2026-05-06 |
| OpenUGI | Alignment | 439 | 38.44 | 2026-05-06 |
| scBench | Biology | 6 | 53.85% | 2026-05-27 |
| SpatialBench | Biology | 6 | 51.57% | 2026-05-27 |
| ALE-Bench | Coding | 19 | 1160.60 | 2026-05-06 |
| ALE-Bench | Coding | 24 | 1054.78 | 2026-05-06 |
| Arena AI Code | Coding | 16 | 1454 | 2026-05-06 |
| BLXBench | Coding | 24 | 3.70 | 2026-05-06 |
| DeepSWE | Coding | 11 | 9.88 | 2026-05-26 |
| LiveCodeBench | Coding | 1 | 88.485% | 2026-05-28 |
| LMArena WebDev Arena | Coding | 15 | 1454.71 | 2026-05-06 |
| SciCode | Coding | 1 | 58.9% | 2026-05-11 |
| SWE Atlas - Codebase QnA | Coding | 8 | 13.50 | 2026-05-06 |
| SWE Atlas - Refactoring | Coding | 6 | 33.81 | 2026-05-06 |
| SWE Atlas - Test Writing | Coding | 2 | 29.84 | 2026-05-06 |
| SWE-bench Verified | Coding | 4 | 78.8% | 2026-05-28 |
| Terminal-Bench 2.0 | Coding | 4 | 67.416% | 2026-05-28 |
| Terminal-Bench 2.0 | Coding | 4 | 68.5% | 2026-04-23 |
| Terminal-Bench 2.0 | Coding | 4 | 68.5% | 2026-04-16 |
| Terminal-Bench 2.1 | Coding | 4 | 70.787% | 2026-05-28 |
| Terminal-Bench 2.1 | Coding | 3 | 70.3% | 2026-05-28 |
| Vibe Code Bench v1.1 | Coding | 15 | 32.034% | 2026-05-28 |
| ExploitBench v8-bench | Cybersecurity | 8 | 3.67 points | 2026-05-15 |
| ExploitBench v8-bench | Cybersecurity | 16 | 3.17 points | 2026-05-15 |
| SecCodeBench | Cybersecurity | 17 | 55.21% | 2026-05-28 |
| Arena AI Document | Document AI | 13 | 1449 | 2026-05-06 |
| OfficeQA Pro | Document AI | 4 | 18.1% | 2026-04-23 |
| GSMA Open Telco Leaderboard | Domain | 2 | 75.55 | 2026-05-06 |
| SAGE | Education | 14 | 48.677% | 2026-05-28 |
| TutorBench | Education | 4 | 52.99 | 2026-05-06 |
| AA-Omniscience | Factuality | 1 | 32.93 | 2026-05-11 |
| Vectara HHEM Hallucination Leaderboard | Factuality | 57 | 89.60 | 2026-05-06 |
| CorpFin v2 | Finance | 21 | 64.491% | 2026-05-28 |
| Finance Agent v1.1 | Finance | 7 | 59.717% | 2026-05-04 |
| Finance Agent v1.1 | Finance | 4 | 59.7% | 2026-04-23 |
| Finance Agent v1.1 | Finance | 4 | 59.7% | 2026-04-16 |
| Finance Agent v2 | Finance | 11 | 42.982% | 2026-05-28 |
| Finance Agent v2 | Finance | 4 | 43% | 2026-05-28 |
| MortgageTax | Finance | 3 | 69.396% | 2026-05-28 |
| PRBench Finance | Finance | 14 | 41.87 | 2026-05-06 |
| QuantSightBench | Finance | 1 | 0.7910 coverage | 2026-05-28 |
| Rogo Big Finance Bench | Finance | 8 | 41% rubric / 35% final | 2026-05-28 |
| TaxBench | Finance | 5 | 20.10% mean pass^5 | 2026-05-27 |
| TaxEval v2 | Finance | 37 | 72.882% | 2026-05-28 |
| React Native Evals | Frontend Development | 9 | 78.9011% overall | 2026-05-28 |
| InfiniteBM Heads-Up No-Limit Hold'em | Game | 15 | 1209.82 Elo / 13 games | 2026-05-28 |
| InfiniteBM Heads-Up No-Limit Hold'em | Game | 25 | 1041.51 Elo / 90 games | 2026-05-28 |
| InfiniteBM Liar's Dice | Game | 2 | 1566.69 Elo / 27 games | 2026-05-28 |
| InfiniteBM Liar's Dice | Game | 3 | 1401.19 Elo / 91 games | 2026-05-28 |
| MageBench Season 1 | Game | 15 | 1602 rating / 10 games | 2026-05-28 |
| ALL Bench LLM | General Knowledge | 4 | 58.96 | 2026-05-06 |
| BenchLM | General Knowledge | 2 | 92 | 2026-05-06 |
| GDPval | Generalization | 6 | 67.3% | 2026-04-23 |
| LMArena Text Arena | Generalization | 3 | 1487.43 | 2026-05-06 |
| MedCode | Healthcare | 1 | 59.062% | 2026-05-28 |
| MedQA | Healthcare | 3 | 96.367% | 2026-04-16 |
| MedScribe | Healthcare | 36 | 76.114% | 2026-05-28 |
| HUMAINE | Human Preference | 4 | 3.73 | 2026-05-06 |
| AIIQ Composite IQ | Intelligence | 3 | 132 | 2026-05-12 |
| Artificial Analysis Intelligence Index | Intelligence | 4 | 57.18 | 2026-05-11 |
| GPQA Diamond | Intelligence | 1 | 95.454% | 2026-05-28 |
| Humanity's Last Exam | Intelligence | 4 | 51.4% | 2026-05-28 |
| Humanity's Last Exam | Intelligence | 1 | 44.7% | 2026-05-11 |
| Humanity's Last Exam | Intelligence | 6 | 51.4% | 2026-04-23 |
| Humanity's Last Exam | Intelligence | 5 | 51.4% | 2026-04-16 |
| LiveBench | Intelligence | 3 | 80.71 | 2026-05-05 |
| MathVision | Intelligence | 2 | 95.70 | 2026-05-06 |
| MathVision | Intelligence | 5 | 89.80 | 2026-05-06 |
| MMLU Pro | Intelligence | 1 | 90.987% | 2026-05-28 |
| MMMU Pro | Intelligence | 3 | 88.208% | 2026-05-28 |
| Vals Index | Intelligence | 9 | 53.423% | 2026-05-28 |
| Vals Multimodal Index | Intelligence | 7 | 55.749% | 2026-05-28 |
| CaseLaw v2 | Legal | 12 | 64.845% | 2026-05-04 |
| LegalBench | Legal | 1 | 87.398% | 2026-05-28 |
| Professional Reasoning Bench - Legal | Legal | 9 | 44.02 | 2026-05-06 |
| Realm Warren | Legal | 3 | 0.22 | 2026-05-07 |
| MRCR v2 (8-needle) | Long Context | 6 | 0.26 | 2026-05-06 |
| AIME | Math | 1 | 98.125% | 2026-04-16 |
| LiveMathematicianBench | Math | 1 | 43.5% | 2026-05-28 |
| ProofBench | Math | 11 | 26% | 2026-05-28 |
| ArxivMath | Mathematics | 3 | 64.8% | 2026-05-28 |
| FrontierMath 2025-02-28 Private | Mathematics | 6 | 36.9% | 2026-04-23 |
| FrontierMath Tier 4 2025-07-01 Private | Mathematics | 6 | 16.7% | 2026-04-23 |
| Medical Chronology LLM Benchmark | Medical | 10 | 0.88 | 2026-05-06 |
| Global MMLU | Multilingual | 1 | 92.2% | 2026-05-28 |
| MMMLU | Multilingual | 1 | 92.6% | 2026-04-16 |
| ALL Bench Multimodal | Multimodal | 1 | 63.96 | 2026-05-06 |
| ALL Bench Multimodal | Multimodal | 9 | 8.20 | 2026-05-06 |
| ALL Bench Multimodal | Multimodal | 2 | 37.66 | 2026-05-06 |
| Blueprint-Bench 2 | Multimodal | 5 | 0.661 +/- 0.011 | 2026-05-28 |
| Design Arena | Multimodal | 20 | 1287 | 2026-05-06 |
| IDP Leaderboard | Multimodal | 6 | 81.58 | 2026-05-06 |
| LMArena Vision Arena | Multimodal | 8 | 1294.62 | 2026-05-06 |
| MMMU-Pro | Multimodal | 6 | 80.50 | 2026-05-06 |
| MMMU-Pro | Multimodal | 3 | 80.5% | 2026-04-23 |
| VTB | Multimodal | 1 | 28.97 | 2026-05-06 |
| ARC-AGI v2 | Reasoning | 2 | 0.77 | 2026-05-06 |
| CAIS Text Capabilities Index | Reasoning | 2 | 52.9 | 2026-05-27 |
| Context Arena | Reasoning | 19 | 53.84 | 2026-05-06 |
| Context Arena | Reasoning | 25 | 48.69 | 2026-05-06 |
| EnigmaEval | Reasoning | 1 | 19.76 | 2026-05-06 |
| GPQA Diamond | Reasoning | 1 | 94.3% | 2026-05-28 |
| GPQA Diamond | Reasoning | 1 | 94.1% | 2026-05-11 |
| GPQA Diamond | Reasoning | 2 | 94.3% | 2026-04-23 |
| GPQA Diamond | Reasoning | 3 | 94.3% | 2026-04-16 |
| Humanity's Last Exam (Text Only) | Reasoning | 1 | 47.31 | 2026-05-06 |
| MultiNRC | Reasoning | 1 | 64.74 | 2026-05-06 |
| CAIS Risk Index | Safety | 23 | 55.6 | 2026-05-27 |
| LiveSecBench | Safety | 15 | 58.16 | 2026-05-27 |
| CritPt | Science | 8 | 17.7% | 2026-05-11 |
| ProgramBench | Software Engineering | 5 | 0% | 2026-05-05 |
| SWE-bench Pro | Software Engineering | 4 | 54.2% | 2026-05-28 |
| SWE-bench Pro | Software Engineering | 4 | 54.2% | 2026-04-23 |
| SWE-bench Pro | Software Engineering | 4 | 54.2% | 2026-04-16 |
| SWE-bench Verified | Software Engineering | 3 | 80.6% | 2026-05-28 |
| SWE-bench Verified | Software Engineering | 4 | 80.6% | 2026-04-16 |
| Structured Output Benchmark | Structured Output | 2 | 86.90 | 2026-05-06 |
| LiveSQLBench | Text to SQL | 1 | 43.10 | 2026-05-06 |
| CAIS Vision Capabilities Index | Vision | 3 | 63.1 | 2026-05-27 |
| Roboflow Vision Evals - Visual Understanding | Vision | 4 | 77.61% | 2026-05-22 |