Gemini 3
Gemini / Google
133 scores
116 benchmarks
Cost (input/output): not listed
Metadata
Gemini family · Closed/API
Aliases: gemini-3, google-gemini-3, google/gemini-3
| Benchmark | Category | Rank | Score | Date sampled |
|---|---|---|---|---|
| ADBench | Agentic | 1 | 83 | 2026-05-06 |
| APEX-Agents | Agentic | 17 | 34.10 | 2026-05-06 |
| ARC-AGI-1 | Agentic | 38 | 75 | 2026-05-05 |
| ARC-AGI-2 | Agentic | 35 | 31.11 | 2026-05-05 |
| Berkeley Function-Calling Leaderboard | Agentic | 3 | 72.51% | 2026-05-27 |
| Berkeley Function-Calling Leaderboard | Agentic | 7 | 68.14% | 2026-05-27 |
| EnterpriseOps-Gym | Agentic | 9 | 27.4% | 2026-05-05 |
| Gert Labs Rankings | Agentic | 14 | 0.56 | 2026-05-11 |
| LLM-WikiRace | Agentic | 2 | 66 | 2026-05-06 |
| MCP Atlas | Agentic | 5 | 70.30 | 2026-05-06 |
| MCPMark | Agentic | 2 | 0.54 | 2026-05-06 |
| MCPMark | Agentic | 5 | 0.51 | 2026-05-06 |
| MultiChallenge | Agentic | 3 | 65.67 | 2026-05-06 |
| PinchBench | Agentic | 56 | 0.71 | 2026-05-06 |
| Poker Agent | Agentic | 6 | 1078.905% | 2025-12-23 |
| RuneBench | Agentic | 10 | 3.80 | 2026-05-05 |
| t2-bench | Agentic | 7 | 0.85 | 2026-05-06 |
| Tau2-Bench Telecom | Agentic | 70 | 87.1% | 2026-05-11 |
| Tau2-Bench Telecom | Agentic | 143 | 68.1% | 2026-05-11 |
| Terminal-Bench Hard | Agentic | 33 | 41.7% | 2026-05-11 |
| Terminal-Bench Hard | Agentic | 70 | 34.1% | 2026-05-11 |
| Vending-Bench 2 | Agentic | 10 | 5478.16 | 2026-05-28 |
| VitaBench | Agentic | 2 | 31.50 | 2026-05-06 |
| VitaBench | Agentic | 15 | 30 | 2026-05-06 |
| OpenUGI | Alignment | 321 | 41.71 | 2026-05-06 |
| OpenUGI | Alignment | 463 | 37.82 | 2026-05-06 |
| ALE-Bench | Coding | 17 | 1176.75 | 2026-05-06 |
| ALE-Bench | Coding | 29 | 988.23 | 2026-05-06 |
| Arena AI Code | Coding | 20 | 1438 | 2026-05-06 |
| HoudiniVexBench | Coding | 2 | 0.50 | 2026-05-06 |
| IOI | Coding | 7 | 38.834% | 2026-05-26 |
| LiveCodeBench | Coding | 11 | 86.407% | 2026-05-28 |
| LMArena WebDev Arena | Coding | 20 | 1438.22 | 2026-05-06 |
| SciCode | Coding | 3 | 56.1% | 2026-05-11 |
| SciCode | Coding | 21 | 49.9% | 2026-05-11 |
| SWE-bench Verified | Coding | 12 | 76.4% | 2026-05-28 |
| Terminal-Bench 2.0 | Coding | 15 | 55.056% | 2026-05-28 |
| Vibe Code Bench v1.1 | Coding | 31 | 14.3% | 2026-05-28 |
| VibeCodingBench | Coding | 12 | 85.80 | 2026-05-06 |
| SecCodeBench | Cybersecurity | 4 | 62.42% | 2026-05-28 |
| OmniDocBench 1.5 | Document Understanding | 11 | 0.12 | 2026-05-06 |
| Arena AI Document | Document AI | 15 | 1442 | 2026-05-06 |
| GSMA Open Telco Leaderboard | Domain | 3 | 74.65 | 2026-05-06 |
| SAGE | Education | 15 | 47.615% | 2026-05-28 |
| TutorBench | Education | 2 | 53.67 | 2026-05-06 |
| From Perception to Action | Embodied AI | 2 | 19.3% | 2026-05-28 |
| Vectara HHEM Hallucination Leaderboard | Factuality | 83 | 86.40 | 2026-05-06 |
| CorpFin v2 | Finance | 25 | 63.675% | 2026-05-28 |
| Finance Agent v1.1 | Finance | 14 | 55.154% | 2026-05-04 |
| MortgageTax | Finance | 4 | 69.078% | 2026-05-28 |
| PRBench Finance | Finance | 16 | 39.18 | 2026-05-06 |
| QuantSightBench | Finance | 9 | 0.6543 coverage | 2026-05-28 |
| TaxEval v2 | Finance | 39 | 72.568% | 2026-05-28 |
| MageBench Season 1 | Game | 3 | 1722 rating / 11 games | 2026-05-28 |
| ALL Bench LLM | General Knowledge | 27 | 25.55 | 2026-05-06 |
| BenchLM | General Knowledge | 18 | 81 | 2026-05-06 |
| HELM AIR-Bench | Generalization | 33 | 0.732086 | 2026-05-28 |
| LMArena Text Arena | Generalization | 5 | 1479.29 | 2026-05-06 |
| WeirdML | Generalization | 2 | 69.93 | 2026-05-06 |
| GeoRC | Geospatial | 8 | 40.98 | 2026-05-27 |
| MedCode | Healthcare | 7 | 52.198% | 2026-05-28 |
| MedQA | Healthcare | 8 | 96.033% | 2026-04-16 |
| MedScribe | Healthcare | 47 | 72.036% | 2026-05-28 |
| Omi SOAP Note Safety Benchmark | Healthcare | 2 | 4.70 | 2026-04-21 |
| PlaceboBench | Healthcare | 1 | 73.913 | 2026-05-27 |
| HUMAINE | Human Preference | 38 | 3.34 | 2026-05-06 |
| AIIQ Composite IQ | Intelligence | 6 | 126 | 2026-05-12 |
| Artificial Analysis Intelligence Index | Intelligence | 31 | 48.39 | 2026-05-11 |
| Artificial Analysis Intelligence Index | Intelligence | 68 | 41.3 | 2026-05-11 |
| GPQA Diamond | Intelligence | 5 | 91.666% | 2026-05-28 |
| Humanity's Last Exam | Intelligence | 9 | 37.2% | 2026-05-11 |
| Humanity's Last Exam | Intelligence | 34 | 27.6% | 2026-05-11 |
| MathVision | Intelligence | 8 | 86.60 | 2026-05-06 |
| MMLU Pro | Intelligence | 2 | 90.102% | 2026-05-28 |
| MMLU-Pro | Intelligence | 1 | 89.8% | 2026-05-11 |
| MMLU-Pro | Intelligence | 3 | 89.5% | 2026-05-11 |
| MMMU Pro | Intelligence | 5 | 87.514% | 2026-05-28 |
| OCRBench v2 | Intelligence | 3 | 63.40 | 2026-05-06 |
| OCRBench v2 | Intelligence | 3 | 63.80 | 2026-05-06 |
| AraGen v3 | Language | 16 | 64.15 | 2026-05-06 |
| CaseLaw v2 | Legal | 42 | 53.055% | 2026-05-04 |
| LegalBench | Legal | 2 | 87.025% | 2026-05-28 |
| LEXam | Legal | 12 | 55.38% open questions | 2026-05-28 |
| Professional Reasoning Bench - Legal | Legal | 14 | 40.60 | 2026-05-06 |
| MRCR v2 (8-needle) | Long Context | 6 | 0.26 | 2026-05-06 |
| AIME | Math | 4 | 96.68% | 2026-04-16 |
| AIME 2025 | Math | 7 | 95.7% | 2026-05-11 |
| AIME 2025 | Math | 43 | 86.7% | 2026-05-11 |
| MATH 500 | Math | 1 | 96.4% | 2026-01-09 |
| MGSM | Math | 7 | 93.927% | 2026-01-09 |
| ProofBench | Math | 14 | 20% | 2026-05-28 |
| FrontierMath 2025-02-28 Private | Mathematics | 2 | 37.60 | 2026-05-06 |
| FrontierMath Tier 4 2025-07-01 Private | Mathematics | 2 | 18.75 | 2026-05-06 |
| MathArena Apex | Mathematics | 3 | 0.23 | 2026-05-06 |
| OTIS Mock AIME 2024-2025 | Mathematics | 2 | 92.78 | 2026-05-06 |
| LiveMedBench | Medical | 9 | 0.1829 | 2026-05-27 |
| Medmarks | Medical | 8 | 0.4712656820900838 | 2026-05-27 |
| Medmarks | Medical | 1 | 0.6627770031943667 | 2026-05-27 |
| MedSafe-Dx | Medical | 11 | 62.4 | 2026-05-27 |
| AfroBench-Lite | Multilingual | 2 | 76.01 | 2026-05-06 |
| ALL Bench Multimodal | Multimodal | 22 | 29.76 | 2026-05-06 |
| ALL Bench Multimodal | Multimodal | 5 | 16.75 | 2026-05-06 |
| CharXiv-R | Multimodal | 7 | 0.81 | 2026-05-06 |
| IDP Leaderboard | Multimodal | 3 | 82.77 | 2026-05-06 |
| JMMMU-Pro | Multimodal | 1 | 87.05 | 2026-05-06 |
| LMArena Vision Arena | Multimodal | 6 | 1304.72 | 2026-05-06 |
| MMLongBench-Doc | Multimodal | 3 | 60.50 | 2026-05-06 |
| MMSI-Bench | Multimodal | 2 | 49.2% | 2026-05-28 |
| VideoMMMU | Multimodal | 1 | 0.88 | 2026-05-06 |
| Visual-Language Understanding | Multimodal | 3 | 51.49 | 2026-05-06 |
| VPCT | Multimodal | 1 | 91 | 2026-05-06 |
| VTB | Multimodal | 3 | 26.85 | 2026-05-06 |
| Artificial Analysis Openness Index | Openness | 216 | 5.56 | 2026-05-11 |
| ARC-AGI v2 | Reasoning | 11 | 0.31 | 2026-05-06 |
| CAIS Text Capabilities Index | Reasoning | 7 | 38.4 | 2026-05-27 |
| EnigmaEval | Reasoning | 2 | 18.24 | 2026-05-06 |
| FINAL Bench Metacognitive | Reasoning | 2 | 77.08 | 2026-05-06 |
| Global PIQA | Reasoning | 1 | 0.93 | 2026-05-06 |
| GPQA Diamond | Reasoning | 11 | 90.8% | 2026-05-11 |
| GPQA Diamond | Reasoning | 22 | 88.7% | 2026-05-11 |
| Humanity's Last Exam (Text Only) | Reasoning | 3 | 37.72 | 2026-05-06 |
| MultiNRC | Reasoning | 2 | 58.96 | 2026-05-06 |
| SimpleBench | Reasoning | 1 | 76.40 | 2026-05-06 |
| CAIS Risk Index | Safety | 30 | 59.9 | 2026-05-27 |
| CritPt | Science | 18 | 9.1% | 2026-05-11 |
| CritPt | Science | 189 | 0% | 2026-05-11 |
| GSO-Bench | Science | 3 | 18.60 | 2026-05-06 |
| SciPredict | Science | 1 | 25.27 | 2026-05-06 |
| IDE-Bench | Software Engineering | 8 | 55 | 2026-05-27 |
| AudioMC | Speech | 1 | 54.65 | 2026-05-07 |
| AudioMC - Text Output | Speech | 1 | 54.65 | 2026-05-06 |
| CAIS Vision Capabilities Index | Vision | 7 | 55.5 | 2026-05-27 |
| K-MetBench | Weather | 1 | 93.7% accuracy | 2026-05-28 |