Gemini 2.5 Pro
Gemini / Google
115 scores
106 benchmarks
Cost in/out: $1.25 / $10 per 1M tokens
Metadata
Gemini Closed/API
Aliases: gemini-2.5-pro, google-gemini-2.5-pro, google/gemini-2.5-pro
| Benchmark | Category | Rank | Score | Sampled |
|---|---|---|---|---|
| ARC-AGI-1 | Agentic | 75 | 41 | 2026-05-05 |
| ARC-AGI-1 | Agentic | 83 | 37 | 2026-05-05 |
| ARC-AGI-1 | Agentic | 93 | 29.50 | 2026-05-05 |
| ARC-AGI-1 | Agentic | 118 | 16 | 2026-05-05 |
| ARC-AGI-2 | Agentic | 67 | 4.86 | 2026-05-05 |
| ARC-AGI-2 | Agentic | 73 | 4.03 | 2026-05-05 |
| ARC-AGI-2 | Agentic | 82 | 2.92 | 2026-05-05 |
| ARC-AGI-2 | Agentic | 143 | 0 | 2026-05-05 |
| CAR-bench | Agentic | 7 | 0.38 | 2026-05-06 |
| EnterpriseOps-Gym | Agentic | 20 | 17.8% | 2026-05-05 |
| Galileo Agent Leaderboard | Agentic | 10 | 0.43 | 2026-05-06 |
| Gert Labs Rankings | Agentic | 30 | 0.48 | 2026-05-11 |
| MCP-Universe | Agentic | 16 | 22.08 | 2026-05-06 |
| MCPMark | Agentic | 29 | 0.16 | 2026-05-06 |
| MultiChallenge | Agentic | 15 | 53.62 | 2026-05-06 |
| OSWorld-MCP | Agentic | 9 | 25.70 | 2026-05-06 |
| OSWorld-MCP | Agentic | 12 | 17.40 | 2026-05-06 |
| PinchBench | Agentic | 53 | 0.72 | 2026-05-06 |
| Poker Agent | Agentic | 12 | 1032.596% | 2025-12-23 |
| Tau2-Bench Telecom | Agentic | 168 | 54.1% | 2026-05-11 |
| Terminal-Bench Hard | Agentic | 111 | 26.5% | 2026-05-11 |
| Vending-Bench 2 | Agentic | 29 | 573.64 | 2026-05-28 |
| OpenUGI | Alignment | 156 | 47.84 | 2026-05-06 |
| AHa-Bench | Audio | 1 | 60% | 2026-05-28 |
| scBench | Biology | 16 | 23.59% | 2026-05-27 |
| SpatialBench | Biology | 16 | 28.93% | 2026-05-27 |
| TextClass Benchmark | Classification | 64 | 1517.98 | 2026-05-06 |
| ABC-Bench | Coding | 9 | 25.0% +/- 1.7 | 2026-05-27 |
| Arena AI Code | Coding | 68 | 1203 | 2026-05-06 |
| ArtifactsBench | Coding | 3 | 57.74 | 2026-05-06 |
| CadEval | Coding | 2 | 64 | 2026-05-06 |
| ContextBench | Coding | 4 | 36.40 | 2026-05-06 |
| IOI | Coding | 19 | 17.084% | 2026-05-26 |
| SciCode | Coding | 60 | 42.8% | 2026-05-11 |
| SWE-bench Verified | Coding | 45 | 54.4% | 2026-05-28 |
| Terminal-Bench 2.0 | Coding | 42 | 30.337% | 2026-05-28 |
| Vibe Code Bench v1.1 | Coding | 43 | 0.4% | 2026-05-28 |
| MMTU | Data | 4 | 0.66 | 2026-05-06 |
| VAREX-Bench | Document Understanding | 1 | 98.0% EM | 2026-05-28 |
| Arena AI Document | Document AI | 17 | 1429 | 2026-05-06 |
| GSMA Open Telco Leaderboard | Domain | 21 | 63.97 | 2026-05-06 |
| IslamicLegalBench | Domain | 5 | 62.79 | 2026-05-06 |
| SAGE | Education | 28 | 41.916% | 2026-05-28 |
| RoboBench | Embodied | 2 | 50.10 | 2026-05-27 |
| kluster.ai LLM Hallucination Detection Leaderboard | Factuality | 1 | 99.03 | 2026-05-06 |
| Vectara HHEM Hallucination Leaderboard | Factuality | 31 | 93 | 2026-05-06 |
| CorpFin v2 | Finance | 44 | 60.8% | 2026-05-28 |
| Finance Agent v1.1 | Finance | 40 | 41.589% | 2026-05-04 |
| FinanceArena | Finance | 4 | 45.3 | 2026-05-27 |
| FinChain | Finance | 1 | 58.65 ChainEval | 2026-05-28 |
| MortgageTax | Finance | 5 | 68.918% | 2026-05-28 |
| PRBench Finance | Finance | 16 | 38.92 | 2026-05-06 |
| TaxBench | Finance | 14 | 9.00% mean pass^5 | 2026-05-27 |
| MageBench Season 1 | Game | 29 | 1540 rating / 9 games | 2026-05-28 |
| Xent Games | Game | 1 | 65.86 overall | 2026-05-28 |
| BenchLM | General Knowledge | 41 | 65 | 2026-05-06 |
| Global-MMLU-Lite | General Knowledge | 2 | 0.89 | 2026-05-06 |
| HELM AIR-Bench | Generalization | 32 | 0.735862 | 2026-05-28 |
| HELM Safety | Generalization | 28 | 0.913978 | 2026-05-28 |
| LMArena Text Arena | Generalization | 15 | 1459.96 | 2026-05-06 |
| LongBench v2 | Generalization | 1 | 63.3% | 2026-05-27 |
| WeirdML | Generalization | 5 | 54.03 | 2026-05-06 |
| GeoRC | Geospatial | 6 | 41.51 | 2026-05-27 |
| HELM MedQA | Healthcare | 4 | 0.934394 | 2026-05-28 |
| MedCode | Healthcare | 9 | 50.59% | 2026-05-28 |
| MedScribe | Healthcare | 41 | 73.552% | 2026-05-28 |
| HUMAINE | Human Preference | 3 | 3.76 | 2026-05-06 |
| AIIQ Composite IQ | Intelligence | 21 | 112 | 2026-05-12 |
| Artificial Analysis Intelligence Index | Intelligence | 111 | 34.63 | 2026-05-11 |
| Humanity's Last Exam | Intelligence | 64 | 21.1% | 2026-05-11 |
| MathVision | Intelligence | 23 | 73.30 | 2026-05-06 |
| MMLU-Pro | Intelligence | 19 | 86.2% | 2026-05-11 |
| OCRBench v2 | Intelligence | 8 | 59.30 | 2026-05-06 |
| OCRBench v2 | Intelligence | 4 | 62.20 | 2026-05-06 |
| CaseLaw v2 | Legal | 15 | 63.88% | 2026-05-04 |
| LEXam | Legal | 2 | 67.40% open / 55.72% MCQ | 2026-05-28 |
| PatentBench | Legal | 5 | 88.70 | 2026-05-26 |
| Professional Reasoning Bench - Legal | Legal | 13 | 41.43 | 2026-05-06 |
| ConStory-Bench | Long Context | 2 | CED 0.302 | 2026-05-28 |
| needle-1M-bench | Long Context | 3 | 100 | 2026-05-06 |
| Fiction.LiveBench | Long Context | 4 | 90.60 | 2026-05-06 |
| AIME 2025 | Math | 41 | 87.7% | 2026-05-11 |
| IneqMath | Math | 5 | 43.50 | 2026-05-06 |
| IneqMath | Math | 27 | 6 | 2026-05-06 |
| FrontierMath 2025-02-28 Private | Mathematics | 3 | 29 | 2026-05-06 |
| FrontierMath Tier 4 2025-07-01 Private | Mathematics | 3 | 10.40 | 2026-05-06 |
| OTIS Mock AIME 2024-2025 | Mathematics | 7 | 84.72 | 2026-05-06 |
| LiveMedBench | Medical | 12 | 0.1606 | 2026-05-27 |
| Medical Chronology LLM Benchmark | Medical | 6 | 0.90 | 2026-05-06 |
| AfroBench-Lite | Multilingual | 3 | 74.53 | 2026-05-06 |
| LanguageBench | Multilingual | 33 | 0.05 | 2026-05-06 |
| Design Arena | Multimodal | 56 | 1212 | 2026-05-06 |
| LMArena Vision Arena | Multimodal | 18 | 1261.55 | 2026-05-06 |
| Math-VR | Multimodal | 3 | 64.7 | 2026-05-27 |
| MMAU | Multimodal | 8 | 69.36 | 2026-05-06 |
| MMSI-Bench | Multimodal | 9 | 36.9% | 2026-05-28 |
| Vibe-Eval | Multimodal | 2 | 0.66 | 2026-05-06 |
| Video SimpleQA | Multimodal | 2 | 62.60 | 2026-05-06 |
| VPCT | Multimodal | 5 | 48 | 2026-05-06 |
| WebMainBench | Multimodal | 3 | 0.90 | 2026-05-06 |
| Artificial Analysis Openness Index | Openness | 215 | 5.56 | 2026-05-11 |
| ARC-AGI v2 | Reasoning | 15 | 0.05 | 2026-05-06 |
| Balrog | Reasoning | 2 | 43.30 | 2026-05-06 |
| CAIS Text Capabilities Index | Reasoning | 29 | 16.7 | 2026-05-27 |
| GPQA Diamond | Reasoning | 61 | 84.4% | 2026-05-11 |
| LingOly-TOO | Reasoning | 4 | 0.42 | 2026-05-06 |
| SimpleBench | Reasoning | 2 | 62.40 | 2026-05-06 |
| CAIS Risk Index | Safety | 28 | 59.0 | 2026-05-27 |
| CritPt | Science | 54 | 2.6% | 2026-05-11 |
| GSO-Bench | Science | 8 | 3.90 | 2026-05-06 |
| SciPredict | Science | 9 | 17.04 | 2026-05-06 |
| AudioMC | Speech | 1 | 46.90 | 2026-05-07 |
| AudioMC - Text Output | Speech | 1 | 46.90 | 2026-05-06 |
| CAIS Vision Capabilities Index | Vision | 12 | 53.3 | 2026-05-27 |
| Lech Mazur Writing | Writing | 5 | 8.60 | 2026-05-06 |