Gemini 2.5 Pro

Gemini / Google

115 scores
106 benchmarks
$1.25 / $10 per 1M tokens (cost in/out)

Metadata

Gemini Closed/API

Aliases: gemini-2.5-pro, google-gemini-2.5-pro, google/gemini-2.5-pro

Benchmark Results

Benchmark | Category | Rank | Score | Sampled
ARC-AGI-1 | Agentic | 75 | 41 | 2026-05-05
ARC-AGI-1 | Agentic | 83 | 37 | 2026-05-05
ARC-AGI-1 | Agentic | 93 | 29.50 | 2026-05-05
ARC-AGI-1 | Agentic | 118 | 16 | 2026-05-05
ARC-AGI-2 | Agentic | 67 | 4.86 | 2026-05-05
ARC-AGI-2 | Agentic | 73 | 4.03 | 2026-05-05
ARC-AGI-2 | Agentic | 82 | 2.92 | 2026-05-05
ARC-AGI-2 | Agentic | 143 | 0 | 2026-05-05
CAR-bench | Agentic | 7 | 0.38 | 2026-05-06
EnterpriseOps-Gym | Agentic | 20 | 17.8% | 2026-05-05
Galileo Agent Leaderboard | Agentic | 10 | 0.43 | 2026-05-06
Gert Labs Rankings | Agentic | 30 | 0.48 | 2026-05-11
MCP-Universe | Agentic | 16 | 22.08 | 2026-05-06
MCPMark | Agentic | 29 | 0.16 | 2026-05-06
MultiChallenge | Agentic | 15 | 53.62 | 2026-05-06
OSWorld-MCP | Agentic | 9 | 25.70 | 2026-05-06
OSWorld-MCP | Agentic | 12 | 17.40 | 2026-05-06
PinchBench | Agentic | 53 | 0.72 | 2026-05-06
Poker Agent | Agentic | 12 | 1032.596% | 2025-12-23
Tau2-Bench Telecom | Agentic | 168 | 54.1% | 2026-05-11
Terminal-Bench Hard | Agentic | 111 | 26.5% | 2026-05-11
Vending-Bench 2 | Agentic | 29 | 573.64 | 2026-05-28
OpenUGI | Alignment | 156 | 47.84 | 2026-05-06
AHa-Bench | Audio | 1 | 60% | 2026-05-28
scBench | Biology | 16 | 23.59% | 2026-05-27
SpatialBench | Biology | 16 | 28.93% | 2026-05-27
TextClass Benchmark | Classification | 64 | 1517.98 | 2026-05-06
ABC-Bench | Coding | 9 | 25.0% +/- 1.7 | 2026-05-27
Arena AI Code | Coding | 68 | 1203 | 2026-05-06
ArtifactsBench | Coding | 3 | 57.74 | 2026-05-06
CadEval | Coding | 2 | 64 | 2026-05-06
ContextBench | Coding | 4 | 36.40 | 2026-05-06
IOI | Coding | 19 | 17.084% | 2026-05-26
SciCode | Coding | 60 | 42.8% | 2026-05-11
SWE-bench Verified | Coding | 45 | 54.4% | 2026-05-28
Terminal-Bench 2.0 | Coding | 42 | 30.337% | 2026-05-28
Vibe Code Bench v1.1 | Coding | 43 | 0.4% | 2026-05-28
MMTU | Data | 4 | 0.66 | 2026-05-06
VAREX-Bench | Document Understanding | 1 | 98.0% EM | 2026-05-28
Arena AI Document | Document AI | 17 | 1429 | 2026-05-06
GSMA Open Telco Leaderboard | Domain | 21 | 63.97 | 2026-05-06
IslamicLegalBench | Domain | 5 | 62.79 | 2026-05-06
SAGE | Education | 28 | 41.916% | 2026-05-28
RoboBench | Embodied | 2 | 50.10 | 2026-05-27
kluster.ai LLM Hallucination Detection Leaderboard | Factuality | 1 | 99.03 | 2026-05-06
Vectara HHEM Hallucination Leaderboard | Factuality | 31 | 93 | 2026-05-06
CorpFin v2 | Finance | 44 | 60.8% | 2026-05-28
Finance Agent v1.1 | Finance | 40 | 41.589% | 2026-05-04
FinanceArena | Finance | 4 | 45.3 | 2026-05-27
FinChain | Finance | 1 | 58.65 ChainEval | 2026-05-28
MortgageTax | Finance | 5 | 68.918% | 2026-05-28
PRBench Finance | Finance | 16 | 38.92 | 2026-05-06
TaxBench | Finance | 14 | 9.00% mean pass^5 | 2026-05-27
MageBench Season 1 | Game | 29 | 1540 rating / 9 games | 2026-05-28
Xent Games | Game | 1 | 65.86 overall | 2026-05-28
BenchLM | General Knowledge | 41 | 65 | 2026-05-06
Global-MMLU-Lite | General Knowledge | 2 | 0.89 | 2026-05-06
HELM AIR-Bench | Generalization | 32 | 0.735862 | 2026-05-28
HELM Safety | Generalization | 28 | 0.913978 | 2026-05-28
LMArena Text Arena | Generalization | 15 | 1459.96 | 2026-05-06
LongBench v2 | Generalization | 1 | 63.3% | 2026-05-27
WeirdML | Generalization | 5 | 54.03 | 2026-05-06
GeoRC | Geospatial | 6 | 41.51 | 2026-05-27
HELM MedQA | Healthcare | 4 | 0.934394 | 2026-05-28
MedCode | Healthcare | 9 | 50.59% | 2026-05-28
MedScribe | Healthcare | 41 | 73.552% | 2026-05-28
HUMAINE | Human Preference | 3 | 3.76 | 2026-05-06
AIIQ Composite IQ | Intelligence | 21 | 112 | 2026-05-12
Artificial Analysis Intelligence Index | Intelligence | 111 | 34.63 | 2026-05-11
Humanity's Last Exam | Intelligence | 64 | 21.1% | 2026-05-11
MathVision | Intelligence | 23 | 73.30 | 2026-05-06
MMLU-Pro | Intelligence | 19 | 86.2% | 2026-05-11
OCRBench v2 | Intelligence | 8 | 59.30 | 2026-05-06
OCRBench v2 | Intelligence | 4 | 62.20 | 2026-05-06
CaseLaw v2 | Legal | 15 | 63.88% | 2026-05-04
LEXam | Legal | 2 | 67.40% open / 55.72% MCQ | 2026-05-28
PatentBench | Legal | 5 | 88.70 | 2026-05-26
Professional Reasoning Bench - Legal | Legal | 13 | 41.43 | 2026-05-06
ConStory-Bench | Long Context | 2 | CED 0.302 | 2026-05-28
needle-1M-bench | Long Context | 3 | 100 | 2026-05-06
Fiction.LiveBench | Long Context | 4 | 90.60 | 2026-05-06
AIME 2025 | Math | 41 | 87.7% | 2026-05-11
IneqMath | Math | 5 | 43.50 | 2026-05-06
IneqMath | Math | 27 | 6 | 2026-05-06
FrontierMath 2025-02-28 Private | Mathematics | 3 | 29 | 2026-05-06
FrontierMath Tier 4 2025-07-01 Private | Mathematics | 3 | 10.40 | 2026-05-06
OTIS Mock AIME 2024-2025 | Mathematics | 7 | 84.72 | 2026-05-06
LiveMedBench | Medical | 12 | 0.1606 | 2026-05-27
Medical Chronology LLM Benchmark | Medical | 6 | 0.90 | 2026-05-06
AfroBench-Lite | Multilingual | 3 | 74.53 | 2026-05-06
LanguageBench | Multilingual | 33 | 0.05 | 2026-05-06
Design Arena | Multimodal | 56 | 1212 | 2026-05-06
LMArena Vision Arena | Multimodal | 18 | 1261.55 | 2026-05-06
Math-VR | Multimodal | 3 | 64.7 | 2026-05-27
MMAU | Multimodal | 8 | 69.36 | 2026-05-06
MMSI-Bench | Multimodal | 9 | 36.9% | 2026-05-28
Vibe-Eval | Multimodal | 2 | 0.66 | 2026-05-06
Video SimpleQA | Multimodal | 2 | 62.60 | 2026-05-06
VPCT | Multimodal | 5 | 48 | 2026-05-06
WebMainBench | Multimodal | 3 | 0.90 | 2026-05-06
Artificial Analysis Openness Index | Openness | 215 | 5.56 | 2026-05-11
ARC-AGI v2 | Reasoning | 15 | 0.05 | 2026-05-06
Balrog | Reasoning | 2 | 43.30 | 2026-05-06
CAIS Text Capabilities Index | Reasoning | 29 | 16.7 | 2026-05-27
GPQA Diamond | Reasoning | 61 | 84.4% | 2026-05-11
LingOly-TOO | Reasoning | 4 | 0.42 | 2026-05-06
SimpleBench | Reasoning | 2 | 62.40 | 2026-05-06
CAIS Risk Index | Safety | 28 | 59.0 | 2026-05-27
CritPt | Science | 54 | 2.6% | 2026-05-11
GSO-Bench | Science | 8 | 3.90 | 2026-05-06
SciPredict | Science | 9 | 17.04 | 2026-05-06
AudioMC | Speech | 1 | 46.90 | 2026-05-07
AudioMC - Text Output | Speech | 1 | 46.90 | 2026-05-06
CAIS Vision Capabilities Index | Vision | 12 | 53.3 | 2026-05-27
Lech Mazur Writing | Writing | 5 | 8.60 | 2026-05-06