Gemini 2.5 Flash

Gemini / Google

104 scores
76 benchmarks
$0.3 / $2.5 per 1M tokens (cost in/out)
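
As a rough illustration of how the listed prices translate into a per-request cost, the sketch below assumes the two figures are USD per one million input tokens and per one million output tokens, respectively. The function name and the token counts are hypothetical and chosen only for the example.

```python
# Prices copied from the listing above; assumed to be USD per 1M tokens.
INPUT_USD_PER_MILLION = 0.3
OUTPUT_USD_PER_MILLION = 2.5


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request from its token counts (hypothetical helper)."""
    total = (
        input_tokens * INPUT_USD_PER_MILLION
        + output_tokens * OUTPUT_USD_PER_MILLION
    )
    return total / 1_000_000


# Hypothetical workload: 200k input tokens and 50k output tokens.
cost = estimate_cost_usd(200_000, 50_000)
print(f"Estimated cost: ${cost:.3f}")
```

Because output tokens are priced roughly eight times higher than input tokens in this listing, output-heavy workloads dominate the estimate.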

Metadata

Gemini Closed/API

Aliases: gemini-2.5-flash, google-gemini-2.5-flash, google/gemini-2.5-flash

Benchmark Results

Benchmark Category Rank Score Sampled
AMA-Bench Agentic 4 0.51 2026-05-06
ARC-AGI-1 Agentic 86 33.33 2026-05-05
ARC-AGI-1 Agentic 87 33.33 2026-05-05
ARC-AGI-1 Agentic 90 32.33 2026-05-05
ARC-AGI-1 Agentic 101 25.83 2026-05-05
ARC-AGI-1 Agentic 117 16 2026-05-05
ARC-AGI-2 Agentic 85 2.54 2026-05-05
ARC-AGI-2 Agentic 88 2.16 2026-05-05
ARC-AGI-2 Agentic 89 2.12 2026-05-05
ARC-AGI-2 Agentic 94 1.98 2026-05-05
ARC-AGI-2 Agentic 100 1.69 2026-05-05
Berkeley Function-Calling Leaderboard Agentic 15 56.24% 2026-05-27
Berkeley Function-Calling Leaderboard Agentic 26 50.9% 2026-05-27
CAR-bench Agentic 6 0.41 2026-05-06
CAR-bench Agentic 9 0.34 2026-05-06
Galileo Agent Leaderboard Agentic 13 0.38 2026-05-06
LLM-WikiRace Agentic 8 53 2026-05-06
MCP-Universe Agentic 18 21.65 2026-05-06
MCPMark Agentic 31 0.09 2026-05-06
PinchBench Agentic 57 0.71 2026-05-06
RealDataAgentBench Agentic 11 0.66 2026-04-28
Tau2-Bench Telecom Agentic 233 31.6% 2026-05-11
Tau2-Bench Telecom Agentic 349 14.9% 2026-05-11
Terminal-Bench Hard Agentic 184 13.6% 2026-05-11
Terminal-Bench Hard Agentic 193 12.1% 2026-05-11
UAVBench Agentic 11 76.75 2026-05-06
Vending-Bench 2 Agentic 30 548.84 2026-05-28
IOI Coding 47 2.611% 2026-05-26
SciCode Coding 112 39.4% 2026-05-11
SciCode Coding 262 29.1% 2026-05-11
TuRTLe Code Completion (Icarus Verilog) Coding 10 69.84 2026-05-06
TuRTLe Code Completion (Verilator) Coding 10 70.19 2026-05-06
TuRTLe Spec-to-RTL (Icarus Verilog) Coding 12 63.55 2026-05-06
TuRTLe Spec-to-RTL (Verilator) Coding 12 63.27 2026-05-06
RP-Bench Creative 5 1539 2026-05-06
RP-Bench Creative 17 1407.80 2026-05-06
RP-Bench Creative 28 4.21 2026-05-06
MMTU Data 7 0.63 2026-05-06
VAREX-Bench Document Understanding 2 97.3% EM 2026-05-28
GSMA Open Telco Leaderboard Domain 24 63.30 2026-05-06
SAGE Education 20 44.756% 2026-05-28
RoboBench Embodied 3 45.06 2026-05-27
Vectara HHEM Hallucination Leaderboard Factuality 38 92.20 2026-05-06
FinanceArena Finance 16 32.4 2026-05-27
FinChain Finance 4 58.01 ChainEval 2026-05-28
PRBench Finance Finance 16 38.41 2026-05-06
InfiniteBM Heads-Up No-Limit Hold'em Game 18 1158.98 Elo / 13 games 2026-05-28
InfiniteBM Heads-Up No-Limit Hold'em Game 27 1026.61 Elo / 90 games 2026-05-28
InfiniteBM Liar's Dice Game 22 1174.71 Elo / 91 games 2026-05-28
InfiniteBM Liar's Dice Game 29 1086.72 Elo / 31 games 2026-05-28
MageBench Season 1 Game 24 1572 rating / 4 games 2026-05-28
Xent Games Game 5 59.08 overall 2026-05-28
BenchLM General Knowledge 80 38 2026-05-06
Global-MMLU-Lite General Knowledge 3 0.88 2026-05-06
Arena-Hard Generalization 5 68.6% 2026-05-27
HELM AIR-Bench Generalization 37 0.686688 2026-05-28
HELM Safety Generalization 29 0.911812 2026-05-28
LongBench v2 Generalization 2 62.1% 2026-05-27
GeoRC Geospatial 7 41.3 2026-05-27
MedCode Healthcare 34 38.425% 2026-05-28
MedScribe Healthcare 16 82.869% 2026-05-28
HUMAINE Human Preference 13 3.66 2026-05-06
Artificial Analysis Intelligence Index Intelligence 175 27.04 2026-05-11
Artificial Analysis Intelligence Index Intelligence 238 20.56 2026-05-11
Humanity's Last Exam Intelligence 135 11.1% 2026-05-11
Humanity's Last Exam Intelligence 287 5.1% 2026-05-11
MMLU-Pro Intelligence 57 83.2% 2026-05-11
MMLU-Pro Intelligence 95 80.9% 2026-05-11
PatentBench Legal 3 99.10 2026-05-26
Professional Reasoning Bench - Legal Legal 13 41.02 2026-05-06
ConStory-Bench Long Context 3 CED 0.305 2026-05-28
AIME 2025 Math 86 73.3% 2026-05-11
AIME 2025 Math 117 60.3% 2026-05-11
IneqMath Math 11 23.50 2026-05-06
IneqMath Math 33 4.50 2026-05-06
BRIDGE Medical Leaderboard Medical 3 53.36 2026-05-27
BRIDGE Medical Leaderboard Medical 49 44.84 2026-05-27
BRIDGE Medical Leaderboard Medical 64 43.29 2026-05-27
LiveMedBench Medical 27 0.064 2026-05-27
Medical Chronology LLM Benchmark Medical 2 0.91 2026-05-06
AfroBench-Lite Multilingual 6 66.71 2026-05-06
LanguageBench Multilingual 4 0.68 2026-05-06
Design Arena Multimodal 90 1117 2026-05-06
Math-VR Multimodal 4 60.5 2026-05-27
MMAU Multimodal 9 67.39 2026-05-06
Vibe-Eval Multimodal 3 0.65 2026-05-06
Video SimpleQA Multimodal 3 57 2026-05-06
Visual-Language Understanding Multimodal 15 46.97 2026-05-06
VTB Multimodal 13 4.69 2026-05-06
CAIS Text Capabilities Index Reasoning 34 9.0 2026-05-27
GPQA Diamond Reasoning 111 79% 2026-05-11
GPQA Diamond Reasoning 220 68.3% 2026-05-11
Humanity's Last Exam (Text Only) Reasoning 20 12.58 2026-05-06
CAIS Risk Index Safety 31 60.1 2026-05-27
InvisibleBench Safety 10 0.11 2026-05-06
LiveSecBench Safety 23 42.38 2026-05-27
CritPt Science 69 1.4% 2026-05-11
CritPt Science 79 1.1% 2026-05-11
AudioMC Speech 2 40.04 2026-05-07
AudioMC Speech 6 26.11 2026-05-07
AudioMC - Text Output Speech 2 40.04 2026-05-06
AudioMC - Text Output Speech 4 26.11 2026-05-06
Structured Output Benchmark Structured Output 8 86 2026-05-06
CAIS Vision Capabilities Index Vision 17 49.6 2026-05-27
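
The rows above are space-separated, and both benchmark names and categories can span several words, so a naive split on whitespace will misplace columns. The following is a minimal parsing sketch, assuming every row follows the header order (Benchmark, Category, Rank, Score, Sampled) and that the category is one of the labels seen in this table. The names `CATEGORIES`, `ROW`, and `parse_row` are hypothetical, and the score is kept as text because its units vary from row to row.

```python
import re

# Category labels as they appear in the table above (assumed to be exhaustive).
CATEGORIES = [
    "Agentic", "Coding", "Creative", "Data", "Document Understanding",
    "Domain", "Education", "Embodied", "Factuality", "Finance", "Game",
    "General Knowledge", "Generalization", "Geospatial", "Healthcare",
    "Human Preference", "Intelligence", "Legal", "Long Context", "Math",
    "Medical", "Multilingual", "Multimodal", "Reasoning", "Safety",
    "Science", "Speech", "Structured Output", "Vision",
]

# Longest labels first so multi-word categories are tried before shorter ones.
_category_pattern = "|".join(
    re.escape(c) for c in sorted(CATEGORIES, key=len, reverse=True)
)

# name (lazy) + category + integer rank + free-form score + ISO date.
ROW = re.compile(
    r"^(?P<benchmark>.+?) (?P<category>" + _category_pattern + r") "
    r"(?P<rank>\d+) (?P<score>.+) (?P<sampled>\d{4}-\d{2}-\d{2})$"
)


def parse_row(line: str):
    """Split one table row into its five columns, or return None if it does not fit."""
    match = ROW.match(line.strip())
    if match is None:
        return None
    row = match.groupdict()
    row["rank"] = int(row["rank"])
    return row


# Rows copied from the table above.
finance = parse_row("PRBench Finance Finance 16 38.41 2026-05-06")
poker = parse_row(
    "InfiniteBM Heads-Up No-Limit Hold'em Game 18 1158.98 Elo / 13 games 2026-05-28"
)
print(finance)
print(poker)
```

Requiring an integer rank immediately after the category is what separates a category word that also appears inside a benchmark name (as in "PRBench Finance") from the actual category column.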