Gemini 2.5 Flash
Gemini / Google
104 scores
76 benchmarks
$0.3 / $2.5 per 1M tokens (cost in/out)
Metadata
Gemini Closed/API
Aliases: gemini-2.5-flash, google-gemini-2.5-flash, google/gemini-2.5-flash
| Benchmark | Category | Rank | Score | Sampled |
|---|---|---|---|---|
| AMA-Bench | Agentic | 4 | 0.51 | 2026-05-06 |
| ARC-AGI-1 | Agentic | 86 | 33.33 | 2026-05-05 |
| ARC-AGI-1 | Agentic | 87 | 33.33 | 2026-05-05 |
| ARC-AGI-1 | Agentic | 90 | 32.33 | 2026-05-05 |
| ARC-AGI-1 | Agentic | 101 | 25.83 | 2026-05-05 |
| ARC-AGI-1 | Agentic | 117 | 16 | 2026-05-05 |
| ARC-AGI-2 | Agentic | 85 | 2.54 | 2026-05-05 |
| ARC-AGI-2 | Agentic | 88 | 2.16 | 2026-05-05 |
| ARC-AGI-2 | Agentic | 89 | 2.12 | 2026-05-05 |
| ARC-AGI-2 | Agentic | 94 | 1.98 | 2026-05-05 |
| ARC-AGI-2 | Agentic | 100 | 1.69 | 2026-05-05 |
| Berkeley Function-Calling Leaderboard | Agentic | 15 | 56.24% | 2026-05-27 |
| Berkeley Function-Calling Leaderboard | Agentic | 26 | 50.9% | 2026-05-27 |
| CAR-bench | Agentic | 6 | 0.41 | 2026-05-06 |
| CAR-bench | Agentic | 9 | 0.34 | 2026-05-06 |
| Galileo Agent Leaderboard | Agentic | 13 | 0.38 | 2026-05-06 |
| LLM-WikiRace | Agentic | 8 | 53 | 2026-05-06 |
| MCP-Universe | Agentic | 18 | 21.65 | 2026-05-06 |
| MCPMark | Agentic | 31 | 0.09 | 2026-05-06 |
| PinchBench | Agentic | 57 | 0.71 | 2026-05-06 |
| RealDataAgentBench | Agentic | 11 | 0.66 | 2026-04-28 |
| Tau2-Bench Telecom | Agentic | 233 | 31.6% | 2026-05-11 |
| Tau2-Bench Telecom | Agentic | 349 | 14.9% | 2026-05-11 |
| Terminal-Bench Hard | Agentic | 184 | 13.6% | 2026-05-11 |
| Terminal-Bench Hard | Agentic | 193 | 12.1% | 2026-05-11 |
| UAVBench | Agentic | 11 | 76.75 | 2026-05-06 |
| Vending-Bench 2 | Agentic | 30 | 548.84 | 2026-05-28 |
| IOI | Coding | 47 | 2.611% | 2026-05-26 |
| SciCode | Coding | 112 | 39.4% | 2026-05-11 |
| SciCode | Coding | 262 | 29.1% | 2026-05-11 |
| TuRTLe Code Completion (Icarus Verilog) | Coding | 10 | 69.84 | 2026-05-06 |
| TuRTLe Code Completion (Verilator) | Coding | 10 | 70.19 | 2026-05-06 |
| TuRTLe Spec-to-RTL (Icarus Verilog) | Coding | 12 | 63.55 | 2026-05-06 |
| TuRTLe Spec-to-RTL (Verilator) | Coding | 12 | 63.27 | 2026-05-06 |
| RP-Bench | Creative | 5 | 1539 | 2026-05-06 |
| RP-Bench | Creative | 17 | 1407.80 | 2026-05-06 |
| RP-Bench | Creative | 28 | 4.21 | 2026-05-06 |
| MMTU | Data | 7 | 0.63 | 2026-05-06 |
| VAREX-Bench | Document Understanding | 2 | 97.3% EM | 2026-05-28 |
| GSMA Open Telco Leaderboard | Domain | 24 | 63.30 | 2026-05-06 |
| SAGE | Education | 20 | 44.756% | 2026-05-28 |
| RoboBench | Embodied | 3 | 45.06 | 2026-05-27 |
| Vectara HHEM Hallucination Leaderboard | Factuality | 38 | 92.20 | 2026-05-06 |
| FinanceArena | Finance | 16 | 32.4 | 2026-05-27 |
| FinChain | Finance | 4 | 58.01 ChainEval | 2026-05-28 |
| PRBench Finance | Finance | 16 | 38.41 | 2026-05-06 |
| InfiniteBM Heads-Up No-Limit Hold'em | Game | 18 | 1158.98 Elo / 13 games | 2026-05-28 |
| InfiniteBM Heads-Up No-Limit Hold'em | Game | 27 | 1026.61 Elo / 90 games | 2026-05-28 |
| InfiniteBM Liar's Dice | Game | 22 | 1174.71 Elo / 91 games | 2026-05-28 |
| InfiniteBM Liar's Dice | Game | 29 | 1086.72 Elo / 31 games | 2026-05-28 |
| MageBench Season 1 | Game | 24 | 1572 rating / 4 games | 2026-05-28 |
| Xent Games | Game | 5 | 59.08 overall | 2026-05-28 |
| BenchLM | General Knowledge | 80 | 38 | 2026-05-06 |
| Global-MMLU-Lite | General Knowledge | 3 | 0.88 | 2026-05-06 |
| Arena-Hard | Generalization | 5 | 68.6% | 2026-05-27 |
| HELM AIR-Bench | Generalization | 37 | 0.686688 | 2026-05-28 |
| HELM Safety | Generalization | 29 | 0.911812 | 2026-05-28 |
| LongBench v2 | Generalization | 2 | 62.1% | 2026-05-27 |
| GeoRC | Geospatial | 7 | 41.3 | 2026-05-27 |
| MedCode | Healthcare | 34 | 38.425% | 2026-05-28 |
| MedScribe | Healthcare | 16 | 82.869% | 2026-05-28 |
| HUMAINE | Human Preference | 13 | 3.66 | 2026-05-06 |
| Artificial Analysis Intelligence Index | Intelligence | 175 | 27.04 | 2026-05-11 |
| Artificial Analysis Intelligence Index | Intelligence | 238 | 20.56 | 2026-05-11 |
| Humanity's Last Exam | Intelligence | 135 | 11.1% | 2026-05-11 |
| Humanity's Last Exam | Intelligence | 287 | 5.1% | 2026-05-11 |
| MMLU-Pro | Intelligence | 57 | 83.2% | 2026-05-11 |
| MMLU-Pro | Intelligence | 95 | 80.9% | 2026-05-11 |
| PatentBench | Legal | 3 | 99.10 | 2026-05-26 |
| Professional Reasoning Bench - Legal | Legal | 13 | 41.02 | 2026-05-06 |
| ConStory-Bench | Long Context | 3 | CED 0.305 | 2026-05-28 |
| AIME 2025 | Math | 86 | 73.3% | 2026-05-11 |
| AIME 2025 | Math | 117 | 60.3% | 2026-05-11 |
| IneqMath | Math | 11 | 23.50 | 2026-05-06 |
| IneqMath | Math | 33 | 4.50 | 2026-05-06 |
| BRIDGE Medical Leaderboard | Medical | 3 | 53.36 | 2026-05-27 |
| BRIDGE Medical Leaderboard | Medical | 49 | 44.84 | 2026-05-27 |
| BRIDGE Medical Leaderboard | Medical | 64 | 43.29 | 2026-05-27 |
| LiveMedBench | Medical | 27 | 0.064 | 2026-05-27 |
| Medical Chronology LLM Benchmark | Medical | 2 | 0.91 | 2026-05-06 |
| AfroBench-Lite | Multilingual | 6 | 66.71 | 2026-05-06 |
| LanguageBench | Multilingual | 4 | 0.68 | 2026-05-06 |
| Design Arena | Multimodal | 90 | 1117 | 2026-05-06 |
| Math-VR | Multimodal | 4 | 60.5 | 2026-05-27 |
| MMAU | Multimodal | 9 | 67.39 | 2026-05-06 |
| Vibe-Eval | Multimodal | 3 | 0.65 | 2026-05-06 |
| Video SimpleQA | Multimodal | 3 | 57 | 2026-05-06 |
| Visual-Language Understanding | Multimodal | 15 | 46.97 | 2026-05-06 |
| VTB | Multimodal | 13 | 4.69 | 2026-05-06 |
| CAIS Text Capabilities Index | Reasoning | 34 | 9.0 | 2026-05-27 |
| GPQA Diamond | Reasoning | 111 | 79% | 2026-05-11 |
| GPQA Diamond | Reasoning | 220 | 68.3% | 2026-05-11 |
| Humanity's Last Exam (Text Only) | Reasoning | 20 | 12.58 | 2026-05-06 |
| CAIS Risk Index | Safety | 31 | 60.1 | 2026-05-27 |
| InvisibleBench | Safety | 10 | 0.11 | 2026-05-06 |
| LiveSecBench | Safety | 23 | 42.38 | 2026-05-27 |
| CritPt | Science | 69 | 1.4% | 2026-05-11 |
| CritPt | Science | 79 | 1.1% | 2026-05-11 |
| AudioMC | Speech | 2 | 40.04 | 2026-05-07 |
| AudioMC | Speech | 6 | 26.11 | 2026-05-07 |
| AudioMC - Text Output | Speech | 2 | 40.04 | 2026-05-06 |
| AudioMC - Text Output | Speech | 4 | 26.11 | 2026-05-06 |
| Structured Output Benchmark | Structured Output | 8 | 86 | 2026-05-06 |
| CAIS Vision Capabilities Index | Vision | 17 | 49.6 | 2026-05-27 |