Gemini 3 Flash Preview
Gemini / Google
121 scores
94 benchmarks
$0.5 / $3 per 1M tokens (cost in/out)
Metadata
Gemini Closed/API
Aliases: gemini-3-flash-preview, gemini-3-flash-preview-20251217, google-gemini-3-flash-preview, google-gemini-3-flash-preview-20251217, google/gemini-3-flash-preview, google/gemini-3-flash-preview-20251217
| Benchmark | Category | Rank | Score | Sampled |
|---|---|---|---|---|
| APEX-Agents | Agentic | 11 | 39.50 | 2026-05-06 |
| APEX-Agents-AA | Agentic | 7 | 27.7% | 2026-05-11 |
| ARC-AGI-1 | Agentic | 31 | 84.67 | 2026-05-05 |
| ARC-AGI-1 | Agentic | 56 | 57.67 | 2026-05-05 |
| ARC-AGI-1 | Agentic | 95 | 29 | 2026-05-05 |
| ARC-AGI-1 | Agentic | 107 | 21.50 | 2026-05-05 |
| ARC-AGI-2 | Agentic | 33 | 33.61 | 2026-05-05 |
| ARC-AGI-2 | Agentic | 48 | 12.78 | 2026-05-05 |
| ARC-AGI-2 | Agentic | 79 | 3.33 | 2026-05-05 |
| ARC-AGI-2 | Agentic | 114 | 1.25 | 2026-05-05 |
| AutoBench | Agentic | 13 | 2.98 | 2026-05-06 |
| EnterpriseOps-Gym | Agentic | 5 | 31.7% | 2026-05-05 |
| Gert Labs Rankings | Agentic | 18 | 0.53 | 2026-05-11 |
| MCP Atlas | Agentic | 11 | 62 | 2026-05-06 |
| PinchBench | Agentic | 17 | 0.87 | 2026-05-06 |
| Poker Agent | Agentic | 3 | 1100.213% | 2025-12-23 |
| t2-bench | Agentic | 2 | 0.90 | 2026-05-06 |
| Tau2-Bench Telecom | Agentic | 106 | 80.4% | 2026-05-11 |
| Tau2-Bench Telecom | Agentic | 197 | 43.3% | 2026-05-11 |
| Terminal-Bench Hard | Agentic | 42 | 38.6% | 2026-05-11 |
| Terminal-Bench Hard | Agentic | 84 | 31.8% | 2026-05-11 |
| Toolathlon | Agentic | 5 | 0.49 | 2026-05-06 |
| Vending-Bench 2 | Agentic | 19 | 3634.72 | 2026-05-28 |
| VitaBench | Agentic | 1 | 32.50 | 2026-05-06 |
| OpenUGI | Alignment | 120 | 49.47 | 2026-05-06 |
| OpenUGI | Alignment | 129 | 48.87 | 2026-05-06 |
| OpenUGI | Alignment | 142 | 48.30 | 2026-05-06 |
| ALE-Bench | Coding | 6 | 1367.20 | 2026-05-06 |
| Arena AI Code | Coding | 22 | 1437 | 2026-05-06 |
| Arena AI Code | Coding | 36 | 1389 | 2026-05-06 |
| DeepSWE | Coding | 13 | 5.16 | 2026-05-26 |
| IOI | Coding | 6 | 39.084% | 2026-05-26 |
| LiveCodeBench | Coding | 14 | 85.591% | 2026-05-28 |
| LMArena WebDev Arena | Coding | 22 | 1437.04 | 2026-05-06 |
| SciCode | Coding | 15 | 50.6% | 2026-05-11 |
| SciCode | Coding | 20 | 49.9% | 2026-05-11 |
| SWE Atlas - Codebase QnA | Coding | 9 | 8.20 | 2026-05-06 |
| SWE Atlas - Refactoring | Coding | 11 | 10 | 2026-05-06 |
| SWE Atlas - Test Writing | Coding | 2 | 30.30 | 2026-05-06 |
| SWE-bench Verified | Coding | 16 | 75% | 2026-05-28 |
| Terminal-Bench 2.0 | Coding | 19 | 51.685% | 2026-05-28 |
| Terminal-Bench 2.1 | Coding | 8 | 53.933% | 2026-05-28 |
| Vibe Code Bench v1.1 | Coding | 24 | 20.204% | 2026-05-28 |
| VibeCodingBench | Coding | 15 | 83.44 | 2026-05-06 |
| SecCodeBench | Cybersecurity | 10 | 58.66% | 2026-05-28 |
| OmniDocBench 1.5 | Document Understanding | 10 | 0.12 | 2026-05-06 |
| Arena AI Document | Document AI | 20 | 1421 | 2026-05-06 |
| GSMA Open Telco Leaderboard | Domain | 7 | 70.41 | 2026-05-06 |
| SAGE | Education | 5 | 51.849% | 2026-05-28 |
| From Perception to Action | Embodied AI | 6 | 11.9% | 2026-05-28 |
| AA-Omniscience | Factuality | 6 | 11.57 | 2026-05-11 |
| Vectara HHEM Hallucination Leaderboard | Factuality | 82 | 86.50 | 2026-05-06 |
| CorpFin v2 | Finance | 10 | 66.434% | 2026-05-28 |
| Finance Agent v1.1 | Finance | 31 | 47.598% | 2026-05-04 |
| Finance Agent v2 | Finance | 12 | 42.551% | 2026-05-28 |
| MortgageTax | Finance | 7 | 68.72% | 2026-05-28 |
| Rogo Big Finance Bench | Finance | 7 | 43% rubric / 26% final | 2026-05-28 |
| TaxEval v2 | Finance | 28 | 73.876% | 2026-05-28 |
| InfiniteBM Heads-Up No-Limit Hold'em | Game | 4 | 1409.13 Elo / 13 games | 2026-05-28 |
| InfiniteBM Heads-Up No-Limit Hold'em | Game | 31 | 978.72 Elo / 89 games | 2026-05-28 |
| InfiniteBM Liar's Dice | Game | 1 | 1566.83 Elo / 27 games | 2026-05-28 |
| InfiniteBM Liar's Dice | Game | 5 | 1376.7 Elo / 92 games | 2026-05-28 |
| MageBench Season 1 | Game | 11 | 1622 rating / 10 games | 2026-05-28 |
| ALL Bench LLM | General Knowledge | 6 | 50.11 | 2026-05-06 |
| BenchLM | General Knowledge | 40 | 65 | 2026-05-06 |
| LMArena Text Arena | Generalization | 13 | 1466.61 | 2026-05-06 |
| LMArena Text Arena | Generalization | 25 | 1448.34 | 2026-05-06 |
| MedCode | Healthcare | 2 | 55.92% | 2026-05-28 |
| MedQA | Healthcare | 11 | 95.808% | 2026-04-16 |
| MedScribe | Healthcare | 50 | 69.917% | 2026-05-28 |
| PlaceboBench | Healthcare | 5 | 44.9275 | 2026-05-27 |
| Artificial Analysis Intelligence Index | Intelligence | 39 | 46.43 | 2026-05-11 |
| Artificial Analysis Intelligence Index | Intelligence | 109 | 35.05 | 2026-05-11 |
| GPQA Diamond | Intelligence | 17 | 87.879% | 2026-05-28 |
| Humanity's Last Exam | Intelligence | 15 | 34.7% | 2026-05-11 |
| Humanity's Last Exam | Intelligence | 101 | 14.1% | 2026-05-11 |
| LiveBench | Intelligence | 19 | 73.05 | 2026-05-05 |
| MMLU Pro | Intelligence | 8 | 88.592% | 2026-05-28 |
| MMLU-Pro | Intelligence | 4 | 89% | 2026-05-11 |
| MMLU-Pro | Intelligence | 6 | 88.2% | 2026-05-11 |
| MMMU Pro | Intelligence | 4 | 87.63% | 2026-05-28 |
| Vals Index | Intelligence | 12 | 49.314% | 2026-05-28 |
| Vals Multimodal Index | Intelligence | 9 | 51.975% | 2026-05-28 |
| CaseLaw v2 | Legal | 32 | 55.842% | 2026-05-04 |
| LegalBench | Legal | 3 | 86.858% | 2026-05-28 |
| MRCR v2 (8-needle) | Long Context | 8 | 0.22 | 2026-05-06 |
| AIME | Math | 9 | 95.625% | 2026-04-16 |
| AIME 2025 | Math | 3 | 97% | 2026-05-11 |
| AIME 2025 | Math | 130 | 55.7% | 2026-05-11 |
| MGSM | Math | 10 | 93.309% | 2026-01-09 |
| ProofBench | Math | 19 | 15% | 2026-05-28 |
| LiveMedBench | Medical | 8 | 0.2167 | 2026-05-27 |
| Medical Chronology LLM Benchmark | Medical | 4 | 0.91 | 2026-05-06 |
| ALL Bench Multimodal | Multimodal | 5 | 51.49 | 2026-05-06 |
| ALL Bench Multimodal | Multimodal | 4 | 16.76 | 2026-05-06 |
| ALL Bench Multimodal | Multimodal | 9 | 8.04 | 2026-05-06 |
| Blueprint-Bench 2 | Multimodal | 11 | 0.534 +/- 0.019 | 2026-05-28 |
| CharXiv-R | Multimodal | 9 | 0.80 | 2026-05-06 |
| Design Arena | Multimodal | 28 | 1249 | 2026-05-06 |
| IDP Leaderboard | Multimodal | 4 | 81.95 | 2026-05-06 |
| LMArena Vision Arena | Multimodal | 9 | 1282.62 | 2026-05-06 |
| LMArena Vision Arena | Multimodal | 16 | 1264.39 | 2026-05-06 |
| VideoMMMU | Multimodal | 2 | 0.87 | 2026-05-06 |
| ARC-AGI v2 | Reasoning | 10 | 0.34 | 2026-05-06 |
| CAIS Text Capabilities Index | Reasoning | 9 | 35.6 | 2026-05-27 |
| Context Arena | Reasoning | 27 | 46.79 | 2026-05-06 |
| Context Arena | Reasoning | 29 | 46.24 | 2026-05-06 |
| Context Arena | Reasoning | 33 | 39.58 | 2026-05-06 |
| Context Arena | Reasoning | 56 | 25.60 | 2026-05-06 |
| Global PIQA | Reasoning | 2 | 0.93 | 2026-05-06 |
| GPQA Diamond | Reasoning | 16 | 89.8% | 2026-05-11 |
| GPQA Diamond | Reasoning | 92 | 81.2% | 2026-05-11 |
| CAIS Risk Index | Safety | 29 | 59.4 | 2026-05-27 |
| InvisibleBench | Safety | 9 | 0.09 | 2026-05-06 |
| CritPt | Science | 20 | 8.6% | 2026-05-11 |
| CritPt | Science | 70 | 1.4% | 2026-05-11 |
| SciPredict | Science | 2 | 22.22 | 2026-05-06 |
| ProgramBench | Software Engineering | 6 | 0% | 2026-05-05 |
| Structured Output Benchmark | Structured Output | 22 | 83.30 | 2026-05-06 |
| CAIS Vision Capabilities Index | Vision | 2 | 64.9 | 2026-05-27 |
| Roboflow Vision Evals - Visual Understanding | Vision | 3 | 79.1% | 2026-05-22 |