Google launches Gemini 3.5 Flash: outperforms Pro tier on coding, 40% cheaper, 4x faster
Google released Gemini 3.5 Flash on May 19, 2026, at Google I/O, establishing it as the default model across the Gemini app (900M MAU), Google Search AI Mode (1B+ MAU), Antigravity 2.0, and Gemini API. The Flash-tier release inverts Google's historical hierarchy: 3.5 Flash outperforms the flagship Gemini 3.1 Pro on coding and agentic benchmarks—Terminal-Bench 2.1: 76.2% vs. 70.3%, MCP Atlas: 83.6%, GDPval-AA: 1656 Elo—while delivering 4x faster output token generation and pricing 40% lower at $1.50/$9.00 per million input/output tokens (vs. 3.1 Pro's $2.50/$15). The model supports 1M context and is the strongest agentic model Google has shipped to date.
The architectural move signals a shift in frontier AI strategy: rather than lead with Pro capability and let Flash trail, Google optimized the Flash family for speed and cost while maintaining frontier-grade reasoning. Gemini 3.5 Flash beats GPT-5.5 on MCP Atlas (tool-use reliability) and matches it on coding speed. It regresses slightly on pure reasoning (Humanity's Last Exam, ARC-AGI-2) compared to 3.1 Pro, reflecting a design choice to prioritize real-world agentic tasks over abstract reasoning. Gemini 3.5 Pro is still in internal testing and rolling out "next month" (targeting June 2026). Google disclosed that 3.2 quadrillion tokens per month flow through its systems, up 7x year-over-year, and Antigravity 2.0 runs 3.5 Flash at 12x the speed of the public API through local optimization.
Pricing and availability are aggressive: $1.50 input is the lowest price for any frontier model, making high-volume agentic pipelines materially cheaper. Cached input tokens cost $0.15 per million (90% discount). For teams running document extraction, code generation, or agent-based workflows, the unit economics vs. Claude Opus 4.7 ($5/$25) or GPT-5.5 ($4-8/$12-24) are now decisively in Google's favor at scale. Google also introduced Gemini Spark (a persistent personal agent in the Gemini app, AI Ultra subscribers only, $100/month), and announced Gemini Omni, a video-generation model starting with image and audio inputs.
For practitioners, the 3.5 Flash launch reshuffles the cost-per-inference calculus: any agentic or coding workload previously locked to a "better costs money" trade-off can now evaluate Google first without quality sacrifice. The default-model position (500M+ search daily users) means developers building against Gemini API will benchmark against a model that already reaches billions via search; that distribution advantage compounds adoption. Watch for model selection logic in LLM routers and orchestration layers to shift toward latency-sensitive, cost-per-token-efficient models. Enterprises should audit which workloads are currently over-provisioned on flagship models and can drop to 3.5 Flash without quality loss.
Sources
- Primary source
- blog.google
“Gemini 3.5 Flash delivers intelligence that rivals large flagship models on multiple dimensions, at the speeds you have come to expect from the Flash series. It's our strongest agentic and coding model yet, outperforming Gemini 3.1 Pro on challenging coding and agentic benchmarks”
- macrumors.com
“Gemini 3.5 Flash is available for everyone today across Google's products and APIs. Gemini 3.5 Pro - Google is testing Gemini 3.5 Pro internally, and it's coming next month”
- pasqualepillitteri.it
“3.5 Flash outperforms Gemini 3.1 Pro (released just three months earlier, in February 2026) across nearly the entire benchmark suite that truly matters today for agentic workloads at 40% less cost and 4x faster”