Google Officially Announces Gemini 3.5 Flash at Google I/O 2026

A close-up amateur photo of a tidy workspace with two monitors displaying real-time performance metrics and a tablet streaming the Google I/O keynote.

On Tuesday, May 19, 2026 (UTC), Google announced the global launch of Gemini 3.5 Flash, its new high-speed artificial intelligence model, during the opening keynote of Google I/O 2026 in Mountain View. The company presented the model as its main offering for developers who need high performance and very low response times in production. It is available starting today for testing and commercial integration on Google AI Studio and Vertex AI.

During the event, Google's CEO, Sundar Pichai, highlighted the strategic positioning of the model:

“With Gemini 3.5 Flash, we are delivering exceptional processing speed without sacrificing the deep intelligence developers expect from our family of models. It is our definitive answer to the need for real-time applications with massive scale and extremely competitive cloud costs.”

Performance Benchmarks and Sub-200ms Latency

Where earlier approaches focused purely on cutting cost per token at the expense of output quality, Google DeepMind refined the architecture of Gemini 3.5 Flash using new knowledge distillation algorithms. Under the supervision of Demis Hassabis, co-founder of Google DeepMind, the model achieved processing latency below 200 milliseconds on most complex text and computer vision requests. This positions it as a competitive alternative to other fast models on the market, such as Claude 3.5 Haiku.
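Developers who want to check the sub-200 ms figure against their own workloads could time a batch of requests and count how many land under the threshold. The following is a minimal sketch, not an official benchmark: send_request is a hypothetical placeholder for whatever client call an application makes, and the function names are illustrative.

```python
import time
from typing import Callable, List

# Threshold quoted in the article, in milliseconds.
LATENCY_TARGET_MS = 200.0


def measure_latencies_ms(send_request: Callable[[], object], runs: int) -> List[float]:
    """Call send_request `runs` times and return each wall-clock duration in ms.

    send_request is a hypothetical stand-in for a real model call.
    """
    latencies: List[float] = []
    for _ in range(runs):
        start = time.perf_counter()
        send_request()
        latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies


def share_under_target(latencies_ms: List[float],
                       target_ms: float = LATENCY_TARGET_MS) -> float:
    """Return the fraction of samples strictly below target_ms."""
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    under = sum(1 for ms in latencies_ms if ms < target_ms)
    return under / len(latencies_ms)
```

Measured this way, the numbers include network round-trip time, so they will usually run higher than any server-side processing figure.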

Beyond raw response speed, the model offers one of the largest context capacities in its category, maintaining a 1 million token context window. This allows it to process large volumes of documents, entire codebases, and even hours of video in a single request, returning structured responses almost instantaneously.

Pricing Structure and Market Availability

The official announcement confirmed pricing for the standard service tier: $1.50 per million input tokens and $0.60 per million output tokens. Google positions the model as a premium, robust option for deployment in critical corporate systems, automated financial analysis, and real-time data processing. Google Cloud supports the new model in all global regions as of the keynote.
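To see what the quoted rates mean for a single request, the arithmetic can be sketched in a few lines. This is an illustrative estimate only: it uses the two standard-tier rates reported above, the function and constant names are hypothetical, and it ignores any discounts, free tiers, or other charges the article does not mention.

```python
from decimal import Decimal

# Rates as reported in the article, in USD per one million tokens.
INPUT_RATE_PER_MILLION = Decimal("1.50")
OUTPUT_RATE_PER_MILLION = Decimal("0.60")
TOKENS_PER_MILLION = Decimal(1_000_000)


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the USD cost of one request at the reported rates."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    input_cost = INPUT_RATE_PER_MILLION * input_tokens / TOKENS_PER_MILLION
    output_cost = OUTPUT_RATE_PER_MILLION * output_tokens / TOKENS_PER_MILLION
    return input_cost + output_cost


# Example: a 200,000-token document summarized into 2,000 tokens.
print(estimate_cost_usd(200_000, 2_000))
```

Decimal is used instead of float so that small per-request amounts add up without rounding drift when totaled across many calls.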

This content was created and reviewed by our team (iatoskill.com). If you find any issues, please reach out to us.
