Google Unveils Gemini 3.1 Flash-Lite: Impressive Benchmarks and Major Price Drop

Google Launches New AI Model Focused on Speed and Cost

Google introduced Gemini 3.1 Flash-Lite on March 3, 2026, as the fastest and most cost-efficient member of the Gemini 3 series. The model is available to developers in preview via the Gemini API and Vertex AI. According to the company's official blog, it is designed to handle high-volume tasks with low latency while offering adjustable reasoning depth when needed.

Low Price and High Performance

The new model arrives with significantly lower prices than the main models in the series:

  • $0.25 per 1 million input tokens
  • $1.50 per 1 million output tokens
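
To make the pricing concrete, the short Python sketch below estimates the bill for a batch of requests using the two rates listed above. The function name and the example workload (10,000 requests of 800 input and 200 output tokens each) are hypothetical and chosen only for illustration; actual billing may include factors the article does not cover.

```python
from decimal import Decimal

# Preview prices quoted in the article, in USD per 1 million tokens.
INPUT_PRICE_PER_MILLION = Decimal("0.25")
OUTPUT_PRICE_PER_MILLION = Decimal("1.50")
TOKENS_PER_MILLION = Decimal(1_000_000)


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> Decimal:
    """Return the estimated cost in USD for the given token counts."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    input_cost = INPUT_PRICE_PER_MILLION * input_tokens / TOKENS_PER_MILLION
    output_cost = OUTPUT_PRICE_PER_MILLION * output_tokens / TOKENS_PER_MILLION
    return input_cost + output_cost


# Hypothetical workload: 10,000 requests, each with 800 input and 200 output tokens.
num_requests = 10_000
total = estimate_cost_usd(800 * num_requests, 200 * num_requests)
print(f"Estimated cost: ${total:.2f}")
```

Under these assumptions the batch uses 8 million input tokens and 2 million output tokens. Because output tokens cost six times as much as input tokens, workloads that generate long responses are where the bill grows fastest.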

According to benchmarks cited by Google, 3.1 Flash-Lite reaches the first token 2.5× faster and generates output 45% faster than Gemini 2.5 Flash, which makes it well suited to quick responses and intensive workloads.
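
As a rough illustration of what those two ratios could mean for a single response, the sketch below applies them to a baseline. The baseline figures (1.0 second to first token, 200 output tokens per second, a 500-token response) are hypothetical placeholders, not measurements of Gemini 2.5 Flash, and reading "45% faster output" as 1.45× the throughput is an assumption.

```python
# Ratios cited in the article.
TTFT_SPEEDUP = 2.5         # time to first token is 2.5x faster
THROUGHPUT_SPEEDUP = 1.45  # assumed reading of "45% faster output speed"


def response_time_s(ttft_s: float, tokens_per_s: float, output_tokens: int) -> float:
    """Time to first token plus the time needed to stream the output tokens."""
    return ttft_s + output_tokens / tokens_per_s


# Hypothetical baseline values, for illustration only.
baseline_ttft_s = 1.0
baseline_tokens_per_s = 200.0
output_tokens = 500

baseline = response_time_s(baseline_ttft_s, baseline_tokens_per_s, output_tokens)
improved = response_time_s(
    baseline_ttft_s / TTFT_SPEEDUP,
    baseline_tokens_per_s * THROUGHPUT_SPEEDUP,
    output_tokens,
)
print(f"baseline: {baseline:.2f} s, with cited speedups: {improved:.2f} s")
```

The shorter the response, the more the time-to-first-token gain dominates; for long outputs the throughput gain matters more.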

Benchmarks and Technical Quality

Google also reports strong results in performance evaluations:

  • Elo score of 1432 on Arena.ai
  • 86.9% on the GPQA Diamond benchmark
  • 76.8% on MMMU Pro

These figures suggest that, despite its focus on efficiency, Flash-Lite can compete with larger versions in reasoning and multimodal understanding tasks.

Unlike previous variants, it introduces “Thinking Levels”, a system that lets developers dynamically adjust the model's reasoning depth, from lighter to more complex processing, without switching models.

Practical Applications

Gemini 3.1 Flash-Lite is designed for scenarios that require high volume and fast responses, such as:

  • Large-scale translation
  • Automated content moderation
  • Real-time data extraction and classification
  • Generation of interfaces or dashboards
  • Creation of simulations or automated workflows

This reasoning flexibility lets teams match processing cost and depth to the requirements of each task, reducing latency in latency-sensitive applications.
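
One way a team might apply this is to pick a reasoning level per task type before sending a request. The sketch below is a minimal illustration in plain Python and does not call any Google SDK. The level names (low, medium, high), the task labels, and the mapping itself are all assumptions made for the example; the article does not list the actual Thinking Levels values or recommend specific settings.

```python
from enum import Enum


class ThinkingLevel(Enum):
    # Hypothetical level names; the article does not list the actual values.
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


# Illustrative mapping from task type to reasoning depth (an assumption,
# not guidance from Google).
TASK_LEVELS = {
    "translation": ThinkingLevel.LOW,
    "moderation": ThinkingLevel.LOW,
    "extraction": ThinkingLevel.MEDIUM,
    "dashboard_generation": ThinkingLevel.HIGH,
    "simulation": ThinkingLevel.HIGH,
}


def choose_thinking_level(task_type: str, latency_sensitive: bool = False) -> ThinkingLevel:
    """Pick a reasoning level for a task, capping depth when latency matters."""
    # Unknown task types fall back to the middle setting.
    level = TASK_LEVELS.get(task_type, ThinkingLevel.MEDIUM)
    # Trade some depth for speed on latency-sensitive calls.
    if latency_sensitive and level is ThinkingLevel.HIGH:
        return ThinkingLevel.MEDIUM
    return level


print(choose_thinking_level("translation").value)
print(choose_thinking_level("simulation", latency_sensitive=True).value)
```

Keeping the mapping in one table means the cost and depth trade-off can be tuned per task without touching the calling code, which is the kind of adjustment the feature is described as enabling.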

Initial Reception and Next Steps

Some companies are already testing the new model in production, reporting that it can follow complex inputs with accuracy comparable to higher-level models, according to Google's official blog.

In the coming months, attention will turn to broader availability and real-world usage data, and to how the developer community adapts its workflows to take advantage of Flash-Lite's balance of cost and speed.

This content was created and reviewed by our team (iatoskill.com). If you find any issues, please reach out to us.