Google Announces Gemini 3.1 Flash-Lite

Google has announced the Gemini 3.1 Flash-Lite model, which it claims is the fastest and most cost-efficient model in the Gemini 3 family. The company says the model is built for ‘high volume developer workloads at scale’.

Starting March 3, the model is rolling out in preview to developers via the Gemini API and Vertex AI. 

According to benchmark results shared by Google, the model outperforms competing models in its class while remaining competitively priced.

Gemini 3.1 Flash-Lite stands out with a very low input cost of $0.25 per million tokens and an output cost of $1.50 per million tokens, which is cheaper on output than Claude 4.5 Haiku ($5.00/M) and GPT-5 mini ($2.00/M) while slightly higher on input than Grok 4.1 Fast ($0.20/M). 

Gemini 3.1 Flash-Lite’s output cost is also significantly lower than Gemini 2.5 Flash ($2.50/M) despite delivering higher performance, and its input cost compares favourably with its predecessors. 
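The per-million-token prices above translate directly into workload cost. A minimal sketch using the article's list prices (the token counts are illustrative, and prices the article does not state are left as `None`):

```python
# Cost comparison using the per-million-token prices cited in the article.
PRICES = {  # (input $/M tokens, output $/M tokens)
    "Gemini 3.1 Flash-Lite": (0.25, 1.50),
    "GPT-5 mini": (None, 2.00),        # input price not given in the article
    "Claude 4.5 Haiku": (None, 5.00),  # input price not given in the article
    "Grok 4.1 Fast": (0.20, None),     # output price not given in the article
}

def cost(model, input_tokens, output_tokens):
    """Dollar cost for a workload, given token counts and list prices."""
    inp, out = PRICES[model]
    return input_tokens / 1e6 * inp + output_tokens / 1e6 * out

# Example: 10M input tokens and 2M output tokens on Gemini 3.1 Flash-Lite
# 10 * $0.25 + 2 * $1.50 = $5.50
print(cost("Gemini 3.1 Flash-Lite", 10_000_000, 2_000_000))  # → 5.5
```

At these rates, output tokens dominate the bill for generation-heavy workloads, which is why the article's output-price comparison against Claude 4.5 Haiku and GPT-5 mini matters for high-volume use.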

Gemini 3.1 Flash-Lite outperforms competing models across numerous benchmark evaluations spanning scientific reasoning, multimodal understanding, multilingual tasks, long-context processing, and real-world code generation.

On Humanity’s Last Exam (HLE), which tests advanced academic reasoning across text and multimodal inputs without tools, Gemini 3.1 Flash-Lite scored 16.0%. This exceeds Claude 4.5 Haiku (9.7%), but narrowly trails GPT-5 mini (16.7%) and Grok 4.1 Fast (17.6%), indicating competitive high-end reasoning performance. 

On LiveCodeBench, which evaluates real-world code generation performance, Gemini 3.1 Flash-Lite scored 72.0%, outperforming Claude 4.5 Haiku (53.2%) while trailing GPT-5 mini (80.4%) and Grok 4.1 Fast (76.5%), placing it in the upper tier but not at the top of the coding benchmark.

In terms of output speed, Gemini 3.1 Flash-Lite delivers 363 tokens per second, making it one of the fastest models in the group.

It significantly outpaces GPT-5 mini (71 tokens/s), Claude 4.5 Haiku (108 tokens/s), Grok 4.1 Fast (145 tokens/s), and Gemini 2.5 Flash (249 tokens/s). Only Gemini 2.5 Flash-Lite is marginally faster at 366 tokens/s.
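These throughput figures translate directly into wall-clock latency for long generations. A quick sketch using the article's tokens-per-second numbers (the 10,000-token workload size is illustrative):

```python
# Approximate time to stream a long generation at the throughput
# figures cited in the article (output tokens per second).
SPEEDS = {
    "Gemini 2.5 Flash-Lite": 366,
    "Gemini 3.1 Flash-Lite": 363,
    "Gemini 2.5 Flash": 249,
    "Grok 4.1 Fast": 145,
    "Claude 4.5 Haiku": 108,
    "GPT-5 mini": 71,
}

def seconds_for(model, output_tokens):
    """Wall-clock seconds to generate `output_tokens` at the listed speed."""
    return output_tokens / SPEEDS[model]

# Example: a 10,000-token generation, fastest first
for model in sorted(SPEEDS, key=SPEEDS.get, reverse=True):
    print(f"{model}: {seconds_for(model, 10_000):.1f} s")
```

At 10,000 output tokens, Gemini 3.1 Flash-Lite finishes in roughly 28 seconds versus over two minutes for GPT-5 mini, which is the gap the article is pointing at for latency-sensitive, high-volume workloads.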

“Beyond its raw performance, Gemini 3.1 Flash-Lite comes standard with thinking levels in AI Studio and Vertex AI, giving developers the control and flexibility to select how much the model ‘thinks’ for a task, which is critical for managing high-frequency workloads,” said Google in the announcement. 

The company added that the model can be deployed for tasks that require large-scale performance where cost is a priority, such as high-volume translation and content moderation. 

“And it can also handle more complex workloads where more in-depth reasoning is needed, like generating user interfaces and dashboards, creating simulations or following instructions.”

Staff Writer
The AI & Data Insider team works with a staff of in-house writers and industry experts.