Google Announces Gemini 3.1 Flash-Lite

Google has announced the Gemini 3.1 Flash-Lite model, which it claims is the fastest and most cost-efficient model in the Gemini 3 family. The company says the model is built for ‘high volume developer workloads at scale’.

Starting March 3, the model is rolling out in preview to developers via the Gemini API and Vertex AI. 

According to benchmark results shared by Google, the model outperforms competing models in its class while remaining competitively priced.

Gemini 3.1 Flash-Lite stands out with a very low input cost of $0.25 per million tokens and an output cost of $1.50 per million tokens, which is cheaper on output than Claude 4.5 Haiku ($5.00/M) and GPT-5 mini ($2.00/M) while slightly higher on input than Grok 4.1 Fast ($0.20/M). 

Gemini 3.1 Flash-Lite’s output cost is also significantly lower than Gemini 2.5 Flash ($2.50/M) despite delivering higher performance, and its input cost compares favourably with its predecessors. 
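The per-million-token prices above translate directly into workload cost. A minimal sketch using the article's list prices (the token counts are illustrative, and prices the article does not state are left as `None`):

```python
# Cost comparison using the per-million-token prices cited in the article.
PRICES = {  # (input $/M tokens, output $/M tokens)
    "Gemini 3.1 Flash-Lite": (0.25, 1.50),
    "GPT-5 mini": (None, 2.00),        # input price not given in the article
    "Claude 4.5 Haiku": (None, 5.00),  # input price not given in the article
    "Grok 4.1 Fast": (0.20, None),     # output price not given in the article
}

def cost(model, input_tokens, output_tokens):
    """Dollar cost for a workload, given token counts and list prices."""
    inp, out = PRICES[model]
    return input_tokens / 1e6 * inp + output_tokens / 1e6 * out

# Example: 10M input tokens and 2M output tokens on Gemini 3.1 Flash-Lite
# 10 * $0.25 + 2 * $1.50 = $5.50
print(cost("Gemini 3.1 Flash-Lite", 10_000_000, 2_000_000))  # → 5.5
```

At these rates, output tokens dominate the bill for generation-heavy workloads, which is why the article's output-price comparison against Claude 4.5 Haiku and GPT-5 mini matters for high-volume use.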

Gemini 3.1 Flash-Lite outperforms competing models across numerous benchmark evaluations spanning scientific reasoning, multimodal understanding, multilingual tasks, long-context processing, and real-world code generation.

On Humanity’s Last Exam (HLE), which tests advanced academic reasoning across text and multimodal inputs without tools, Gemini 3.1 Flash-Lite scored 16.0%. This exceeds Claude 4.5 Haiku (9.7%), but narrowly trails GPT-5 mini (16.7%) and Grok 4.1 Fast (17.6%), indicating competitive high-end reasoning performance. 

On LiveCodeBench, which evaluates real-world code generation performance, Gemini 3.1 Flash-Lite scored 72.0%, outperforming Claude 4.5 Haiku (53.2%) while trailing GPT-5 mini (80.4%) and Grok 4.1 Fast (76.5%), placing it in the upper tier but not at the top of the coding benchmark.

In terms of output speed, Gemini 3.1 Flash-Lite delivers 363 tokens per second, making it one of the fastest models in the group.

It significantly outpaces GPT-5 mini (71 tokens/s), Claude 4.5 Haiku (108 tokens/s), Grok 4.1 Fast (145 tokens/s), and Gemini 2.5 Flash (249 tokens/s). Only Gemini 2.5 Flash-Lite is marginally faster at 366 tokens/s.
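These throughput figures translate directly into wall-clock latency for long generations. A quick sketch using the article's tokens-per-second numbers (the 10,000-token workload size is illustrative):

```python
# Approximate time to stream a long generation at the throughput
# figures cited in the article (output tokens per second).
SPEEDS = {
    "Gemini 2.5 Flash-Lite": 366,
    "Gemini 3.1 Flash-Lite": 363,
    "Gemini 2.5 Flash": 249,
    "Grok 4.1 Fast": 145,
    "Claude 4.5 Haiku": 108,
    "GPT-5 mini": 71,
}

def seconds_for(model, output_tokens):
    """Wall-clock seconds to generate `output_tokens` at the listed speed."""
    return output_tokens / SPEEDS[model]

# Example: a 10,000-token generation, fastest first
for model in sorted(SPEEDS, key=SPEEDS.get, reverse=True):
    print(f"{model}: {seconds_for(model, 10_000):.1f} s")
```

At 10,000 output tokens, Gemini 3.1 Flash-Lite finishes in roughly 28 seconds versus over two minutes for GPT-5 mini, which is the gap the article is pointing at for latency-sensitive, high-volume workloads.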

“Beyond its raw performance, Gemini 3.1 Flash-Lite comes standard with thinking levels in AI Studio and Vertex AI, giving developers the control and flexibility to select how much the model ‘thinks’ for a task, which is critical for managing high-frequency workloads,” said Google in the announcement. 

The company added that the model can be deployed for tasks that require large-scale performance where cost is a priority, such as high-volume translation and content moderation. 

“And it can also handle more complex workloads where more in-depth reasoning is needed, like generating user interfaces and dashboards, creating simulations or following instructions.”

Staff Writer
The AI & Data Insider team works with a staff of in-house writers and industry experts.