Google Launches New Multimodal Model Gemma 4 12B

Google has introduced Gemma 4 12B, a new open-weight multimodal model designed to run locally on consumer hardware while supporting text, image and audio inputs through a single unified architecture.

The model sits between Google’s smaller E4B model and its larger 26B Mixture-of-Experts (MoE) system, offering what the company describes as near-26B benchmark performance at less than half the memory footprint.

Google said Gemma 4 models have now surpassed 150 million downloads across the developer community.

Gemma 4 12B is the first mid-sized Gemma model to include native audio capabilities. According to Google, the model can run locally on laptops equipped with 16GB of VRAM or unified memory, and it is aimed at multimodal reasoning tasks, agentic workflows and offline AI applications.

A key technical change is the model’s encoder-free multimodal architecture. Most multimodal systems use dedicated vision and audio encoders that convert inputs into embeddings before passing them to the language model. Gemma 4 12B removes those components and processes visual and audio inputs directly within the model backbone.

For vision, Google replaced the traditional vision encoder with a lightweight embedding module consisting of a matrix multiplication layer, positional embeddings and normalisation steps. For audio, the company eliminated the audio encoder entirely, projecting raw audio signals into the same token space used by text.
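The following minimal Python sketch shows what a "lightweight embedding module" of this kind could look like. Image patches are flattened, multiplied by a single learned matrix, offset by positional embeddings and normalised. Fixed-length audio frames are projected linearly to the same width as text tokens. Google's announcement does not publish these details, so the patch size, frame length, the choice of RMSNorm and every function name (`patchify`, `embed_image`, `embed_audio`) are illustrative assumptions. Random weights stand in for trained parameters.

```python
import math
import random

# Illustrative sketch only: patch size, frame length, RMSNorm and all names are
# assumptions, not Gemma 4 12B's published implementation.

def patchify(image, patch):
    """Split an H x W x C image (nested lists) into flattened, non-overlapping patches."""
    h, w = len(image), len(image[0])
    patches = []
    for top in range(0, h - patch + 1, patch):
        for left in range(0, w - patch + 1, patch):
            flat = []
            for r in range(top, top + patch):
                for c in range(left, left + patch):
                    flat.extend(image[r][c])  # append all channels of this pixel
            patches.append(flat)
    return patches

def matvec(weight, x):
    """Multiply a (d_model x d_in) matrix, stored as a list of rows, by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]

def rms_norm(vec, eps=1e-6):
    """Rescale a vector to unit root-mean-square (assumed normalisation step)."""
    scale = 1.0 / math.sqrt(sum(v * v for v in vec) / len(vec) + eps)
    return [v * scale for v in vec]

def embed_image(image, patch, weight, pos_table):
    """Vision path: one matmul per patch, add a positional embedding, then normalise."""
    tokens = []
    for i, p in enumerate(patchify(image, patch)):
        projected = matvec(weight, p)
        with_pos = [a + b for a, b in zip(projected, pos_table[i])]
        tokens.append(rms_norm(with_pos))
    return tokens

def embed_audio(samples, frame, weight):
    """Audio path: slice raw samples into fixed frames and project each frame
    straight to the model width, with no separate audio encoder network."""
    return [matvec(weight, samples[s:s + frame])
            for s in range(0, len(samples) - frame + 1, frame)]

# Toy usage with random weights standing in for trained parameters.
random.seed(0)
D_MODEL, PATCH, FRAME = 16, 4, 160
image = [[[random.random() for _ in range(3)] for _ in range(8)] for _ in range(8)]
w_img = [[random.gauss(0, 0.02) for _ in range(PATCH * PATCH * 3)] for _ in range(D_MODEL)]
pos_table = [[random.gauss(0, 0.02) for _ in range(D_MODEL)] for _ in range(4)]
img_tokens = embed_image(image, PATCH, w_img, pos_table)

audio = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(1600)]  # 0.1 s tone
w_aud = [[random.gauss(0, 0.02) for _ in range(FRAME)] for _ in range(D_MODEL)]
aud_tokens = embed_audio(audio, FRAME, w_aud)

print(len(img_tokens), len(aud_tokens), len(img_tokens[0]), len(aud_tokens[0]))  # 4 10 16 16
```

Both paths end as lists of `D_MODEL`-wide vectors. In this sketch, that shared width is what would let a backbone interleave image, audio and text embeddings in one sequence without a dedicated encoder in front of it.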

“What makes Gemma 4 12B stand out is its streamlined approach to processing visual and audio inputs,” Google said. The company said removing separate encoders reduces memory requirements and latency while simplifying deployment on local hardware.

The model also includes Multi-Token Prediction (MTP) drafters, which generate multiple future tokens simultaneously to reduce inference latency. Google is releasing Gemma 4 12B under the Apache 2.0 licence and making it available through tools including LM Studio, Ollama, Hugging Face Transformers, llama.cpp, MLX, SGLang and vLLM.
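The Multi-Token Prediction drafters can be illustrated with a generic draft-and-verify loop, also known as greedy speculative decoding. A drafter proposes several future tokens at once, the full model checks them, and the longest agreeing prefix is kept. The announcement does not say how Gemma 4 12B's drafters are wired into inference, so the Python sketch below is an assumption rather than Google's method. The toy target and drafter functions, and all other names, are hypothetical.

```python
from typing import Callable, List, Tuple

Token = int

def speculative_generate(
    target_next: Callable[[List[Token]], Token],
    draft_k: Callable[[List[Token], int], List[Token]],
    prompt: List[Token],
    max_new: int,
    k: int = 4,
) -> Tuple[List[Token], int]:
    """Greedy draft-and-verify decoding. Returns (tokens, number of verify rounds)."""
    out = list(prompt)
    rounds = 0
    while len(out) - len(prompt) < max_new:
        rounds += 1
        proposal = draft_k(out, k)  # k future tokens proposed in one drafter call
        # A real runtime scores all k positions in a single target forward pass;
        # this toy queries the target position by position for readability.
        for tok in proposal:
            expected = target_next(out)
            if tok != expected:
                out.append(expected)  # reject the rest; keep the target's own token
                break
            out.append(tok)  # draft token accepted
        else:
            out.append(target_next(out))  # all k accepted: take one bonus token
    return out[: len(prompt) + max_new], rounds

def greedy_generate(target_next, prompt, max_new):
    """Reference: plain one-token-per-step greedy decoding."""
    out = list(prompt)
    for _ in range(max_new):
        out.append(target_next(out))
    return out

# Hypothetical toy "models" over a 10-token vocabulary.
def toy_target(ctx):
    return (ctx[-1] * 3 + 1) % 10

def toy_drafter(ctx, k):
    tokens, last = [], ctx[-1]
    for i in range(k):
        # Follows the target's rule for 3 positions, then switches rules,
        # which disagrees with the target on this sequence.
        last = (last * 3 + 1) % 10 if i < 3 else (last + 1) % 10
        tokens.append(last)
    return tokens

fast, rounds = speculative_generate(toy_target, toy_drafter, [2], max_new=12)
slow = greedy_generate(toy_target, [2], max_new=12)
print(fast == slow, rounds)  # True 3
```

The output matches plain greedy decoding token for token, but it needs 3 verification rounds rather than 12 sequential steps. That reduction is the kind of latency saving a drafter can deliver, provided the full model verifies each batch of drafted tokens in a single pass.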

Alongside the model release, Google is launching a new Gemma Skills Repository that contains reusable agent components for developers building applications on top of Gemma models. Production deployment options include Google Cloud’s Gemini Enterprise Agent Platform, Model Garden, Cloud Run and GKE.

The launch extends Google’s recent push into smaller and more efficient AI models.

In recent months, the company introduced Gemini 3.1 Flash-Lite, a model aimed at high-volume developer workloads with lower latency and cost, while continuing to expand the Gemini Flash family for production inference.

Gemma 4 12B also follows the broader Gemma 4 rollout, which introduced E4B and 26B variants focused on multimodal reasoning and agentic workflows, as Google increases its investment in models that can run directly on laptops, phones and other edge devices.
