Google DeepMind Releases Gemini Embedding 2 in Public Preview


Google DeepMind has released Gemini Embedding 2, a multimodal embedding model that maps text, images, video, audio and documents into a single embedding space.

Gemini Embedding 2 is accessible through the Gemini API and Vertex AI and can also integrate with tools including LangChain, LlamaIndex, Haystack, Weaviate, Qdrant and ChromaDB, the company said.

The model allows developers to perform retrieval, classification and semantic search across different types of media using a unified representation of data.

“Gemini Embedding 2 maps text, images, videos, audio and documents into a single, unified embedding space, and captures semantic intent across over 100 languages,” the company said in a blog post. “This simplifies complex pipelines and enhances a wide variety of multimodal downstream tasks—from retrieval-augmented generation and semantic search to sentiment analysis and data clustering.”

Embedding models convert data into numerical vectors in which semantically similar items sit close together, letting machine learning systems measure how related two pieces of information are. By placing different media types in the same vector space, developers can search or analyse them together, for example by using a text query to find a relevant image or video clip.
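To make the idea of a shared vector space concrete, here is a minimal sketch of semantic search using cosine similarity. The four-dimensional vectors and file names are invented stand-ins. Real Gemini Embedding 2 vectors have up to 3,072 dimensions and would come from the API. The point is only that once every item lives in the same space, a text query can be ranked directly against image, audio or document embeddings.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def search(query: list[float], index: dict[str, list[float]], top_k: int = 2) -> list[tuple[str, float]]:
    """Rank stored items by cosine similarity to the query, highest first."""
    scored = [(item_id, cosine_similarity(query, vec)) for item_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical toy vectors standing in for real embeddings of different media.
index = {
    "photo_of_dog.jpg": [0.9, 0.1, 0.0, 0.2],
    "podcast_about_cats.mp3": [0.1, 0.8, 0.3, 0.0],
    "contract_scan.pdf": [0.0, 0.1, 0.9, 0.4],
}
query_vec = [0.85, 0.15, 0.05, 0.1]  # imagine the embedding of "a picture of a dog"

for item_id, score in search(query_vec, index):
    print(f"{item_id}: {score:.3f}")
```

In practice a vector database such as the ones the company lists would replace the linear scan in `search`, but the ranking principle is the same.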

Gemini Embedding 2 processes multiple modalities and supports interleaved inputs, allowing developers to pass combinations such as image and text in a single request. 

According to the company, a single request can include up to 8,192 tokens of text, up to six images, video of up to 120 seconds, audio (ingested natively), and PDF documents of up to six pages.
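These per-request limits can be checked on the client before a call is made. The sketch below is a hypothetical helper, not part of the Gemini API or any SDK. It assumes you already know the token count, image count, video length and page count of each input, and it hard-codes the preview limits quoted by the company, which may change.

```python
from dataclasses import dataclass

# Per-request limits as reported for the preview; verify against current docs.
MAX_TEXT_TOKENS = 8_192
MAX_IMAGES = 6
MAX_VIDEO_SECONDS = 120
MAX_PDF_PAGES = 6


@dataclass
class EmbedRequest:
    """Hypothetical summary of one request's inputs (not a Gemini API type)."""
    text_tokens: int = 0
    image_count: int = 0
    video_seconds: float = 0.0
    pdf_pages: int = 0


def limit_violations(req: EmbedRequest) -> list[str]:
    """Return a reason string for every stated limit the request exceeds."""
    problems = []
    if req.text_tokens > MAX_TEXT_TOKENS:
        problems.append(f"text: {req.text_tokens} tokens > {MAX_TEXT_TOKENS}")
    if req.image_count > MAX_IMAGES:
        problems.append(f"images: {req.image_count} > {MAX_IMAGES}")
    if req.video_seconds > MAX_VIDEO_SECONDS:
        problems.append(f"video: {req.video_seconds}s > {MAX_VIDEO_SECONDS}s")
    if req.pdf_pages > MAX_PDF_PAGES:
        problems.append(f"pdf: {req.pdf_pages} pages > {MAX_PDF_PAGES}")
    return problems


print(limit_violations(EmbedRequest(text_tokens=500, image_count=2)))  # []
print(limit_violations(EmbedRequest(image_count=8, pdf_pages=10)))
# ['images: 8 > 6', 'pdf: 10 pages > 6']
```

Inputs that fail the check would need to be split across several requests, for example by chunking a long PDF into six-page segments.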

The model also supports Matryoshka Representation Learning, a training technique that packs the most important information into the leading dimensions of each vector. This lets embeddings be shortened from the default 3,072 dimensions to smaller sizes such as 1,536 or 768, helping developers balance retrieval quality against storage costs.
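Matryoshka-style truncation is simple to apply on the client: keep the leading values of the full vector and re-normalize. The sketch below assumes you already have a full 3,072-dimensional embedding (simulated here with random numbers) and that unit-length vectors suit your similarity metric. Re-normalizing is a common practice rather than something the announcement specifies. The storage figures assume float32 values at 4 bytes each.

```python
import math
import random

# Sizes mentioned in the announcement; other sizes may also be usable.
EXAMPLE_DIMS = (3072, 1536, 768)


def truncate_embedding(vec: list[float], dim: int) -> list[float]:
    """Keep the first `dim` values of an MRL embedding, rescaled to unit length."""
    if not 0 < dim <= len(vec):
        raise ValueError(f"dim must be between 1 and {len(vec)}")
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head]


def float32_bytes(dim: int) -> int:
    """Storage for one vector, assuming 4-byte float32 components."""
    return dim * 4


random.seed(0)
full = [random.gauss(0.0, 1.0) for _ in range(3072)]  # stand-in for a real embedding

for dim in EXAMPLE_DIMS:
    small = truncate_embedding(full, dim)
    print(f"{len(small)} dims -> {float32_bytes(dim)} bytes")
# 3072 dims -> 12288 bytes
# 1536 dims -> 6144 bytes
# 768 dims -> 3072 bytes
```

Going from 3,072 to 768 dimensions cuts storage per vector by a factor of four; how much retrieval quality is lost at each size is worth measuring on your own data.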

Google said the model shows improvements over earlier embedding systems across text, image and video benchmarks while adding speech capabilities.

Early users include legal technology company Everlaw, which is applying the system to analyse litigation data.

“We chose Gemini embeddings to help legal professionals find critical information during the discovery process in litigation—a highly technical challenge in a high-stakes setting, and one Gemini excels at,” said Max Christoff, Chief Technology Officer at Everlaw.

Christoff said the model improved precision and recall when searching across large datasets and enabled new search capabilities for images and videos in legal records.


Staff Writer
The AI & Data Insider team works with a staff of in-house writers and industry experts.
