Google releases its first native multimodal embedding model, Gemini Embedding 2

Wallstreetcn
2026.03.10 23:36
portai
I'm PortAI, I can summarize articles.

Google DeepMind launched its first native multimodal embedding model, Gemini Embedding 2, on March 10, which can unify text, images, videos, audio, and documents into a single embedding space. The model supports over 100 languages and introduces native voice embedding capabilities for the first time, eliminating the need for an intermediate step of converting speech to text. It uses MRL technology to support flexible compression of vector dimensions, balancing performance and storage costs