ZOO OWL Posts: Google Unveils Gemini 3.5 Live Translate — Real-Time Speech-to-Speech Translation Across 70+ Languages on Meet, Translate & Live API

Google has unveiled Gemini 3.5 Live Translate, an audio model built for real-time speech-to-speech translation: spoken words go in, and translated spoken words come out. The system identifies more than 70 languages on its own and produces translated speech that mirrors the original speaker’s rhythm, tone, and pitch. Unlike traditional turn-based systems, which wait for someone to finish speaking, Gemini 3.5 Live Translate delivers translated audio while the speaker is still talking. To do so it balances two competing goals: waiting for enough context to translate accurately, and responding quickly enough to stay aligned with the speaker. More context improves translation quality, while faster output keeps the translation close to the speaker’s words. As a result, the translated audio typically lags just a few seconds behind the original speaker during a session.

Gemini 3.5 Live Translate

Gemini 3.5 Live Translate is a dedicated audio model named gemini-3.5-live-translate-preview, not a full chatbot assistant. It analyzes audio in real time as the stream continues, rather than waiting for complete sentences. It handles multi-language conversations without any manual setup. Its strong noise tolerance enables apps to work smoothly in noisy, real-world settings.

The model is being released on three platforms. Developers can access it in public preview through the Gemini Live API or Google AI Studio. Businesses get access in private preview within Google Meet starting this month. Regular users can start using it in the Google Translate app on both Android and iOS devices.

How Continuous Streaming Works

The streaming design is what makes the model suitable for real-time applications. A conversational Live agent operates on turns: it relies on pauses to tell when the speaker has finished, interprets intent, and manages interruptions. In contrast, Live Translation processes audio as a continuous stream. It translates while the speaker is talking, without waiting for a turn to end.

To maintain strict real-time performance, the translation mode accepts only audio input; text input is not available. The model also does not support tools or system instructions in this mode. This keeps it streamlined as a focused translation engine rather than a general-purpose assistant.
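The restrictions above can be expressed as a small pre-flight check. This is only a sketch: the key names "tools", "systemInstruction", and "inputModalities" are assumptions made for illustration, and the function is a hypothetical helper, not part of any Google SDK.

```python
# Hypothetical pre-flight check for a translation-mode session config.
# Key names ("tools", "systemInstruction", "inputModalities") are assumptions.

def translation_mode_problems(session: dict) -> list[str]:
    """Return the reasons a session config does not fit translation mode."""
    problems = []
    if session.get("tools"):
        problems.append("tools are not supported in translation mode")
    if session.get("systemInstruction"):
        problems.append("system instructions are not supported in translation mode")
    # Anything other than audio input is rejected.
    extra = set(session.get("inputModalities", ["audio"])) - {"audio"}
    if extra:
        problems.append(f"unsupported input modalities: {sorted(extra)}")
    return problems
```

An empty list means the config respects the audio-only, no-tools, no-instructions constraints described in the article.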

Building With the Live API

Developers set up translation within the Live API session by adding a translationConfig block inside generationConfig. The targetLanguageCode field takes a BCP-47 language code, the standard tag format behind labels such as "en" or "pt-BR"; for example, "pl" for Polish or "es" for Spanish. If omitted, targetLanguageCode defaults to "en". The echoTargetLanguage flag determines what happens to speech that is already in the target language: true plays it back, false silences it. You can optionally turn on inputAudioTranscription and outputAudioTranscription to get text transcripts of the input and output audio.
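As a rough illustration, the configuration described above might be assembled like this. The field names translationConfig, targetLanguageCode, echoTargetLanguage, inputAudioTranscription, and outputAudioTranscription come from the article; the outer "setup" envelope, the placement of the transcription fields next to generationConfig, and the helper name build_setup_message are assumptions for this sketch, so check the Live API reference before relying on the exact shape.

```python
import json

MODEL = "gemini-3.5-live-translate-preview"  # model name given in the article


def build_setup_message(target_language_code: str = "en",
                        echo_target_language: bool = False,
                        transcribe: bool = False) -> str:
    """Build a JSON setup message for a translation session (sketch only)."""
    setup = {
        "model": MODEL,
        "generationConfig": {
            "translationConfig": {
                # BCP-47 code such as "pl", "es", or "pt-BR"
                "targetLanguageCode": target_language_code,
                # True replays speech already in the target language;
                # the False default here is a choice made for this sketch.
                "echoTargetLanguage": echo_target_language,
            }
        },
    }
    if transcribe:
        # Assumed placement: siblings of generationConfig, enabled by an empty object.
        setup["inputAudioTranscription"] = {}
        setup["outputAudioTranscription"] = {}
    return json.dumps({"setup": setup})
```

Calling build_setup_message("pl", echo_target_language=True) would produce a message that asks for Polish output and replays speech that is already in Polish.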

Audio requirements are specific. PCM is uncompressed digital audio. Input must be raw 16-bit PCM at 16kHz, mono, little-endian. Output is raw 16-bit PCM at 24kHz, mono, little-endian. You send audio in 100ms chunks. For mobile or web apps, use ephemeral tokens via the v1alpha endpoint so that your API key is not exposed in client code.
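The chunk sizes implied by these formats follow from simple arithmetic: sample rate times chunk duration times 2 bytes per 16-bit sample. The sketch below works that out and slices a one-second test tone into 100ms input chunks. The helper names are illustrative, not part of any SDK, and no network call is made.

```python
import math
import struct

INPUT_RATE_HZ = 16_000    # input: 16-bit PCM, mono, little-endian
OUTPUT_RATE_HZ = 24_000   # output: 16-bit PCM, mono, little-endian
BYTES_PER_SAMPLE = 2      # 16 bits
CHUNK_MS = 100


def chunk_size_bytes(sample_rate_hz: int, chunk_ms: int = CHUNK_MS) -> int:
    """Bytes of mono 16-bit PCM in one chunk of the given duration."""
    return sample_rate_hz * chunk_ms // 1000 * BYTES_PER_SAMPLE


def float_to_pcm16le(samples) -> bytes:
    """Convert floats in [-1.0, 1.0] to little-endian signed 16-bit PCM."""
    clipped = (max(-1.0, min(1.0, s)) for s in samples)
    return b"".join(struct.pack("<h", int(s * 32767)) for s in clipped)


def iter_chunks(pcm: bytes, size: int):
    """Yield consecutive slices of at most `size` bytes."""
    for start in range(0, len(pcm), size):
        yield pcm[start:start + size]


# One second of a 440 Hz test tone at the input sample rate.
tone = [0.5 * math.sin(2 * math.pi * 440 * n / INPUT_RATE_HZ)
        for n in range(INPUT_RATE_HZ)]
pcm = float_to_pcm16le(tone)
chunks = list(iter_chunks(pcm, chunk_size_bytes(INPUT_RATE_HZ)))
```

With these numbers a 100ms input chunk is 3200 bytes and 100ms of output audio is 4800 bytes, so one second of input becomes ten chunks.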

Dimension	Live Agent	Live Translation
Model role	Assistant that listens, reasons, and acts	Interpreter / real-time translator
Interaction	Turn-based, with interruption support	Continuous stream processing, no turns
Tools	Function calling, Google Search, instructions	Translation only, no tools or instructions
Inputs	Text, audio, video, and image	Audio only, optimized for low latency
Configuration	Generation, speech, tools, instructions	`targetLanguageCode` and `echoTargetLanguage`

Use Cases

The model is designed for live interpretation in various scenarios. Google highlights uses in multilingual calls, meetings, classrooms, and live broadcasts. Partner platforms make it easier to integrate real-time translation into apps: Agora, Fishjam, LiveKit, Pipecat, and Vision Agents already integrate the Live API. These services manage the real-time media delivery infrastructure, allowing developers to focus on the user experience.

Google’s demo app shows features like dubbing and simultaneous translations for multiple languages. Grab is testing the model to help drivers and riders communicate seamlessly during pickups. Grab sees over 10 million voice calls each month. CJ ENM, LiveKit, and others have reported strong results in translation quality, accuracy, and minimal delay.

Impact on Google Meet and Translate

According to Google’s announcement, Google Meet will soon support 3.5 Live Translate for speech translation. Below is a summary of the key improvements for Google Meet.

Capability	Previous Meet	With 3.5 Live Translate
Languages	5	70+
Combinations per meeting	Only English paired with others	2000+ combinations
Access	Existing interface	Updated interface for instant access
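As a sanity check on the "2000+ combinations" figure, the number of language pairs can be worked out directly. This assumes exactly 70 languages and that any two can be paired in a meeting; the article does not say how Google counts combinations, so treat this as back-of-the-envelope arithmetic.

```python
import math

LANGUAGES = 70  # lower bound from the "70+" figure

# Unordered pairs: A<->B counted once.
unordered_pairs = math.comb(LANGUAGES, 2)

# Ordered pairs: A->B and B->A counted separately.
ordered_pairs = math.perm(LANGUAGES, 2)

print(unordered_pairs, ordered_pairs)  # 2415 4830
```

Either way of counting is consistent with the "2000+" claim in the table.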

The Google Meet update is in private preview for select enterprise Workspace customers this month, with a wider release coming later in the year. In the Google Translate app, Live Translate works with any Bluetooth headphones. The translation preserves the speaker’s vocal tone across 70+ languages. Android also gets a new listening mode—hold your phone to your ear like a phone call, and the translated audio streams privately through the earpiece, so others nearby can’t hear.

Key Takeaways

Gemini 3.5 Live Translate is Google’s latest audio model for real-time speech-to-speech translation across more than 70 languages.
It streams continuously instead of using turn-based responses, staying just a few seconds behind the speaker.
Developers configure it through the Live API using targetLanguageCode and echoTargetLanguage; the mode is audio-only, with 16kHz input and 24kHz output.
It is available in public preview via the Gemini Live API, private preview in Google Meet (expanding from 5 to 70+ languages), and the Translate mobile app.
All generated audio includes a subtle SynthID watermark for identification.

