Google has unveiled Gemini 3.5 Live Translate, an audio model built for real-time speech-to-speech translation: spoken words go in, and translated spoken words come out. The system identifies more than 70 languages on its own and produces translated speech that mirrors the original speaker’s rhythm, tone, and pitch. Unlike traditional turn-based systems that wait for someone to finish speaking, Gemini 3.5 Live Translate delivers translated audio while the speaker is still talking. It balances two competing goals: waiting for enough context to translate accurately, and responding quickly. More context improves translation quality, while faster output keeps the translation closely aligned with the speaker’s words. As a result, the translated audio typically lags just a few seconds behind the original speaker during a session.
Gemini 3.5 Live Translate
Gemini 3.5 Live Translate is a dedicated audio model named gemini-3.5-live-translate-preview, not a full chatbot assistant. It analyzes audio in real time as the stream continues, rather than waiting for complete sentences. It handles multi-language conversations without any manual setup. Its strong noise tolerance enables apps to work smoothly in noisy, real-world settings.
The model is being released on three platforms. Developers can access it through public preview in the Gemini Live API or Google AI Studio. Businesses get access through private preview within Google Meet starting this month. Regular users can start using it via the Google Translate app on both Android and iOS devices.
How Continuous Streaming Works
Processing audio as a stream rather than in turns is the design choice that matters most for real-time applications. A conversational Live agent operates in turns: it relies on pauses, interprets intent, and manages interruptions. Live Translation instead processes audio as a continuous stream, translating while the speaker is talking rather than waiting for a turn to end.
To maintain strict real-time performance, the translation mode accepts only audio input—text input is not available. Additionally, the model skips tools and system instructions in this mode. This keeps it streamlined as a focused translation engine rather than a general-purpose assistant.
Building With the Live API
Developers set up translation within the Live API session by adding a translationConfig block inside generationConfig. The targetLanguageCode field takes a BCP-47 language code, such as "pl" for Polish or "es" for Spanish, and defaults to "en". BCP-47 is the standard format behind language tags like en or pt-BR. The echoTargetLanguage flag determines what happens to speech that is already in the target language: true plays it back, false silences it. You can optionally turn on inputAudioTranscription and outputAudioTranscription to get text logs.
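As a rough illustration of the configuration described above, the sketch below assembles a session setup payload as a plain Python dictionary. The field names translationConfig, targetLanguageCode, echoTargetLanguage, inputAudioTranscription, and outputAudioTranscription come from the article. The helper name `build_translation_setup`, the outer `setup` wrapper, the `models/` prefix, and the placement of the transcription fields are assumptions made for illustration, so check them against the official Live API documentation before relying on them.

```python
import json


def build_translation_setup(target_language_code="en",
                            echo_target_language=False,
                            transcripts=False):
    """Hypothetical helper: returns a setup payload as a plain dict.

    This is not an SDK call. The overall message shape is an assumption.
    """
    setup = {
        # Model name from the article; the "models/" prefix is assumed.
        "model": "models/gemini-3.5-live-translate-preview",
        "generationConfig": {
            "translationConfig": {
                # BCP-47 code such as "pl", "es", or "pt-BR".
                "targetLanguageCode": target_language_code,
                # True: replay speech already in the target language.
                # False: stay silent for that speech.
                "echoTargetLanguage": echo_target_language,
            },
        },
    }
    if transcripts:
        # Optional text logs; placement at this level is assumed.
        setup["inputAudioTranscription"] = {}
        setup["outputAudioTranscription"] = {}
    return {"setup": setup}


message = build_translation_setup("pl", echo_target_language=True,
                                  transcripts=True)
print(json.dumps(message, indent=2))
```

Keeping the payload as plain data makes it easy to log or diff before it is sent over whichever transport the application uses.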
Audio requirements are specific. Both directions use PCM, which is uncompressed digital audio. Input must be raw 16-bit PCM at 16kHz, mono, little-endian. Output is raw 16-bit PCM at 24kHz, mono, little-endian. You send audio in 100ms chunks. For mobile or web apps, ephemeral tokens issued via the v1alpha endpoint keep your long-lived API key out of client code.
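The audio figures above translate directly into byte counts. The sketch below works out the size of a 100ms chunk at each sample rate and splits a raw PCM buffer into chunks of that size. The function names are illustrative only, and nothing here calls the Live API itself.

```python
import struct

INPUT_SAMPLE_RATE_HZ = 16_000   # input rate stated in the article
OUTPUT_SAMPLE_RATE_HZ = 24_000  # output rate stated in the article
BYTES_PER_SAMPLE = 2            # 16-bit PCM
CHANNELS = 1                    # mono
CHUNK_MS = 100                  # chunk duration stated in the article


def chunk_size_bytes(sample_rate_hz, chunk_ms=CHUNK_MS):
    """Bytes needed to hold chunk_ms of 16-bit mono audio."""
    samples = sample_rate_hz * chunk_ms // 1000
    return samples * BYTES_PER_SAMPLE * CHANNELS


def to_pcm16le(samples):
    """Pack integer samples (-32768..32767) as little-endian 16-bit PCM."""
    return struct.pack(f"<{len(samples)}h", *samples)


def iter_chunks(pcm, sample_rate_hz=INPUT_SAMPLE_RATE_HZ):
    """Yield successive 100ms slices of a raw PCM byte buffer."""
    size = chunk_size_bytes(sample_rate_hz)
    for start in range(0, len(pcm), size):
        yield pcm[start:start + size]


# One second of silence at the input rate.
one_second = to_pcm16le([0] * INPUT_SAMPLE_RATE_HZ)
chunks = list(iter_chunks(one_second))

print(chunk_size_bytes(INPUT_SAMPLE_RATE_HZ))   # 3200 bytes per input chunk
print(chunk_size_bytes(OUTPUT_SAMPLE_RATE_HZ))  # 4800 bytes per 100ms of output
print(len(chunks))                              # 10 chunks in one second
```

The last chunk of a real recording may be shorter than 3200 bytes; whether it should be padded or sent as-is is not covered by the article.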
| Dimension | Live Agent | Live Translation |
|---|---|---|
| Model role | Assistant that listens, reasons, and acts | Interpreter / real-time translator |
| Interaction | Turn-based, with interruption support | Continuous stream processing, no turns |
| Tools | Function calling, Google Search, instructions | Translation only, no tools or instructions |
| Inputs | Text, audio, video, and image | Audio only, optimized for low latency |
| Configuration | Generation, speech, tools, instructions | targetLanguageCode and echoTargetLanguage |
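The constraints in the table can be turned into a small client-side sanity check. The sketch below is a hypothetical helper, not part of any SDK: it inspects a setup dictionary before it is sent and flags settings that the article says translation mode does not use. The key names `tools` and `systemInstruction` are assumed to match the Live API setup message.

```python
def check_translation_setup(setup):
    """Return a list of problems for a translation-mode setup dict.

    Hypothetical pre-flight check. An empty list means nothing was flagged.
    """
    problems = []
    if setup.get("tools"):
        problems.append("tools are not used in translation mode")
    if setup.get("systemInstruction"):
        problems.append("system instructions are not used in translation mode")
    generation_config = setup.get("generationConfig", {})
    if "translationConfig" not in generation_config:
        problems.append("generationConfig.translationConfig is missing")
    return problems


agent_style = {
    "tools": [{"googleSearch": {}}],
    "systemInstruction": "Answer briefly.",
    "generationConfig": {},
}
translate_style = {
    "generationConfig": {
        "translationConfig": {"targetLanguageCode": "es"},
    },
}

for problem in check_translation_setup(agent_style):
    print("agent-style setup:", problem)
print("translate-style problems:", check_translation_setup(translate_style))
```

How the service itself reacts to unsupported fields is not stated in the article, so a check like this only catches configuration mistakes early on the client.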
Use Cases
The model is designed for live interpretation in a range of scenarios. Google highlights multilingual calls, meetings, classrooms, and live broadcasts. Developer platforms make it easier to integrate real-time translation into apps: Agora, Fishjam, LiveKit, Pipecat, and Vision Agents already use the Live API. These services manage the complex real-time media delivery infrastructure, so developers can focus on the user experience.
Google’s demo app shows features like dubbing and simultaneous translations for multiple languages. Grab is testing the model to help drivers and riders communicate seamlessly during pickups. Grab sees over 10 million voice calls each month. CJ ENM, LiveKit, and others have reported strong results in translation quality, accuracy, and minimal delay.
Impact on Google Meet and Translate
According to Google’s announcement, Google Meet will soon support 3.5 Live Translate for speech translation. Below is a summary of the key improvements for Google Meet.
| Capability | Previous Meet | With 3.5 Live Translate |
|---|---|---|
| Languages | 5 | 70+ |
| Combinations per meeting | Only English paired with others | 2000+ combinations |
| Access | Existing interface | Updated interface for instant access |
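One plausible reading of the "2000+ combinations" figure, offered as an assumption rather than something the announcement spells out, is that it counts unordered language pairs among roughly 70 languages. The arithmetic is short:

```python
import math

LANGUAGES = 70  # lower bound; the article says "70+"

# Unordered pairs, e.g. Polish/Spanish counted once.
unordered_pairs = math.comb(LANGUAGES, 2)

# Ordered pairs, counting Polish-to-Spanish and Spanish-to-Polish separately.
ordered_pairs = LANGUAGES * (LANGUAGES - 1)

print(unordered_pairs)  # 2415
print(ordered_pairs)    # 4830
```

Both counts clear 2000, so the table's figure is consistent with either interpretation.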
The Google Meet update is in private preview for select enterprise Workspace customers this month, with a wider release coming later in the year. In the Google Translate app, Live Translate works with any Bluetooth headphones. The translation preserves the speaker’s vocal tone across 70+ languages. Android also gets a new listening mode—hold your phone to your ear like a phone call, and the translated audio streams privately through the earpiece, so others nearby can’t hear.
Key Takeaways
- Gemini 3.5 Live Translate is Google’s latest audio model for real-time speech-to-speech translation across more than 70 languages.
- It streams continuously instead of using turn-based responses, staying just a few seconds behind the speaker.
- Developers configure it through the Live API using targetLanguageCode and echoTargetLanguage; audio-only mode with 16kHz input and 24kHz output.
- It is available in public preview via the Gemini Live API, private preview in Google Meet (expanding from 5 to 70+ languages), and the Translate mobile app.
- All generated audio includes a subtle SynthID watermark for identification.



