Meanwhile, a comparatively strong showing in Scale AI’s Audio MultiChallenge suggests the new Gemini model is better able to cope with hesitations and interruptions in audio input. Still, the result is relative: Gemini 3.1 Flash Live manages only 36.1 percent on this test, though that outpaces other real-time audio models. Audio models that are not designed for live conversation can score over 50 percent on the MultiChallenge.
The upshot is that Gemini 3.1 Flash Live should sound more like a person, enough so that Google felt it was time to flag the audio as AI-generated. The model’s outputs carry SynthID watermarks, which are imperceptible to human listeners but can be detected if someone tries to pass off Gemini speech as the real thing.
Google has partnered with companies including Home Depot and Verizon to test the model, and all of them offer glowing reports in Google’s blog post on how well 3.1 Flash Live can mimic human speech. So the next AI assistant you encounter on a phone call might sound much more realistic. Maybe you’ll even think you’re talking to a person, and SynthID can’t help with that.
Developers can now access the model in AI Studio, the Gemini API, and Gemini Enterprise for Customer Experience. The latter is essentially a toolkit for agentic shopping. Gemini 3.1 Flash Live will be seen most prominently in Gemini Live and Search Live (a feature of AI Mode). The new conversational AI is rolling out in those products starting today.
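For developers taking the API route, a minimal sketch of what a Live API session looks like with the google-genai Python SDK is below. The model ID string is an assumption (Google’s actual identifier for Gemini 3.1 Flash Live may differ; check the model list in AI Studio), and you’d need a valid API key in your environment.

```python
# Minimal sketch of a Gemini Live API session using the google-genai Python SDK.
# Assumption: the model ID "gemini-3.1-flash-live" is hypothetical; consult
# AI Studio for the real identifier.
import asyncio

from google import genai

client = genai.Client()  # reads GEMINI_API_KEY from the environment

MODEL_ID = "gemini-3.1-flash-live"  # hypothetical ID for the model in this article
CONFIG = {"response_modalities": ["AUDIO"]}  # ask for spoken replies


async def main() -> None:
    # Open a bidirectional streaming session; the same session can accept
    # interruptions mid-turn, which is the behavior MultiChallenge measures.
    async with client.aio.live.connect(model=MODEL_ID, config=CONFIG) as session:
        await session.send_client_content(
            turns={"role": "user", "parts": [{"text": "Say hello in one sentence."}]},
            turn_complete=True,
        )
        # Stream the reply; audio arrives as raw PCM chunks.
        async for message in session.receive():
            if message.data is not None:
                print(f"received {len(message.data)} bytes of audio")


if __name__ == "__main__":
    asyncio.run(main())
```

In a real application, you would pipe those PCM chunks to an audio output device and stream microphone input back with the session’s send_realtime_input method, but the overall session pattern stays the same.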