- Google has announced “Gemini 3.1 Flash TTS (Text-To-Speech),” its latest text-to-speech AI model, with improved controllability, expressiveness, and quality.
- The model offers higher overall voice quality and stronger audio generation while keeping the low cost characteristic of the “Gemini 3.1 Flash” family.
- It also introduces new embedded audio elements called “Audio Tags.”
On Wednesday, April 15, 2026, Google announced the launch of “Gemini 3.1 Flash TTS (Text-To-Speech),” its latest text-to-speech AI model featuring enhanced controllability, expressiveness, and quality.
“Gemini 3.1 Flash TTS” delivers higher overall voice quality and better audio generation while retaining the cost-efficiency of the “Gemini 3.1 Flash” family. It supports multi-speaker conversations, covers more than 70 languages, and allows fine-grained control through natural-language instructions.
“Gemini 3.1 Flash TTS” also introduces new embedded audio elements called “Audio Tags,” which let users intuitively control voice style, pacing, and delivery.
“Gemini 3.1 Flash TTS” is rolling out to developers through platforms such as the multimodal generative AI development platform “Google AI Studio” and the machine learning development platform “Vertex AI.” It will also be integrated into “Google Vids,” the video editing service in Google Workspace.
Source: Google