• Google has published a new research paper detailing a text-to-speech system known as Tacotron 2
• The system, which is powered by neural networks, can read text aloud in a near-human manner
• The results have significant implications for the Google Assistant and the Google Home range of products


You might have watched a movie like The Terminator or I, Robot and decided that the artificial intelligence it portrays is a far cry from our current technology (there’s no real fear of bots powered by Samsung Bixby overtaking the planet, that’s for sure). But judging by a recently published Google research paper (via Quartz), we might be closer to that reality than you think.

The paper, titled “Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions,” highlights a new Google text-to-speech system called Tacotron 2, which is capable of a near-human level of AI voice reproduction.

To achieve this, Tacotron 2 uses a pair of neural networks: one to turn text into a spectrogram, a visual representation of audio frequencies over time, and a second (called “WaveNet”) to turn that spectrogram into sound. Google launched a website alongside the paper to show off what this tech could lead to in practice; there, Google provides examples of how Tacotron 2 handles phrase semantics (like distinguishing between the noun and verb forms of “present”), intonation, and difficult words that might trip some of us humans up, like “otolaryngology.”
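For the curious, here is a rough idea of what that "visual representation" looks like in code. This is a minimal sketch in plain Python, not Google's implementation and not a neural network: it builds an ordinary magnitude spectrogram from a test tone, and includes a commonly used formula for the mel scale that the paper's title refers to. The function names, the 8 kHz sample rate, and the frame sizes are all assumptions chosen for illustration.

```python
import cmath
import math


def hz_to_mel(freq_hz):
    """Convert a frequency in Hz to mels (one common formula, assumed here)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def spectrogram(samples, frame_size=64, hop=32):
    """Return a magnitude spectrogram: one row per time frame,
    one column per frequency bin (0 .. frame_size // 2)."""
    rows = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = samples[start:start + frame_size]
        # Hann window to soften the edges of each frame.
        windowed = [
            s * (0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_size - 1)))
            for n, s in enumerate(frame)
        ]
        # Naive discrete Fourier transform, keeping non-negative frequencies.
        row = []
        for k in range(frame_size // 2 + 1):
            acc = sum(
                x * cmath.exp(-2j * math.pi * k * n / frame_size)
                for n, x in enumerate(windowed)
            )
            row.append(abs(acc))
        rows.append(row)
    return rows


# Illustrative input: a pure 1,000 Hz tone sampled at 8,000 Hz.
SAMPLE_RATE = 8000
TONE_HZ = 1000
FRAME_SIZE = 64
samples = [math.sin(2 * math.pi * TONE_HZ * t / SAMPLE_RATE) for t in range(512)]

spec = spectrogram(samples, frame_size=FRAME_SIZE)
# The loudest bin in the first frame should sit at the tone's frequency.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * SAMPLE_RATE / FRAME_SIZE
```

Each row of `spec` is a slice of time and each column a band of frequencies, which is why a spectrogram can be drawn as an image. In the system the article describes, the first network predicts this kind of data from text and WaveNet does the reverse job of turning it into audio; neither step is reproduced here.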

In the last section of the site, Google provides side-by-side examples of a human voice and the AI-created one, with (to my ear) outstanding results: in most cases, I struggled to identify the computer-generated voice.

Although the research doesn’t state it explicitly, this voice tech may be just one part of Google’s much broader mission of making its digital assistant, Google Assistant, more conversational. Google Assistant is the AI behind the Google Home products that the company is currently pushing, and it’s an area where this technology would naturally fit. Google Assistant is certainly more efficient than it has ever been, but this research indicates that it could soon sound more human, too.

Of course, there is still a vast gap between an AI that can read aloud like a real person and an AI that can converse like a real person, where the nuance of personality and the unpredictability of conversation play critical roles. But with developments like this, AIs such as the one Scarlett Johansson portrays in the movie Her might not be far off. Whatever that means for humanity.