You might’ve read about the smooth new voice rolling out to Google Search, or maybe you already have it on your devices. Now, Google’s Nat and Lo are giving us a peek at how the magic happens.
In a new video, the two Googlers talk to the linguists, voice coaches, and voice actors who work to make synthesized speech sound as natural as possible.
The video (above) is well worth a watch, but here are the essentials. It all starts with a vast library of phonemes, phones, and diphones (the transitions between pairs of adjacent sounds), the basic units of sound that make up every word. Starting from thousands of phrases recorded by a voice actor, scientists isolate these units and build a database of sound bites that, together, can form any word in a given language.
From there, it’s a matter of search, and we all know how good Google is at search: the system looks up the sound bites needed to build each word, then stitches them together to form smooth speech.
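The video doesn’t spell out Google’s exact algorithm, but the look-up-then-stitch process described above resembles textbook unit-selection synthesis. Here is a toy Python sketch of that general idea, built on loud assumptions: units are diphones labeled like "h-i", each recording is reduced to a single hypothetical average pitch plus a short list of numbers standing in for audio, and both costs simply compare pitch. A Viterbi-style search picks one recording per diphone, balancing how well each matches a target pitch against how smoothly neighbors join, and the chosen units are then blended with a short linear crossfade. None of these names or numbers come from Google.

```python
from dataclasses import dataclass


@dataclass
class Unit:
    label: str            # diphone label, e.g. "h-i" (hypothetical notation)
    pitch: float          # hypothetical average pitch of the recording, in Hz
    samples: list[float]  # toy stand-in for the recorded audio


def to_diphones(phones):
    """Turn a phone sequence like ['_', 'h', 'i', '_'] into diphone labels."""
    return [f"{a}-{b}" for a, b in zip(phones, phones[1:])]


def target_cost(unit, target_pitch):
    # How far this recording is from the pitch we want (assumed cost).
    return abs(unit.pitch - target_pitch)


def join_cost(left, right):
    # Penalize pitch jumps at the boundary between two units (assumed cost).
    return abs(left.pitch - right.pitch)


def select_units(labels, database, target_pitch):
    """Viterbi search for the cheapest chain of units covering the labels."""
    candidates = [database[label] for label in labels]
    # best[i][j] = (total cost, backpointer) for candidate j at position i
    best = [[(target_cost(u, target_pitch), None) for u in candidates[0]]]
    for i in range(1, len(candidates)):
        row = []
        for u in candidates[i]:
            cost, back = min(
                (best[i - 1][k][0] + join_cost(prev, u), k)
                for k, prev in enumerate(candidates[i - 1])
            )
            row.append((cost + target_cost(u, target_pitch), back))
        best.append(row)
    # Trace back from the cheapest candidate at the final position.
    j = min(range(len(best[-1])), key=lambda k: best[-1][k][0])
    path = []
    for i in range(len(candidates) - 1, -1, -1):
        path.append(candidates[i][j])
        j = best[i][j][1]
    return path[::-1]


def concatenate(units, overlap=2):
    """Stitch unit audio together with a short linear crossfade."""
    out = list(units[0].samples)
    for unit in units[1:]:
        n = min(overlap, len(out), len(unit.samples))
        for i in range(n):
            w = (i + 1) / (n + 1)  # fade-in weight for the incoming unit
            out[-n + i] = out[-n + i] * (1 - w) + unit.samples[i] * w
        out.extend(unit.samples[n:])
    return out


# Hypothetical toy inventory: two recordings of each diphone in "hi".
database = {
    "_-h": [Unit("_-h", 120.0, [0.0, 0.1]), Unit("_-h", 180.0, [0.0, 0.2])],
    "h-i": [Unit("h-i", 125.0, [0.3, 0.4, 0.5]), Unit("h-i", 190.0, [0.2, 0.6, 0.1])],
    "i-_": [Unit("i-_", 130.0, [0.2, 0.0]), Unit("i-_", 200.0, [0.1, 0.0])],
}
labels = to_diphones(["_", "h", "i", "_"])
chosen = select_units(labels, database, target_pitch=125.0)
audio = concatenate(chosen)
print([u.pitch for u in chosen])  # prints [120.0, 125.0, 130.0]
```

Real systems weigh many more features (duration, spectral shape, phonetic context) and search far larger databases; the pitch-only costs here just keep the example small while showing why the search favors units that join smoothly.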
Of course, it all gets very complicated once you scratch beneath the surface. One big problem that Googlers are currently trying to crack is how to capture the natural rhythm and flow of the human voice, which can make the difference between a robotic voice and one that’s almost indistinguishable from a human’s. In fact, if you listen to the comparison between the old US English Search voice and the new one, you’ll notice that Google has really improved these characteristics, known in scientific parlance as prosody and intonation.
Check it out:
How often do you use your voice with Google Search?