As Google details in its blog post, the improvements to voice dictation come from accuracy gains achieved through deep learning. The “prime focus” across several architectures was to reduce latency, the time it takes for your speech to be transcribed.
The result is an end-to-end, all-neural speech recognizer that lives in Gboard and runs on your device. The good news is that it doesn’t need an internet connection to work and is only 80MB in size, compared with past models that started at 2GB and later shrank to 450MB.
Even better, the new speech recognizer outputs character-by-character instead of word-by-word as you speak. That’s made possible through “a feedback loop that feeds symbols predicted by the model back into it to predict the next symbols.”
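As a rough illustration of that feedback loop, the Python sketch below emits one character at a time and passes everything emitted so far back into the next prediction. It is not Google's recognizer: `stream_decode`, `toy_predict_next`, and the `BLANK` marker are hypothetical names, and the "audio frames" are pre-labeled characters standing in for real audio features.

```python
from typing import Callable, Iterable, Iterator, List, Optional

BLANK = ""  # hypothetical marker meaning "emit nothing for this frame"

# A predictor takes the current frame plus the symbols emitted so far.
Predictor = Callable[[Optional[str], List[str]], str]


def stream_decode(frames: Iterable[Optional[str]], predict_next: Predictor) -> Iterator[str]:
    """Yield characters as soon as they are predicted.

    Each emitted symbol is appended to `history`, which is fed back into
    the next call to `predict_next` (the feedback loop).
    """
    history: List[str] = []
    for frame in frames:
        symbol = predict_next(frame, history)
        if symbol != BLANK:
            history.append(symbol)  # feed the prediction back in
            yield symbol            # output immediately, character by character


def toy_predict_next(frame: Optional[str], history: List[str]) -> str:
    """Toy stand-in for a neural model (assumption, for illustration only).

    A frame of None represents silence. The history influences the output:
    the very first emitted character is capitalized.
    """
    if frame is None:
        return BLANK
    if not history:
        return frame.upper()
    return frame


if __name__ == "__main__":
    frames = ["h", "i", None, " ", "y", "o", "u"]
    for ch in stream_decode(frames, toy_predict_next):
        print(ch, end="", flush=True)  # characters appear one at a time
    print()
```

Because `stream_decode` is a generator, a caller can display each character the moment it is produced rather than waiting for a whole word, which is the behavior the article describes.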
Putting all of this together, Gboard’s voice dictation feature should work more reliably, more quickly, and more accurately than before.
Gboard’s upgraded speech recognizer is rolling out now to Pixel smartphones and is limited to American English dictation. To enable it, go to Gboard settings > Voice typing > Faster voice typing. We could see the upgraded speech recognizer in additional languages and on additional devices in the near future.