Affiliate links on Android Authority may earn us a commission.
I really want a ChatGPT-like voice assistant in my Google Nest speakers
The past few months in the tech world have been a whirlwind. One minute we're amazed, if not exactly impressed, by DALL-E's low-quality AI-generated images; the next, we're somehow chatting with Bing, our favorite new search engine. I can barely keep up. Every day there's a new Twitter thread showing off a groundbreaking AI tool, a new way to use ChatGPT or Midjourney, or a new capability built on top of ChatGPT's API. And somehow we're already at GPT-4? But through it all, one idea keeps coming back to me: most of the time, I don't need this AI when I'm staring at a screen. I'd much rather have ChatGPT's conversational skill in a voice assistant on my Nest smart speakers.
The reason is twofold. First, Google Assistant has always been slow to understand and answer even slightly complex questions, and it seems to be getting dumber by the minute. Second, a conversational AI makes more sense in a voice interface than on a screen. Let me elaborate.
Google Assistant, like Alexa and Siri, feels a bit outdated today
Google Assistant's strength has always been its ability to understand and execute voice commands phrased in natural language. Ask it "who wrote Pride and Prejudice," "what's the name of Pride and Prejudice's author," or "who's the author behind Pride and Prejudice," and it'll answer "Jane Austen" all three times. You can try dozens of other ways to phrase that question and it'll still get it right.
This makes Google Assistant an invaluable tool for setting reminders and timers, adding meetings, asking general knowledge questions, playing specific songs, and controlling the smart home. You don't need to remember an exact command to turn off the lights; you can just say it however it naturally comes to you.
But dig beneath the surface a little and the cracks start to show. Instead of the original song you wanted, you may get an acoustic version, a remix, or, heaven forbid, a cover. It might also give you advice on how to clean your kitchen instead of telling your robot vacuum to clean it, as you intended.
Nothing, though, is as damning as what happens when you ask Assistant an open-ended question. You'll hear it ramble through an endless quote from a single site that may or may not actually answer your question. Basically, it reads you the snippet from the first Google Search result with zero consideration for context. It's too verbose, frequently confused, and often unable to dig a few layers deep to find an answer. Here are three examples that illustrate each of these problems.
Example 1 – confused: My husband and I were discussing a potential trip to Czechia and wondering whether the rail system was robust enough to make day trips and getting around easy. I asked if it's "easy to travel by train in the Czech Republic," and it gave me directions to Czechia from my current location. Rephrasing the question with "inside" instead of "in" didn't help.
Example 2 – unable to answer: I was fiddling with my Olympus camera's settings when I came across a menu with no explanation at all; the options were LF, LN, MN, and SN. So I asked my Nest Audio what they meant. It replied that it couldn't compare the settings, then asked whether I wanted to know the difference (uh, wasn't that my question?). I said yes, and it just stopped. No reply.
Example 3 – verbose: After my recent trip to Barcelona, I was curious about Spain's political system, so I asked Google whether the country has a parliament. The answer was a website snippet that started by describing the two houses and only then told me that they form a bicameral parliamentary system.
Now compare those answers from a traditional voice assistant with what a large language model like ChatGPT can provide. ChatGPT understood the intent behind my question about train travel in Czechia, opened with a "yes" to give me an immediate answer, then went on to explain the perks of the country's railway system. Because it talked a bit more than I wanted, I limited its answers to the next two questions to one sentence. It understood both, explaining what the camera settings meant and opening with a "yes" before describing Spain's parliament.
There's no command that restricts Google Assistant's answers to a single sentence or otherwise cuts down its chatter. And none of today's voice assistants can synthesize an answer from multiple sources, which is one of the key strengths of ChatGPT and other large language models.
Conversational AI: On-screen vs voice interactions
There are countless potential uses for a conversational AI like ChatGPT, but one of the most interesting I've found for my own needs is its ability to synthesize an answer from multiple sources while respecting the constraints of a request. You can make it talk less, as in the example above, ask it to explain complex concepts as if you were five years old, or give it any number of restrictions to tailor the answer to exactly what you want.
This is why it makes even more sense to interact with this kind of AI by voice. When I have a screen in front of me, I can skim through multiple answers in a second, quickly tell which ones are irrelevant, and expand only the ones I want to learn more about. When I use voice commands, I have no choice but to listen to the single answer Google Assistant gives me, and, as established above, that answer can be far from satisfactory.
I mean, sure, Google can easily tell me when Real Madrid's next game is, who the president of France is, or how tall Mac McClung is, but I wouldn't dare ask it whether I can make a cocktail with yogurt liqueur and amaretto but no egg white, or whether there's a direct train from Paris to Rome. Before I even try, I can imagine all the ways it'll misunderstand or mess up those requests, forcing me to pull out my phone and start a lengthy Google or Bing search session to answer them.
And that’s the thing. If all Google Assistant does is blabber for two minutes while reading me a snippet from the first search result, then it’s a waste of my time. I’d much rather pull out my phone and do the search there; at least I can skim through more than just one result in a few seconds.
I don't want to single out Google here. The current versions of Amazon Alexa and Apple Siri can't save me any research time either, nor do they compel me to use them any more than Google Assistant does. And this is exactly where I stand with every voice assistant today: I only use them for some smart home controls and the most basic searches and requests.
But if I had an AI voice assistant like ChatGPT that synthesized content from multiple sources and gave me a short and satisfactory answer each time I asked it something, then I would turn to it again and again. I’d rather do that and stay engaged with what I’m doing than pull out my phone, look at a screen, and get lost in it for half an hour.
ChatGPT isn’t perfect, but I want a voice assistant like it in my Nest speakers
Although I've been extolling the virtues of ChatGPT for a while, I don't want it in its current state on my Nest speakers or any other smart speaker. Its training data is dated, it's often too verbose unless you restrict its output to a sentence (though, again, I appreciate that I can do that), it doesn't cite sources, it performs far better in English than in other languages, and it obviously can't control my smart home or add events to my calendar, among other limitations.
Microsoft has come close to fixing the first problem on that list. Bing Chat harnesses the same kind of OpenAI language model that powers ChatGPT and applies it to current news and events, providing more up-to-date answers. My colleague Calvin even says Bing Chat is so good that he's not going back to Assistant. But again, that addresses only one of ChatGPT's limitations.
What I’d like to see is a Google equivalent. Call it Google Bard or Assistant 2.0 if you want, but here’s how I picture my voice interactions with it:
- It should be able to handle the same requests the current version does (smart home, conversions, reminders, calendar, etc.).
- It should also offer a smarter, natural-language AI that synthesizes content from multiple sources across the web and takes into account any restrictions or parameters I set.
- For the sake of brevity and immediacy, its answers shouldn't read source names aloud and should be limited to one sentence (unless prompted otherwise). But I should be able to ask it for extra details and lengthier explanations.
- For the sake of accuracy and further learning, it should always send a notification to my phone with the answer it provided, the sources it used, and an option to tap through to a full search and learn more.
- I should also be able to control which sources it draws from and block specific ones I deem low-quality or inaccurate.
This is the kind of voice assistant evolution I'd approve of and actually start using. Only time will tell whether Google takes things in this direction or chooses a different path.