The Future of Voice User Interfaces

Interfaces have come a long way. We began with buttons and toggles. The emergence of touch-screen interfaces changed the way we interacted with our mobile phones. But the advent of voice brings with it the potential to raise the game significantly – for everyone. While voice technology has advanced considerably in the last few years, we are still far from our dreams of the perfect Voice User Interface.

It’s obvious that talking to a Voice User Interface (VUI) is still not quite like talking to a human. Many may argue that this isn’t necessary, and that being able to bark fragmented command words at a disembodied voice gets the job done. For now maybe that’s all we need, but let’s think about the many future improvements that could be made to voice technology.

Here are a few of the most anticipated features that will take this technology to the next level.

Quicker acclimatisation: The easy, intuitive nature of voice means less user training and, hence, faster acclimatisation. This removes one of the traditional entry barriers of any new technology: that dreaded learning curve.

Speed: Another big advantage is that when it comes to complex commands, voice can work faster and save a lot of time. A user can simply say “Please raise a bill” and go directly to the payment gateway on an e-commerce site, whereas on a touch screen, that same command might take them through three to four screens.

Sometimes voice works best: There are some situations where voice is probably the best and safest option. Driving is the obvious one, since both hands have to be on the wheel. But there are others: cooking, working on a construction site or even wearing gloves – all instances when touch is compromised and when voice makes so much more sense.

Context
The subject of context is one of the most talked about within the VUI community. In the future we can expect devices to be able to hold context over much longer periods of time than they can currently.

Google has the ability to keep context for a few commands, for example: “Who is Britney Spears?”, “Who is her mother?”, “Where was she born?”. This works well with the new Continued Conversation feature that Google has recently rolled out across the US (and soon the UK).

Continued Conversation allows the device to continue listening after it relays information, so that it can capture any follow-up questions from the user. As in a normal conversation, users don’t need to say the wake word at the beginning of every sentence. Since this feature relies on multi-sentence conversations, hopefully we will see an increase in Google’s ability to hold onto context for longer periods of time.
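
To make the idea concrete, here is a minimal sketch of a follow-up listening window. It is not how Google implements Continued Conversation; the class name, the "hey device" wake word and the 8-second window are all assumptions chosen for illustration. Timestamps are passed in explicitly so the behaviour is easy to follow.

```python
class ListeningSession:
    """Toy model of a device that keeps listening briefly after it answers."""

    def __init__(self, wake_word="hey device", follow_up_window=8.0):
        # Both defaults are hypothetical values, not taken from any real product.
        self.wake_word = wake_word
        self.follow_up_window = follow_up_window
        self.last_response_at = None  # time of the most recent answer, in seconds

    def accept(self, utterance, now):
        """Return the command to act on, or None if the device should ignore it."""
        text = utterance.strip()
        if text.lower().startswith(self.wake_word):
            # Wake word present: strip it and keep the rest as the command.
            command = text[len(self.wake_word):].lstrip(" ,")
        elif (self.last_response_at is not None
              and now - self.last_response_at <= self.follow_up_window):
            # No wake word, but we answered recently: treat it as a follow-up.
            command = text
        else:
            # No wake word and no recent answer: stay silent.
            return None
        self.last_response_at = now  # assume the device answers immediately
        return command


session = ListeningSession()
print(session.accept("What's the weather?", now=0.0))
print(session.accept("Hey device, what's the weather?", now=1.0))
print(session.accept("And tomorrow?", now=5.0))
print(session.accept("And on Friday?", now=20.0))
```

In this sketch the first and last utterances are ignored: the first has no wake word and no prior answer, and the last arrives after the window has closed. A longer window makes conversation feel more natural but also raises the chance of picking up speech that was not meant for the device.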

A device that could remember previous interactions could draw on that memory to better understand the user’s future requests.
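
As a rough illustration of holding context between requests, the sketch below remembers the last person named and substitutes that name for pronouns in follow-up questions. It is a deliberately naive toy, not a description of how any real assistant resolves references: the class name and both regular expressions are assumptions, a "named entity" is simply two or more capitalised words in a row, and "her" is always treated as possessive.

```python
import re

# Assumption: a named entity is a run of two or more capitalised words.
ENTITY = re.compile(r"[A-Z][a-z]+(?: [A-Z][a-z]+)+")
PRONOUN = re.compile(r"\b(she|he|her|his)\b", re.IGNORECASE)


class ConversationContext:
    """Toy context store that carries one entity across turns."""

    def __init__(self):
        self.last_entity = None

    def resolve(self, utterance):
        """Return the utterance with pronouns replaced by the remembered entity."""
        found = ENTITY.search(utterance)
        if found:
            # A new name was mentioned: remember it and leave the text unchanged.
            self.last_entity = found.group(0)
            return utterance
        if self.last_entity is None:
            # Nothing remembered yet, so there is nothing to substitute.
            return utterance

        def substitute(match):
            possessive = match.group(1).lower() in ("her", "his")
            return self.last_entity + ("'s" if possessive else "")

        return PRONOUN.sub(substitute, utterance)


context = ConversationContext()
for question in ["Who is Britney Spears?", "Who is her mother?", "Where was she born?"]:
    print(context.resolve(question))
```

Note that the third question is ambiguous: “she” could mean the singer or her mother, and this sketch always picks the last name spoken aloud. Handling that kind of ambiguity over longer conversations is exactly where better context retention would help.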