We have all talked to a virtual personal assistant (VPA) at least once in our lives. It could be Siri, Alexa, Cortana or even the new Google Assistant; they are everywhere. Most of the time, though, people stop using their personal assistants after the first interaction, having barely found a “common language”.
Have you ever thought about why that is? Why doesn’t the majority use a personal assistant daily? The answer is quite simple: we don’t trust them. It’s not that we think personal assistants are going to lie to us or steal our money; it’s more basic than that. We don’t believe they truly understand us; we don’t trust their capabilities.
But wait, what does that even mean?
I’m dividing the whole concept of “understanding” into three parts: the words themselves (voice to text); the meaning of the words, including context and grammar; and the appropriate action based on the words.
Our VPAs must excel at all three parts, and only then will humans begin trusting them.
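To make the dependency between the three parts concrete, here is a minimal sketch in Python. It is not how any real assistant is built: the function name handle_request, the three stage callables and the toy stages below are all hypothetical, and the “audio” is simply text encoded as bytes. The only point is that each part consumes the output of the previous one, so a failure anywhere means no useful result.

```python
from typing import Callable, Optional

# Hypothetical stage signatures, invented for illustration only.
Transcriber = Callable[[bytes], Optional[str]]   # part 1: voice -> words
Interpreter = Callable[[str], Optional[str]]     # part 2: words -> meaning
Actor = Callable[[str], Optional[str]]           # part 3: meaning -> action


def handle_request(audio: bytes, transcribe: Transcriber,
                   interpret: Interpreter, act: Actor) -> str:
    """Run the three parts in order and report where the chain breaks."""
    text = transcribe(audio)
    if text is None:
        return "failed at part 1: words"
    intent = interpret(text)
    if intent is None:
        return "failed at part 2: meaning"
    result = act(intent)
    if result is None:
        return "failed at part 3: action"
    return result


# Toy stages (assumptions): the "audio" is just UTF-8 text, and the
# assistant knows exactly one phrase and one action.
def toy_transcribe(audio: bytes) -> Optional[str]:
    return audio.decode("utf-8") or None  # empty input counts as a failure


toy_interpret = {"what time is it": "tell_time"}.get
toy_act = {"tell_time": "It is noon."}.get

print(handle_request(b"what time is it", toy_transcribe, toy_interpret, toy_act))
print(handle_request(b"tell me a joke", toy_transcribe, toy_interpret, toy_act))
```

The second call is transcribed correctly but stops at the meaning stage, which is the kind of failure the rest of this article is about.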
Understanding the words
If you ask Google Assistant something, it will first try to convert your voice into text using Google’s “speech to text” engine. The quality of this process is crucial when trying to build a good relationship with the user. No matter how smart the personal assistant is, if it can’t understand the user, it basically can’t do anything.
This first step is very important because the other parts are based on the information that is gathered from this part – getting the words wrong will affect the success of the following two parts.
Today’s personal assistants are doing a great job in this area, able to understand our voice and convert it appropriately.
Words into meaning
After understanding our voice and converting it into words, things get a little complicated.
It’s not enough to just get the sentence right, as the real challenge is to understand the meaning of the words, their context, and the grammar. The personal assistant must be familiar with thousands of subjects and master endless domains to be of use.
Unfortunately, this is the part where personal assistants start to show signs of weakness. Today, we might trust our assistants to transcribe our voice, but the meaning isn’t necessarily understood.
For example, you may ask your VPA to “Take a selfie”, and while it may transcribe that request perfectly, it may not understand the meaning of the word ‘selfie’ well enough to take the selfie as requested. On the other hand, if you ask it to “Take a photo”, it will understand the request better, taking it to mean “shooting a photo”.
The failure of this part is probably the main reason for the trust issue. Our assistants usually don’t fully understand complex linguistic structures – even if it’s a basic sentence like “Take a selfie”.
Without this part, we can’t proceed to the next, in which the personal assistant creates true value for the users by turning requests into helpful actions.
Words of action
The last part, the holy grail of personal assistants, is to perform the right action based on the previous two parts. Even fully understanding the meaning of the phrase isn’t enough to solve the trust issue.
Think about all the times you were asked to do something, but you didn’t really know what was needed to get it done. Personal assistants that have passed the first two steps are often in the same spot: most of the time, they find it difficult to work out what the next best action or answer is.
When asked complex questions that require background information or build on previous knowledge, the personal assistant will likely fail to help.
Let’s look again at the “selfie” example. If you ask Google Assistant to “Take a selfie”, something amazing will happen (try it): it will transcribe your words and then open the camera app.
Voila! You might say that Google nailed it and passed all three parts without any problem. Well, almost…
Google Assistant failed terribly in the last part. Yes, the assistant opened the camera app, but instead of opening the front-facing camera needed for a selfie, it opened the rear camera. At the end of the day, it couldn’t perform the correct action or deliver what I needed. This small example demonstrates how much work remains to make personal assistants trustworthy for humans.
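The gap between “opened the camera” and “opened the right camera” can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a description of how Google Assistant works: the intent names, the CameraAction type and the FACING_BY_INTENT table are all made up for the example. The naive version treats every photo-like request the same way, while the intent-aware version carries the meaning of “selfie” through to the action.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CameraAction:
    """Hypothetical description of what the assistant should launch."""
    app: str
    facing: str  # "front" or "rear"


def naive_action(intent: str) -> CameraAction:
    # Ignores the intent entirely: every request gets the rear camera.
    return CameraAction(app="camera", facing="rear")


# Assumed mapping from intent to lens; a real assistant would need far
# more context than a lookup table.
FACING_BY_INTENT = {
    "take_photo": "rear",
    "take_selfie": "front",
}


def intent_aware_action(intent: str) -> CameraAction:
    # Unknown intents fall back to the rear camera.
    facing = FACING_BY_INTENT.get(intent, "rear")
    return CameraAction(app="camera", facing=facing)


print(naive_action("take_selfie"))         # rear camera: right app, wrong action
print(intent_aware_action("take_selfie"))  # front camera: the action the user wanted
```

Both versions “open the camera app”, so both look successful from the outside until the user sees which lens is active.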
It’s not enough to pass the first two parts, as the assistant is ultimately judged on how well it performs the last one. Moreover, devices like Google Home or Amazon Echo make the first two steps invisible to the user: what matters is only the final action and its value.
The hopeful future
Personal assistants may have improved in voice transcription and language understanding but still have a way to go in perfecting these, as well as providing the right personalized action for our needs.
Still, the tech giants, hi-tech companies and startups, as well as universities, are all working on solutions for the open problems in all three parts. Nowadays, we’re witnessing a huge improvement in the NLP (Natural Language Processing) and NLU (Natural Language Understanding) fields thanks to algorithms based on deep learning.
Improving the final part of the process with the appropriate action will increase the trust we grant to personal assistants and will increase adoption. Soon, such assistants will become smarter and continuously improve their capabilities – and then we’ll trust them a little bit more.