In robotics there is a famous notion called the uncanny valley. It describes how robots become more likable as they gain human-like features, up to a certain point where they become too human-like and cause discomfort. The sharp drop in likability, the point at which people start describing a robot as creepy, is the uncanny valley itself.
So, as you can see in the graph below, familiarity (or likability) increases as the robot becomes more and more human-like, up to the point where it comes too close to being human.
Name aside, the uncanny valley is a useful concept to consider when talking about Natural Language Interaction (NLI) technology. NLI enables devices to understand, reason and react in a remarkably human-like way. Devices can nowadays approach human ability in understanding the nuances of language and in reasoning about what was said. The latest developments in NLI technology have brought natural-language-based human-computer interaction a long way from the early chatbots of years ago. Today your smartphone app, your smart TV or the virtual avatar on your retailer's website is able to understand you, to reason and to act, reacting much as a human assistant would.
Advances in 3D graphics also make Virtual Assistants look more and more human-like. Sol, the Virtual Assistant on Artificial Solutions' website, looks very human indeed. Her body movements are natural, she blinks, her lips pronounce exactly the words she is speaking, and she even frowns when confronted with offensive language.
So here is a virtual assistant that looks like a human, that understands and reasons like a human, that asks questions back and talks to you like a human. Why haven't we dipped into the uncanny valley yet? Why aren't we getting the shivers when interacting with such an intelligent Virtual Assistant? Why don't we perceive her as creepy?
I believe the answer lies in the voice. The voice of a Virtual Assistant, whether in a smartphone, a smart TV or a website avatar, is generated by Text-To-Speech (TTS) technology. The Teneo Interaction Engine sends a response text string to a cloud-based TTS engine, which immediately creates a sound file that is downloaded and played in the browser, all in milliseconds. These auto-generated responses simply sound robotic. After all these years, TTS has not managed to make its spoken output sound truly natural. It is still obvious that the spoken responses from NLI devices are robotic, and this is what keeps us out of the uncanny valley. As long as a human-like robot has a robotic voice, the distance between humans and robots remains clear. It will be interesting to see whether TTS technology evolves to become near-human. I suspect that even if voices get closer to human, in the end we would prefer robotic voices over truly human-like ones.
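The round trip described above can be sketched as a small request helper. Everything here is illustrative: the endpoint URL, voice name and payload shape are assumptions for the sake of the sketch, not the actual Teneo or any vendor's TTS API.

```python
import json
from urllib import request

# Hypothetical cloud TTS endpoint; real TTS vendors each define their own API.
TTS_ENDPOINT = "https://tts.example.com/v1/synthesize"

def build_tts_request(text, voice="en-US-female-1"):
    """Package a response text string as a synthesis request (assumed payload shape)."""
    payload = {"text": text, "voice": voice, "format": "mp3"}
    return request.Request(
        TTS_ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# In the flow described above, the interaction engine would send a request like
# this, receive an audio file in the response, and hand it to the browser to play.
req = build_tts_request("I'm sorry, I can't do that.")
print(req.full_url)      # https://tts.example.com/v1/synthesize
print(req.get_method())  # POST
```

The point of the sketch is simply that the assistant's voice is synthesized on demand, one response string at a time, which is also why its robotic timbre is so consistent.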
The moment a Virtual Assistant tells me, in a real human-like voice like HAL's in 2001: A Space Odyssey, "I'm sorry, Daniel. I'm afraid I can't do that", that's when I will get the shivers.