Google Home Mini 2.0
Revealing what the Google Home is thinking
Individual work completed in the course Environments Studio IV.
Live Demo | Source Code
(Due to external library use, the live demo only works on Chrome versions below 60.0. To see a video demo, scroll down. If you still really want to use the live version, download Chrome 59.0 here.)
Despite our best efforts, AI algorithms are not perfect. Over time, their accuracy will improve. However, even the most complex and well-trained system will invariably make a mistake at some point. The trouble with current IPAs is not that they make mistakes, but that when they do, there is no way for us to help them recover from the error. This is a proposed redesign of the Google Home Mini that helps it recover from such errors by communicating what it's thinking.
Problem
When we speak with other humans, we're constantly receiving verbal and non-verbal cues. As a result, we can catch when people misunderstand a request or are confused about what to do. Unfortunately, with most IPAs (Intelligent Personal Agents, i.e. Alexa, Google Home, etc.), we only see (i.e. hear) what goes in (our question) and what comes out (their answer), and none of the process that happens in between.
During this interval, the device performs a complex set of activities: processing our question to understand what we said, and using that comprehension to select an appropriate response. Unfortunately, we can't see any of these steps. When the device makes a mistake, we're left unsure why it happened or how we can get a different response in the future.
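To make that hidden middle concrete, here is a rough conceptual sketch, purely illustrative and not how Google actually structures its pipeline, of the stages involved and the confidence scores that never reach the user:

```javascript
// Conceptual sketch only: every stage below produces confidence scores,
// but the user only ever hears the single top answer.
const pipeline = {
  // Stage 1: speech recognition returns a transcript plus per-word confidence.
  transcribe: (audio) => ({ words: ['play', 'jazz'], confidences: [0.97, 0.62] }),

  // Stage 2: intent matching ranks several candidate interpretations.
  rankIntents: (transcript) => [
    { intent: 'play_music', genre: 'jazz', score: 0.71 },
    { intent: 'define_word', word: 'jazz', score: 0.18 },
    { intent: 'unknown', score: 0.11 },
  ],

  // Stage 3: only the single top-scoring response is spoken aloud.
  respond: (ranked) => ranked[0],
};

const transcript = pipeline.transcribe(/* microphone audio */);
const answer = pipeline.respond(pipeline.rankIntents(transcript));
console.log(answer); // { intent: 'play_music', genre: 'jazz', score: 0.71 }
```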
Early Decisions
Implement a visual interface. For my solution, I chose to improve upon the IPA's visual interface. This is because when we speak to people, we don't just listen to the words they say; we make inferences based on their facial expressions and physical movements as well. Current IPAs give minimal, if any, visual indicators. This is an area in need of improvement that I wanted to explore.
Use the current Google Home Mini as a starting point. I chose to use a Google Home Mini as opposed to an Alexa, HomePod, etc. for a few reasons. For one, its physical form lends itself well to a visual interface. In addition, I already had one on hand to test responses. Finally, I'm fond of this particular donut shape and thought it would be fun to work with.
1st approach: Three separate animations
For my initial solution, I chose to focus on three stages within the experience that could use improvement.
Activation
Just as we expect people to face us when we're speaking to them, this activation light faces us to let us know it's listening.
Speech & Language
Once the device has finished processing a question, it will show how much it understood. The phrase will be displayed on the device with glowing lights indicating parts it comprehended and dull lights indicating sections it wasn’t sure about.
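A minimal p5.js sketch of how this could look. The words and confidence values below are made up for illustration; each light's brightness is mapped to how well the device understood the corresponding word:

```javascript
// Illustrative only: hypothetical per-word comprehension scores,
// drawn as glowing (bright) vs. dull (dim) lights.
const words = [
  { text: 'play', confidence: 0.95 },
  { text: 'some', confidence: 0.9 },
  { text: 'Bossa', confidence: 0.4 }, // poorly understood -> dull light
  { text: 'Nova', confidence: 0.35 },
];

function setup() {
  createCanvas(400, 120);
  noStroke();
  textAlign(CENTER, CENTER);
}

function draw() {
  background(20);
  const slot = width / words.length;
  words.forEach((w, i) => {
    // Map comprehension confidence to light brightness.
    const brightness = map(w.confidence, 0, 1, 60, 255);
    fill(brightness);
    ellipse(slot * (i + 0.5), height / 2, 40, 40);
    fill(255);
    textSize(12);
    text(w.text, slot * (i + 0.5), height - 15);
  });
}
```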
Response Selection
Instead of offering one single response, the device will display a few possible options in a pie chart layout. The percentage of the circle that an answer occupies indicates how confident the Google Home is that it's the response the user wants. If the user is unsatisfied with a response, they can ask the program to try again, and it'll provide the answer contained in another pie slice.
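Here's a rough p5.js sketch of that pie layout. The candidate answers and confidence values are placeholders rather than real Google Home output; each slice's sweep angle is simply confidence × 2π:

```javascript
// Illustrative candidates with made-up confidence values.
const candidates = [
  { answer: 'Play jazz on Spotify', confidence: 0.6, rgb: [66, 133, 244] },
  { answer: 'Play Jazz FM radio', confidence: 0.3, rgb: [234, 67, 53] },
  { answer: 'Define "jazz"', confidence: 0.1, rgb: [251, 188, 5] },
];

function setup() {
  createCanvas(300, 300);
  noStroke();
}

function draw() {
  background(20);
  let start = -HALF_PI; // begin at 12 o'clock
  for (const c of candidates) {
    // Each answer occupies a slice proportional to the device's confidence in it.
    const sweep = c.confidence * TWO_PI;
    fill(...c.rgb);
    arc(width / 2, height / 2, 250, 250, start, start + sweep, PIE);
    start += sweep;
  }
}
```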
Implementation
To implement a working version of my solution, I created a virtual version of the Google Home Mini using p5.js. I then hardcoded a few sample questions that users could ask to see how the device handled requests. In order for the program to hear what users were saying and talk back to them, I employed p5's sound and speech libraries.
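A simplified sketch of that wiring, assuming the p5.speech library (which provides the p5.SpeechRec and p5.Speech wrappers) is loaded alongside p5.js. The hardcoded question/answer pairs below are placeholders, not the exact ones from my demo:

```javascript
// Placeholder sample questions mapped to canned answers.
const cannedAnswers = {
  'what is the weather': 'It is 65 degrees and sunny.',
  'set a timer': 'Okay, timer set for 10 minutes.',
};

let recognizer;
let voice;

function setup() {
  createCanvas(400, 400);
  voice = new p5.Speech();                            // text-to-speech output
  recognizer = new p5.SpeechRec('en-US', gotSpeech);  // speech-to-text input
  recognizer.continuous = true;
  recognizer.start();
}

function gotSpeech() {
  const heard = recognizer.resultString.toLowerCase();
  // Look for one of the hardcoded sample questions in what was heard.
  for (const question in cannedAnswers) {
    if (heard.includes(question)) {
      voice.speak(cannedAnswers[question]);
      return;
    }
  }
  voice.speak("Sorry, I'm not sure about that one.");
}

function draw() {
  background(20); // the virtual Home Mini animations would be drawn here
}
```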
Feedback
After getting a chance to test out my program at the course's end-of-semester studio show, 'Where are The Humans in AI?', I realized that it had more than a few issues. The animations didn't tie together as well as I had intended, so my implementation seemed to muddle people's mental models of IPAs more than clarify them. In addition, while I received a positive response to my 'Response Selection' concept, the other two were met with mixed reactions.
2nd approach: One central animation
Based on this feedback, I decided to redo my implementation to focus entirely on the 'Response Selection' animation. This allowed me to create a clearer mental model of the device's thinking process and refine the solution to a more specific pain point.
The new approach consisted of two primary stages:
Activation & Listening
The device displays an array of dots that make up its 'brain'. Each of them represents a possible answer that the system could provide for any given question.
Response Selection
Just before the device provides its answer, a few of the dots grow large while the rest disappear, indicating that it has selected a few possible results from its array of answers. The relative size of each dot represents how confident the program is that it's the answer the user wants. If the user is unsatisfied with a response, they can ask the program to try again, and it'll provide the answer contained in another dot.
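A rough p5.js sketch of the two stages. The grid size, confidence values, and the mouse press used to trigger 'response selection' are all stand-ins for illustration:

```javascript
// Illustrative sketch: a grid of dots as the 'brain', then a few dots
// grow in proportion to made-up confidence values while the rest disappear.
const GRID = 8;
let dots = [];
let answering = false; // toggles between 'listening' and 'response' stages

function setup() {
  createCanvas(320, 320);
  noStroke();
  const spacing = width / (GRID + 1);
  for (let row = 0; row < GRID; row++) {
    for (let col = 0; col < GRID; col++) {
      dots.push({ x: spacing * (col + 1), y: spacing * (row + 1), confidence: 0 });
    }
  }
}

function mousePressed() {
  // Simulate response selection: pick a few dots and assign placeholder confidences.
  answering = true;
  dots.forEach((d) => (d.confidence = 0));
  [0.6, 0.3, 0.1].forEach((conf) => {
    random(dots).confidence = conf;
  });
}

function draw() {
  background(20);
  fill(255);
  for (const d of dots) {
    // Listening: a uniform grid. Answering: size scales with confidence,
    // and zero-confidence dots disappear.
    const size = answering ? d.confidence * 80 : 6;
    if (size > 0) ellipse(d.x, d.y, size, size);
  }
}
```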
(re-)Implementation
Real world application
While I'm extremely satisfied with how this project turned out, I'm aware that it is unlikely that Google Home Minis, or any similarly priced IPAs, would adopt this type of approach. That's because it would involve installing a significantly more complex display system that would drive up the cost (and consequently, the price) per unit. This runs counter to the existing goal of miniature IPAs, which is to provide an inexpensive product that lowers the barrier to entry for adopting a smart home system.
That being said, I'd like to see this approach adopted by a higher-end IPA (perhaps the Apple HomePod) or a mid-priced model in the future, once these types of products evolve and mature.