What’s really happening here? Is your phone or smart home device always listening?
Well, sort of.
First, it’s important to mention that Apple, Facebook, Google, and Amazon have all publicly stated they do not record and save everything you say. Apple has gone so far as to publish a technical explanation of how your iPhone recognizes “Hey Siri” and turns on.
In the case of your cell phone, I believe it. I think it would be too energy-intensive to save everything you say; in short, it would drain your battery. In the case of a plugged-in Echo/Google Home/HomePod, those constraints don’t exist, but that’s a post for a different day.
So, how does the iPhone recognize “Hey Siri”?
You may not remember, but when you set up your phone, you recorded “Hey Siri” several times.
When planning the Siri project, Apple had lots of people say different comments, questions, phrases, and phonetics to teach a computer how to hear.
Here’s how that works:
Listening is a process of identifying four things: location, volume, tone, and phonetics. Identifying location is how you know someone is behind you if they shout, “Wait up.” Identifying volume is how you know how far behind you that person is, and identifying tone is how you know who is speaking. Finally, identifying the actual phonetics allows the computer to understand how each sound in “Wait up” makes up those two words.
Apple then took the phonetics and taught the computer “Phonetic combination X makes up word Y which means do Z.”
There are three parts here:
1. Sorting the proper combination of phonetics to create an actual word. This is harder than it seems. Lots of phonetics, if spaced differently, can make up different words. “Property” vs. “proper tee” vs. “proper tea.”
2. Now we’ve created a word, but we have to check it against the other words we’ve created to see if it makes sense.
3. Certain words mean action, directing the computer to do something. “Play Goodbye Yellow Brick Road” means play the most popular song under that title.
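To make parts 1 and 3 a little more concrete, here’s a toy sketch in Python. It is not how Siri actually works. The sound spellings (like "PROPERTEE"), the tiny vocabulary, and the function names are all made up for illustration. It only shows why one run of sounds can split into several different word sequences, and how a leading action word can be peeled off as a command.

```python
# Hypothetical sound spellings mapped to the words they could be.
# Note that "tee" and "tea" sound identical, so they share one entry.
PRONUNCIATIONS = {
    "PROPERTEE": ["property"],
    "PROPER": ["proper"],
    "TEE": ["tee", "tea"],
}

# Hypothetical set of words that mean "do something".
ACTION_WORDS = {"play"}


def segmentations(sounds):
    """Return every way to split a run of sounds into known words."""
    if not sounds:
        return [[]]  # one valid result: nothing left to split
    results = []
    for end in range(1, len(sounds) + 1):
        chunk = sounds[:end]
        for word in PRONUNCIATIONS.get(chunk, []):
            for rest in segmentations(sounds[end:]):
                results.append([word] + rest)
    return results


def interpret(words):
    """Split a word list into (action, target) if it starts with an action word."""
    if words and words[0] in ACTION_WORDS:
        return words[0], " ".join(words[1:])
    return None, " ".join(words)


candidates = segmentations("PROPERTEE")
command = interpret(["play", "goodbye", "yellow", "brick", "road"])
```

Here `candidates` holds three readings of the same sounds: “proper tee,” “proper tea,” and “property.” Picking the right one is the job of part 2, which this sketch leaves out.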
The important thing to know here is twofold: first, that the phone has recognized a phonetic combination that makes a word, and second, that a certain combination means “do something.”
In the case of “Hey Siri,” “Hey” means “listen for the word ‘Siri.’”
Okay, but lots of people say, “Hey.” Remember how you recorded “Hey Siri” a few times when you set up your phone originally?
When you set up your phone, it recorded the way you said it, and also your tone. Like magic.
Listening is made up of four things:
1. Location - how close are you to your phone? Ever notice how “Hey Siri” doesn’t work when you’re too far away? There’s a sweet spot.
2. Volume - can it actually make out the phonetics?
3. Tone - you recorded “Hey Siri” so your phone could recognize you. Alexa and Google do not do this part.
4. Phonetics - do the sounds actually add up to “Hey Siri”?
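The tone check can be pictured as comparing a new “Hey Siri” against the ones you recorded at setup. Here’s a minimal sketch, assuming each recording has already been boiled down to a short list of numbers describing the voice. The numbers, the 0.95 threshold, and the function names are invented for illustration and are not Apple’s.

```python
import math


def cosine_similarity(a, b):
    """Score how closely two voice descriptions point the same way (1.0 = identical)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def average_profile(samples):
    """Average the setup recordings into one voice profile."""
    count = len(samples)
    return [sum(values) / count for values in zip(*samples)]


def sounds_like_owner(profile, attempt, threshold=0.95):
    """True if the new attempt is close enough to the stored profile."""
    return cosine_similarity(profile, attempt) >= threshold


# Made-up numbers standing in for the "Hey Siri" recordings from setup.
enrolled = [[0.9, 0.2, 0.4], [0.8, 0.3, 0.5], [0.85, 0.25, 0.45]]
profile = average_profile(enrolled)

owner_attempt = [0.88, 0.22, 0.43]     # close to the enrolled voice
stranger_attempt = [0.2, 0.9, 0.1]     # a very different voice

owner_ok = sounds_like_owner(profile, owner_attempt)
stranger_ok = sounds_like_owner(profile, stranger_attempt)
```

In this sketch the owner’s attempt passes and the stranger’s does not. A stricter threshold would mean fewer false wake-ups, but the phone would also miss you more often.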
So, you say “Hey” and your phone turns that into the 0s and 1s it was taught to turn it into and looks only for “Siri.” If it hears “Siri” then it turns on. If it doesn’t, it doesn’t activate anything.
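That two-step “hear Hey, then look only for Siri” idea can be sketched as a tiny state machine. This follows the simplified description above, not Apple’s actual detector, and it assumes the sounds have already been turned into words. The function name `wake_positions` is hypothetical.

```python
def wake_positions(words):
    """Return the positions where "Siri" directly follows "Hey"."""
    armed = False  # True only right after hearing "hey"
    positions = []
    for index, word in enumerate(words):
        token = word.lower().strip(".,!?")
        if armed and token == "siri":
            positions.append(index)
        # Stay armed only if this word was "hey"; anything else resets it.
        armed = token == "hey"
    return positions


woke = wake_positions(["Hey", "Siri,", "play", "music"])
ignored = wake_positions(["Hey", "there", "Siri"])
```

Notice that the only thing this loop remembers between words is the `armed` flag. It never needs a saved copy of what was said, which fits the point that listening and recording are not the same thing.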
So, is your phone/smart speaker listening to you? Yeah. Is the data saved? I highly doubt it, at least by Apple. By other companies, I’m not so sure, although they’ve gone on record saying they don’t save your conversations. My gut says no, because they don’t need to in order to understand you well enough.
Since you’re being listened to anyway, you might as well say, “Hey.”