Designing Secure and Usable Wake Words; Understanding Why Users Cannot Affect the Smart Speaker Data Collection Design

Wang, Andrew, School of Engineering and Applied Science, University of Virginia
Tian, Yuan, EN-Comp Science Dept, University of Virginia
Baritaud, Catherine, EN-Engineering and Society, University of Virginia

How can engineers protect smart speaker users from intrusive data collection practices? One possible solution is to research new technologies that will help users take control of their own data. To that end, the technical project examines one possible technological solution towards protecting user privacy. Another possible approach is to determine the root cause of the problem and to address the societal factors that contribute to the problem. To that end, the STS research aims to understand why users struggle to protect their own privacy and proposes several directions for future research. Both the technical research and the STS research strive towards helping users navigate the difficulties of data and privacy management as the popularity of smart speakers grows.
A primary privacy weakness in smart speaker design is accidental activation. Whenever a smart speaker perceives input from a user, the device records the input and transmits the recording to the cloud. However, when the user has not supplied any input but the smart speaker errantly perceives input, the device still records its surroundings. Consequently, smart speakers eavesdrop on consumers. The technical research aims to decrease the chances of accidental activation, and thereby invasions of privacy, by proposing new wake words using phonetics as the main criterion. Words that sound unique should be more difficult to misidentify. Therefore, much of the research focused on identifying commonly recurring sequences of sound in the English language and then ranking vocabulary words based on their similarity to these sound sequences. Afterwards, to evaluate their efficacy, the wake words were tested against a set of audio samples to determine the number of accidental activations.
At first, the results indicated minimal improvement between well-ranked and poorly-ranked wake words. However, after switching to a more comprehensive evaluation data set, the technical research did confirm that wake words selected on the basis of phonetic uniqueness trigger fewer accidental activations. On average, well-ranked wake words triggered 3.4 times while poorly-ranked wake words triggered 17.4 times. Furthermore, among existing wake phrases, “Ok Google” performed the best, with only one accidental activation, while “Hey Siri” and “Alexa” performed noticeably worse. Based on these results, wake words should strive to achieve as much phonetic uniqueness as possible. Additionally, the method of deriving wake word uniqueness using recurring sequences of sound can accurately predict the propensity of misactivation for a given wake word.
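The ranking idea described above, scoring candidate wake words by how closely their sounds match commonly recurring sound sequences, can be illustrated with a minimal sketch. This is not the project's actual implementation: the use of phoneme bigrams, the averaging score, the function names, and the tiny hand-written transcriptions are all assumptions made for illustration only.

```python
from collections import Counter

def phoneme_ngrams(phonemes, n=2):
    """Return the list of consecutive phoneme n-grams in a transcription."""
    return [tuple(phonemes[i:i + n]) for i in range(len(phonemes) - n + 1)]

def ngram_frequencies(corpus, n=2):
    """Count how often each phoneme n-gram occurs across a reference vocabulary."""
    counts = Counter()
    for phonemes in corpus.values():
        counts.update(phoneme_ngrams(phonemes, n))
    return counts

def commonness_score(phonemes, freqs, n=2):
    """Average corpus frequency of a word's n-grams (lower = more unique)."""
    grams = phoneme_ngrams(phonemes, n)
    if not grams:
        return 0.0
    return sum(freqs[g] for g in grams) / len(grams)

def rank_candidates(candidates, freqs, n=2):
    """Order candidate wake words from most to least phonetically unique."""
    return sorted(candidates, key=lambda w: commonness_score(candidates[w], freqs, n))

# Hypothetical toy data: hand-written, ARPAbet-style transcriptions.
CORPUS = {
    "cat": ["K", "AE", "T"],
    "bat": ["B", "AE", "T"],
    "hat": ["HH", "AE", "T"],
    "can": ["K", "AE", "N"],
    "ban": ["B", "AE", "N"],
}
CANDIDATES = {
    "mat": ["M", "AE", "T"],   # shares the common "AE T" sequence
    "zoid": ["Z", "OY", "D"],  # shares no sequence with the toy corpus
}

freqs = ngram_frequencies(CORPUS)
ranking = rank_candidates(CANDIDATES, freqs)
print(ranking)
```

In this toy setting the candidate that shares no sound sequence with the reference vocabulary ranks as the most unique. A realistic version would need a full pronunciation lexicon and a scoring function chosen to match the study's criteria.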
The STS research centered on understanding why users struggle to shape the way data is collected despite the wealth of privacy control settings available. Researchers have long approached this issue using the framework of technological determinism, proposing state-of-the-art technical solutions as a means to improve user privacy. In contrast, the STS research found that the primary cause of privacy issues is a gap in technical literacy among the general public, and used the theory of the Diffusion of Innovation to propose future areas of research. Prior research on user attitudes towards smart speakers exposed not just the vulnerabilities and oversights in existing privacy controls, but also the low levels of awareness and familiarity users display towards data privacy management. Additionally, research indicated that users value convenience and user experience just as much as privacy, so privacy features that limit functionality go unused. The prior research suggests that the current framework of technological determinism is incomplete.
Moreover, satisfying both privacy and functionality is nearly impossible, especially given the insufficient levels of technical literacy among the general public. Current smart speaker users usually choose functionality over privacy, resulting in the privacy control issues of today. Naturally, one solution to the issues of privacy control is to improve technical literacy. Using the framework of diffusion of innovation, the STS research identified potential obstacles to the diffusion of privacy control settings to the general public, and established the possibility of organizing users into the five adopter groups based on existing research highlighting the range of public attitudes towards smart speaker privacy. Future research should focus on identifying the demographics for each privacy control setting adopter group.
While current technical research emphasizes providing users with better tools to protect privacy, given the insufficient levels of technical literacy among the general public, the potential success of the technical research remains unclear. Instead, more research should focus on diffusing knowledge and understanding of existing privacy control technologies to the general public.

BS (Bachelor of Science)
smart speakers, voice recognition, data collection privacy, Technological Determinism, Diffusion of Innovation

School of Engineering and Applied Science
Bachelor of Science in Computer Science
Technical Advisor: Yuan Tian
STS Advisor: Catherine D. Baritaud
Technical Team Members: Timothy Han, Joshua Sahaya Arul, Andrew Jian Wang

All rights reserved (no additional license for public reuse)
Issued Date: