ASR-based System for Speech Therapy in Adults; An Analysis of Advancements in Technology and Its Social Impact on the Audiology Landscape

Williams, Brandon, School of Engineering and Applied Science, University of Virginia
Rogers, Hannah, EN, University of Virginia
Li, Jundong, EN-Elec/Computer Engr Dept, University of Virginia

Children and young adults with hearing loss or auditory processing problems may experience delayed speech development because they cannot hear quiet sounds such as “s”, “sh”, “f”, “t”, and “k”, which can lead to speech impairments. Children often discontinue speech therapy once they enter upper-level schooling, as these minor speech issues start to become more permanent, leaving them without access to a speech therapy solution. Existing speech therapy apps correct single words or syllables at a time and do not account for how a speech-impaired user’s speech is perceived by listeners. In transcripts from Zoom’s closed captioning service this school year, the platform occasionally output words that did not match what the professor was attempting to say, inadvertently pointing out “quirks” in the professor’s pronunciation. This added context is worth examining because it may identify words and sounds that users did not realize they struggled with, providing more input to the speech correction process. Applying this concept to an automated form of speech therapy, I propose a mobile application that prompts the user to read aloud from a pre-selected script into their device, feeds the speech signal into a highly accurate open-source audio-to-text algorithm that transcribes the audio, and flags words that do not match the original script as “mispronounced”. Users will be able to see what the program interpreted their “unclear” speech as, giving them unique insight into “how” they may be mispronouncing sounds through a user-friendly interface and a novel application of open-source deep learning audio-to-text algorithms.
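The flagging step described above can be illustrated with a minimal sketch. It assumes the audio-to-text algorithm has already returned a transcript as plain text, and it aligns the script and transcript word by word with Python's standard difflib module; this alignment method and the names tokenize and flag_mispronounced are illustrative assumptions, not the application's actual implementation.

```python
import difflib
import re


def tokenize(text):
    """Lowercase the text and split it into words, dropping punctuation."""
    return re.findall(r"[a-z']+", text.lower())


def flag_mispronounced(script, transcript):
    """Return (expected, heard) pairs where the transcript departs from the script.

    Hypothetical helper: `transcript` is assumed to be the output of
    whichever audio-to-text algorithm the application uses.
    """
    expected = tokenize(script)
    heard = tokenize(transcript)
    matcher = difflib.SequenceMatcher(a=expected, b=heard, autojunk=False)
    flagged = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("equal", "insert"):
            # "insert" means extra words were heard; no script word to flag.
            continue
        # "replace" or "delete": script words that were misheard or missed.
        flagged.append((" ".join(expected[i1:i2]), " ".join(heard[j1:j2])))
    return flagged


if __name__ == "__main__":
    script = "She sells sea shells by the shore."
    transcript = "he sells sea sells by the shore"
    for expected_words, heard_words in flag_mispronounced(script, transcript):
        print(f"expected '{expected_words}', heard '{heard_words}'")
```

Returning the heard words alongside the expected ones reflects the idea that users should see what the program interpreted their speech as, not only which words were flagged.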
The growing prevalence of machine learning in hearing aids and the accelerated adoption of telehealth services during the COVID-19 pandemic have introduced an array of relationship-transforming implications to society. As a result, the current service delivery model for audiology care has been disrupted, triggering a potential power shift in relationships between actors in the audiology landscape. Through Actor-Network Theory and a series of user studies, I will attempt to discover how advancements in technology have transformed relationships between audiologists and patients as we transition into a new era of what has been termed “Connected Audiology.” Readers can expect to learn about the doctor-patient relationship model as applied to audiology, how telehealth services and machine-learning-enabled hearing aids have disrupted this model’s status quo, and what surveys reveal about the sentiments of audiologists and patients with respect to these changes.

BS (Bachelor of Science)
automatic speech recognition, actor network theory, connected health, audiology
All rights reserved (no additional license for public reuse)
Issued Date: