Pitch Controlled Pong; Bias in Speech Recognition Software

Duke, Isaac, School of Engineering and Applied Science, University of Virginia
Forelle, MC, University of Virginia
Powell, Harry, EN-Elec & Comp Engr Dept, University of Virginia

My technical research project was titled ‘Pitch Controlled Pong’. In essence, this project
put a fresh spin on the classic video game Pong by having the user control the game by
singing higher or lower into a microphone rather than using a joystick or keyboard. Similarly,
my STS research also involved vocal inputs, but this time the focus was on speech recognition
software (SRS) and the bias it can exhibit from user to user. The connection between the two is
admittedly loose, resting mainly on the shared use of the human voice as input, but the projects
raise similar concerns at both a technical and a societal level. The technical function of both my capstone
project and an SRS tool like Alexa is to perform a meaningful action based on what the user does
with their voice. At the level of transistors and other electrical components, both use similar tools
to filter, isolate, and process human vocal input. Both also have to consider the wide range of
human voices and try to accommodate differences as well as possible. The Pitch Controlled Pong
project had to span the range from the highest soprano to the lowest bass and make accurate
judgments about the pitch being measured, which proved to be non-trivial. An SRS tool like Alexa
also has to consider vocal differences including accents, speech impairments, and a variety of
other factors when taking in audio input.
My technical project involved constructing a physical single player video game module with
vocal controls. The module runs a version of the classic game Pong, but with a twist: the user
moves their paddle up and down based on the pitch of their voice. During an initial calibration,
the user sings a relatively high-pitched note to set the 'up' reference and a relatively low-pitched
note to set the 'down' reference. The system then computes the average of the two notes and,
during play, moves the paddle up for any pitch above this average and down for any pitch below
it. The opponent is a computer-controlled paddle on the opposite side of the screen that adapts
its difficulty dynamically based on the score of the game. The module consists of a
small monitor in a display console with an external microphone for the user to vocalize their
input into. This microphone is plugged into a printed circuit board that filters and isolates the
audio for the computer to analyze.
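The calibration-and-threshold scheme described above can be sketched in a few lines. This is an illustrative sketch only, not the actual capstone code; the function names, frequencies, and return convention are all assumptions introduced here for clarity.

```python
def calibrate(high_note_hz: float, low_note_hz: float) -> float:
    """Return the decision threshold: the average of the two calibration notes."""
    return (high_note_hz + low_note_hz) / 2.0


def paddle_direction(pitch_hz: float, threshold_hz: float) -> int:
    """+1 moves the paddle up, -1 moves it down, 0 holds position.

    A non-positive pitch stands in for 'no pitch detected this frame'
    (e.g., the user is not singing), in which case the paddle holds.
    """
    if pitch_hz <= 0:
        return 0
    return 1 if pitch_hz > threshold_hz else -1


# Hypothetical calibration: the user sings A4 (440 Hz) and A3 (220 Hz).
threshold = calibrate(high_note_hz=440.0, low_note_hz=220.0)  # 330.0 Hz

print(paddle_direction(392.0, threshold))  # G4, above the average: up (1)
print(paddle_direction(262.0, threshold))  # C4, below the average: down (-1)
```

In the real module, the pitch fed into a function like `paddle_direction` would come from analyzing the filtered microphone signal each frame; the threshold comparison itself is the simple part.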
My paper examines the biases and limitations present in speech recognition software
(SRS) and explores potential solutions for creating more equitable and inclusive SRS tools. It
discusses the challenges of building SRS models that accurately recognize a diverse range of
accents and speech patterns, and highlights how these challenges can lead to disparities in user experience
and access to technology. The paper draws on various studies and sources to provide evidence of
the disparities present in current SRS tools, with a focus on the limitations in recognizing accents
that are not part of the dominant culture.
I ultimately argued that while developing more inclusive SRS tools requires a substantial
investment of time and resources, the potential benefits of increased access to technology and
more equitable user experiences justify the cost. Overall, the paper advocates for greater
transparency in the development and marketing of SRS tools, as well as a commitment to
ongoing improvement and innovation to ensure that these tools are accessible to all.
Completing both projects simultaneously was beneficial for two major reasons. First, it made
me appreciate the sheer amount of technical processing, in both hardware and software, that is
performed on audio inputs for something like Alexa. Getting a microphone to pick up pitch alone
proved challenging, so the idea of building a device capable of understanding a wide variety of
human speech patterns seemed all the more impressive. Second, it gave me insight into the social
implications of creating a device that uses the human voice as an input. While something like a
keyboard works exactly the same for most people, everyone’s voice is different which presents a
number of challenges. I believe that by writing a paper that examines bias in SRS tools, I
was able to anticipate and avoid potential biases in my own technical project.
Specifically, my group consisted entirely of men, and when first designing our project we used
training data from our group alone, which could have left users with higher voices
unable to play. By recognizing this possibility early on, I was able to bring in female
collaborators with higher voices than the members of my group, eliminating this bias before it
was baked into the design of our project.

BS (Bachelor of Science)

School of Engineering and Applied Sciences
Bachelor of Science in Computer Engineering
Technical Advisor: Harry Powell
STS Advisor: MC Forelle
Technical Team Members: John Phillips, Teddy Oline, Charlie Hess

All rights reserved (no additional license for public reuse)
Issued Date: