Voice-Controlled Pong and the Danger of Generative AI

Oline, Edward, School of Engineering and Applied Science, University of Virginia
Jacques, Richard, Engineering and Society, University of Virginia

The first implementation of speech recognition technology was created by Bell Laboratories in the early 1950s: a computing system nicknamed “Audrey”. The machine could identify which digit, zero through nine, was spoken by the head engineer with approximately 90% accuracy. About a decade later, in 1962, IBM debuted the “Shoebox”, which could recognize 16 distinct words. In the 60 years since, these technologies have advanced dramatically. Today, virtual assistants in mobile devices use artificial intelligence and machine learning models capable of conducting complete conversations in dozens of languages. Beyond interpreting words and phrases, voice analysis software can detect vocal inflection to infer the mood and emotion behind speech. These technologies have become so much smarter and more complex that they are now extremely dangerous. The new wave of speech technology uses AI to mimic or recreate a person’s voice. A primary example is Microsoft’s VALL-E, which can use just a three-second clip of a speaker’s voice to replicate it speaking any given text.

The capstone project undertaken by my group used one of the simpler and safer forms of speech recognition, pitch detection, to play a video game. Pitch detection works similarly to a tuner for musical instruments, which analyzes the frequency of the audio signal to determine its pitch. To apply our technical expertise to this new topic, we integrated pitch detection into the video game Pong. Released by Atari in 1972, Pong was the first commercially successful video game. The player moves a bar representing a paddle up and down to bounce a ball back and forth with another player, creating a two-dimensional version of tennis. My team and I recreated this classic game, replacing the joystick controller with a microphone. The player makes a high-pitched sound to move the paddle up and a low-pitched sound to move the paddle down. The project involved numerous technical challenges in both analog and digital signal processing, along with integrating the various hardware and software elements into a cohesive system. An additional challenge was ensuring that everyone can not only play the game, regardless of musical ability or unique speech attributes, but can also do so without concern over the privacy of their voice.
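
As a rough illustration of the pitch-based control described above, the following Python sketch estimates the pitch of a short audio frame and maps it to a paddle command. It is not the project's implementation. The autocorrelation method, the 300 Hz dividing line between "high" and "low", the silence cutoff, and all function names (`estimate_pitch_hz`, `paddle_command`, `sine_tone`) are assumptions chosen for illustration only.

```python
import math


def sine_tone(freq_hz, sample_rate, num_samples):
    """Generate a pure sine wave, used here as a stand-in for microphone input."""
    return [math.sin(2.0 * math.pi * freq_hz * i / sample_rate)
            for i in range(num_samples)]


def estimate_pitch_hz(samples, sample_rate, min_hz=80.0, max_hz=1000.0):
    """Estimate the fundamental frequency of a frame by autocorrelation.

    Returns None when the frame is too short or too quiet to analyze.
    The search range (min_hz, max_hz) is an assumed range, not a project value.
    """
    n = len(samples)
    if n < 2:
        return None

    # Remove any DC offset so it does not dominate the correlation.
    mean = sum(samples) / n
    centered = [s - mean for s in samples]

    # Treat very low-energy frames as silence (assumed cutoff).
    energy = sum(s * s for s in centered)
    if energy / n < 1e-6:
        return None

    # Convert the frequency search range into a range of lags in samples.
    min_lag = max(1, int(sample_rate / max_hz))
    max_lag = min(int(sample_rate / min_hz), n - 1)

    # The lag with the strongest self-similarity approximates one period.
    best_lag, best_score = None, 0.0
    for lag in range(min_lag, max_lag + 1):
        score = sum(centered[i] * centered[i + lag] for i in range(n - lag))
        if score > best_score:
            best_lag, best_score = lag, score

    if best_lag is None:
        return None
    return sample_rate / best_lag


def paddle_command(pitch_hz, threshold_hz=300.0):
    """Map a pitch estimate to a paddle command: "up", "down", or "stay"."""
    if pitch_hz is None:
        return "stay"
    return "up" if pitch_hz > threshold_hz else "down"


if __name__ == "__main__":
    rate = 8000  # samples per second (assumed)
    for freq in (500.0, 200.0):
        frame = sine_tone(freq, rate, 1024)
        pitch = estimate_pitch_hz(frame, rate)
        print(freq, pitch, paddle_command(pitch))
```

A fixed threshold is the simplest possible mapping. Because the abstract stresses that the game should be playable regardless of musical ability or speech attributes, a real system would more plausibly calibrate the dividing pitch to each player's voice, but the abstract does not say how this was handled.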

BS (Bachelor of Science)
All rights reserved (no additional license for public reuse)
Issued Date: