Just the Words: Optimizing Lyric Transcription with Audio Signal Processing and Machine Learning; The Responsible Use of Technology for Music Documentation to Preserve Global Cultural Heritage
Lee, Kevin, School of Engineering and Applied Science, University of Virginia
Jacques, Richard, EN-Engineering and Society, University of Virginia
Vrugtman, Rosanne, EN-Comp Science Dept, University of Virginia
Music is an art form, but at its core it has always been intertwined with technology. Throughout this semester, I explored this relationship through both my technical and STS research projects. It quickly became clear that because music is inherently a social system, integrating it with technical systems raises complex challenges and interesting considerations.
My technical report, Just the Words: Optimizing Lyric Transcription with Audio Signal Processing and Machine Learning, describes the research I conducted during an internship on improving a lyric transcription system. This research holds social significance, as the quality of lyric transcription can influence the cultural meaning of songs and the artist’s intended message. It can also impact how music is distributed and received by different audiences. This is an especially relevant consideration in today’s cultural landscape, where platforms like TikTok have demonstrated their powerful influence on music trends and interpretation.
Lyric transcription poses unique challenges compared to standard speech recognition because sung vocals differ from speech in pronunciation, tempo, and audio mixing. Many of these differences stem from cultural and societal musical traditions that have developed over time. To improve transcription accuracy, my research tested adding vocal isolation and silence trimming stages to the transcription pipeline to make the sung lyrics more uniform, along with switching to a better transcription model. Testing showed only marginal improvements, but another key insight was the strong effect of a song's genre on transcription accuracy. This underscores how music's social dimensions shape how we should design and apply technical systems in this space.
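The silence-trimming stage mentioned above can be illustrated with a minimal sketch. This is a hypothetical example, not the code used during the internship: it assumes mono audio already loaded as a list of float samples, and the names `frame_rms` and `trim_silence`, the 512-sample frame length, and the -40 dB threshold are illustrative choices. It drops leading and trailing frames whose RMS energy falls more than the threshold below the loudest frame, while keeping pauses inside the vocal line. Libraries such as librosa offer a comparable function, `librosa.effects.trim`.

```python
import math
from typing import List, Sequence


def frame_rms(samples: Sequence[float], frame_len: int) -> List[float]:
    """Root-mean-square energy of consecutive, non-overlapping frames.

    The final frame may be shorter than frame_len.
    """
    rms = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        rms.append(math.sqrt(sum(x * x for x in frame) / len(frame)))
    return rms


def trim_silence(samples: Sequence[float], frame_len: int = 512,
                 threshold_db: float = -40.0) -> List[float]:
    """Hypothetical silence trimmer: remove leading and trailing frames whose
    RMS is more than |threshold_db| dB below the loudest frame.

    Interior quiet frames (pauses between sung phrases) are kept.
    """
    if not samples:
        return []
    rms = frame_rms(samples, frame_len)
    peak = max(rms)
    if peak == 0.0:
        return []  # an all-zero clip is entirely silence
    # Convert the relative dB threshold into an absolute RMS floor.
    floor = peak * 10 ** (threshold_db / 20)
    loud = [i for i, r in enumerate(rms) if r >= floor]
    if not loud:
        return []
    first, last = loud[0], loud[-1]
    # Keep every sample from the first loud frame through the last loud frame.
    return list(samples[first * frame_len:(last + 1) * frame_len])


# Demo: 1024 samples of silence, a 440 Hz tone at 16 kHz, then silence again.
quiet = [0.0] * 1024
tone = [math.sin(2 * math.pi * 440 * n / 16000) for n in range(1600)]
clip = quiet + tone + quiet
trimmed = trim_silence(clip, frame_len=512)
print(len(clip), len(trimmed))  # prints: 3648 2048
```

Because the sketch works at frame granularity, the trimmed clip can keep up to one frame of near-silence at each edge; a smaller frame length trades precision against sensitivity to brief dips in vocal energy.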
My STS research project, The Responsible Use of Technology for Music Documentation to Preserve Global Cultural Heritage, examines the social landscape of music and how technology can be used to document traditional music in ways that ethically and responsibly preserve cultural heritage. Technological preservation can compromise authenticity in many ways, especially given colonialism's widespread influence on global music and on the evolution of musical technologies over time. Through a literature review on the development of music, the evolution of music preservation technologies, and the ethical guidelines established by various institutions for cultural preservation, my project aims to deconstruct the preservation process to examine how we can use technology while still prioritizing cultural integrity and respect for tradition.
The primary results highlighted the need for careful consideration of the recording environment and a cautious approach when applying post-processing techniques. Additionally, with the rise of AI, it is crucial to collect datasets intentionally and respectfully, honoring the work of original artists. AI-generated outputs should also respect the integrity of the original material. More importantly, this research aims to raise awareness and encourage reflection on how technology impacts music, and it emphasizes the importance of developing technology that accounts for these effects. This begins with education and the careful restructuring of music standards to ensure that diverse musical techniques and traditions are taught in classrooms around the world. I hope others will be inspired to consider both the social and technical implications of integrating technology with music, fostering a deeper understanding of how these fields intersect and how we can approach musical innovation responsibly in the future.
BS (Bachelor of Science)
Music, Lyric Transcription, Music Technology, Audio Signal Analysis, Music Preservation
School of Engineering and Applied Science
Bachelor of Science in Computer Science
Technical Advisor: Rosanne Vrugtman
STS Advisor: Richard Jacques
English
All rights reserved (no additional license for public reuse)
2025/05/07