The Real Value of Work in an Increasingly Artificial World; An Analysis of Geometric and Machine Learning Approaches to Speaker Diarization
Olsen, Oliver, School of Engineering and Applied Science, University of Virginia
Riggs, Robert, Systems Engineering, University of Virginia
Neeley, Kathryn, STS, University of Virginia
This paper examines the challenges and methods of speaker diarization, the "who spoke when" problem, in complex audio environments. It presents an analysis of audio features, explores geometric and pitch-based patterns for speaker identification, and recommends strategies for improving audio separation technology. We describe our progression from manually labeling audio data with tools such as Audacity to developing our own labeling program that transforms conversational audio into Rich Transcription Time Marked (RTTM) files. Our collection process translated audio files into datasets and recorded a range of audio features. We analyzed these datasets for patterns that distinguish speakers, looking for differences in amplitude and pitch in recordings from our client's device. We then used the same datasets to evaluate the accuracy of a fine-tuned component of the Pyannote audio toolkit, which we employed to further refine speaker differentiation. We also outline our assumptions about microphone positioning, the fidelity of our manual labeling, and the efficacy of Pyannote's machine learning tools for speaker differentiation. Our findings contribute to the development of more effective speaker diarization systems, which are essential for audio processing across a diverse set of applications. Future work will focus on building a robust speaker diarization system that combines geometric analysis with machine learning optimization techniques.
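The abstract does not describe how the labeling program writes its output. The following is a minimal sketch, assuming each manual label is a (start, end, speaker) turn measured in seconds, of how labeled turns can be serialized to the standard RTTM `SPEAKER` line layout: ten space-separated fields (type, file ID, channel, onset, duration, orthography, speaker type, speaker name, confidence, lookahead), with unused fields written as `<NA>`. The `Segment` class and the `to_rttm` and `from_rttm` helpers are hypothetical names for illustration, not the authors' code.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Segment:
    """One manually labeled speaker turn (hypothetical structure; times in seconds)."""
    start: float
    end: float
    speaker: str


def to_rttm(file_id: str, segments: Iterable[Segment], channel: int = 1) -> str:
    """Serialize labeled turns as RTTM SPEAKER lines, sorted by onset time."""
    lines = []
    for seg in sorted(segments, key=lambda s: (s.start, s.speaker)):
        if seg.end <= seg.start:
            raise ValueError(f"turn must have positive duration: {seg}")
        if any(ch.isspace() for ch in seg.speaker):
            # RTTM fields are whitespace-delimited, so labels cannot contain spaces.
            raise ValueError(f"speaker label contains whitespace: {seg.speaker!r}")
        duration = seg.end - seg.start
        lines.append(
            f"SPEAKER {file_id} {channel} {seg.start:.3f} {duration:.3f} "
            f"<NA> <NA> {seg.speaker} <NA> <NA>"
        )
    return "\n".join(lines) + ("\n" if lines else "")


def from_rttm(text: str) -> List[Segment]:
    """Parse SPEAKER lines back into turns, skipping other record types."""
    segments = []
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0] != "SPEAKER":
            continue
        onset, duration = float(fields[3]), float(fields[4])
        segments.append(Segment(onset, onset + duration, fields[7]))
    return segments


if __name__ == "__main__":
    labels = [Segment(2.1, 4.0, "speaker_B"), Segment(0.0, 2.5, "speaker_A")]
    print(to_rttm("meeting01", labels), end="")
```

Running the example prints one line per turn, sorted by onset. The two example turns overlap between 2.1 s and 2.5 s; RTTM represents overlapping speech simply as separate lines whose time spans intersect, so crosstalk needs no special encoding.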
BS (Bachelor of Science)
School of Engineering and Applied Science
Bachelor of Science in Systems Engineering
Technical Advisor: Robert Riggs
STS Advisor: Kathryn Neeley
English
2023/12/18