How Interactions Between Deepfake Tools and Organizational Protocols Lead to Instability in Corporate Security Networks
Agarwal, Rhea, School of Engineering and Applied Science, University of Virginia
Laugelli, Benjamin, EN-Engineering and Society, University of Virginia
Gerling, Gregory, EN-SIE, University of Virginia
Business and governmental institutions face growing threats from synthetic audio deepfakes, driven by advances in voice cloning and artificial intelligence. With access to only a short recording of a person’s voice, malicious actors can clone it and make it say anything they like, posing serious risks of fraud, identity theft, and loss of trust. While much prior research has explored defensive postures, few studies have considered the factors that make a cloned voice sound authentic. This effort investigates the factors that lead to more authentic-sounding AI-generated clones of the human voice. A voice library of about 350 short samples was created, spanning a range of demographic factors (age, gender, ethnicity) and technical factors (cloning tool, training time, background noise). Using optimization techniques, a subset of 81 voices (67 cloned and 14 authentic) was selected for an online survey with human listeners (n=449). Each voice was also assessed by the NISQA speech quality and naturalness model. Overall, human listeners perceived authentic voices as more realistic than cloned voices. However, subsets of cloned voices with certain technical and demographic characteristics were indistinguishable from authentic voices. Finally, human- and machine-generated ratings did not correlate, indicating that NISQA may evaluate voice authenticity in ways distinct from human listeners.
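As an illustration of the final comparison step described above, the sketch below checks whether paired human and machine ratings correlate. It is a minimal sketch, not the study's actual analysis: the placeholder score lists, the variable names, and the choice of Spearman and Pearson statistics are all assumptions, since the abstract does not specify how the correlation was computed.

    # Minimal sketch: comparing human realism ratings with NISQA scores.
    # All data below is hypothetical placeholder data, not study results.
    from scipy.stats import spearmanr, pearsonr

    # Hypothetical per-voice scores: mean human realism rating and the
    # NISQA-predicted naturalness score for the same voice samples.
    human_ratings = [3.8, 4.1, 2.9, 4.5, 3.2]   # placeholder values
    nisqa_scores  = [3.1, 3.0, 3.9, 2.8, 4.2]   # placeholder values

    # Rank correlation is robust to the two scales being calibrated
    # differently; a coefficient near zero would echo the abstract's
    # finding that human and machine ratings do not correlate.
    rho, p_rank = spearmanr(human_ratings, nisqa_scores)
    r, p_lin = pearsonr(human_ratings, nisqa_scores)

    print(f"Spearman rho = {rho:.2f} (p = {p_rank:.3f})")
    print(f"Pearson r    = {r:.2f} (p = {p_lin:.3f})")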
Degree: BS (Bachelor of Science)
Keywords: Voice cloning, AI, deepfakes, AI fraud, cloning tools
Language: English
Issued Date: 2025/04/29