Evaluating the Importance of Demographic and Technical Factors in Creating Authentic-Sounding AI-Generated Human Voice Clones; Voices of Uncertainty: The Social Construction of Consumer Perception in Biometric Authentication for Banking

Author:
Ferri, Drake, School of Engineering and Applied Science, University of Virginia
Advisors:
Wayland, Kent, University of Virginia
Gerling, Gregory, EN-SIE, University of Virginia
Abstract:

Advances in artificial intelligence (AI) have reshaped how individuals interact with banks, with voice biometric authentication emerging as a key innovation in AI-driven customer service. This technology offers convenience by replacing passwords with a user’s voice, but its growing use has raised serious concerns about fraud, data security, and public trust. In particular, synthetic voice deepfakes, which can be generated from just seconds of recorded speech, threaten the very banking systems designed to protect consumers. Financial institutions now face the challenge of maintaining secure and efficient authentication while fostering public trust in a process vulnerable to exploitation. This thesis portfolio addresses the challenge from both a technical and a sociotechnical perspective: one project investigates how AI-generated voice clones are perceived by machines and humans, while the other explores how consumer trust and awareness shape the adoption of voice biometrics in banking. Together, these efforts highlight the technical vulnerabilities and social tensions surrounding voice authentication, offering insights to improve both its design and its responsible deployment.
My technical project, Comparison of Open Source and Commercial Cloning Tools to Evade Detection as Judged by Machine and Human Observers, evaluates how effectively cloned voices can imitate real ones and how various factors influence detection. A library of 336 voice samples was created using four cloning tools (ElevenLabs, Voice.ai, Lovo, and the open-source F5-TTS), varied by gender, age, ethnicity, training time (15s, 30s, 60s), background noise, and native-language status. Optimization techniques reduced the library to 81 voices (67 cloned, 14 authentic) that still fully represented the overall set. These voices were evaluated in two ways: first, by the NISQA deep learning model, which rates voice naturalness on a 1–5 scale, and second, by 449 survey respondents who rated each voice’s perceived authenticity on a similar 1–5 scale (1 = definitely fake, 5 = definitely real). Results showed that humans consistently rated authentic voices as more realistic than cloned ones, and that clones produced with certain tools (e.g., Lovo), training times (30s), and demographic profiles (female, Hispanic, over age 30) were rated lower. However, when clones were filtered to the more favorable technical and demographic factors (15s or 60s training, background noise, male, non-Hispanic, under 30), their perceived realism matched that of authentic voices. Interestingly, NISQA scores did not reflect these same demographic trends, revealing that human and machine evaluations diverge. These findings suggest that security systems relying solely on either humans or machines may fail to detect certain high-performing voice clones, and that blended models and careful design choices are needed for better fraud detection.
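To make the two-track evaluation concrete, the following is a minimal analysis sketch in Python, not the project's actual pipeline: it assumes per-rater scores were collected into a table, and the file name and column names (voice_ratings.csv, voice_id, is_clone, human_rating) are hypothetical stand-ins for the study's data.

import pandas as pd
from scipy import stats

# One row per (voice, rater); file and columns are hypothetical stand-ins.
ratings = pd.read_csv("voice_ratings.csv")

# Average each voice's human authenticity rating (1 = definitely fake,
# 5 = definitely real), then split into cloned and authentic groups.
per_voice = (ratings.groupby(["voice_id", "is_clone"])["human_rating"]
             .mean()
             .reset_index())
cloned = per_voice.loc[per_voice["is_clone"] == 1, "human_rating"]
authentic = per_voice.loc[per_voice["is_clone"] == 0, "human_rating"]

# Welch's t-test: do humans rate authentic voices as more realistic?
t_stat, p_value = stats.ttest_ind(authentic, cloned, equal_var=False)
print(f"authentic mean = {authentic.mean():.2f}, "
      f"cloned mean = {cloned.mean():.2f}, t = {t_stat:.2f}, p = {p_value:.4f}")

Given the ordinal 1–5 scale, a non-parametric alternative such as scipy.stats.mannwhitneyu would also be a reasonable choice; the same grouping logic applies either way.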
In parallel, my STS research paper, Voices of Uncertainty: The Social Construction of Consumer Trust in Biometric Authentication for Finance, examines how consumers interpret and trust voice biometric systems, especially as voice cloning technologies become more advanced. Using the Social Construction of Technology (SCOT) framework, the analysis focuses on how different stakeholders, including banks, developers, consumers, regulators, and fraudsters, assign meaning to voice AI tools. The research question asks: How do consumer trust and awareness shape the usage of voice biometric authentication in banking customer service? Drawing on industry reports, marketing materials, fraud case studies, and policy discussions, the findings show that consumer trust rests largely on perceptions of institutional credibility and the promise of convenience rather than on a deep understanding of the technology. Financial institutions and developers often market voice biometrics as secure and seamless while downplaying the risks of cloning and data misuse. Meanwhile, high-profile deepfake fraud cases have revealed that this trust may be misplaced, and as consumers become more aware of these risks, trust becomes more fragile. A lack of regulation and transparency compounds consumer confusion by weakening informed consent, leaving users vulnerable to hidden risks. This work argues that trust in AI systems is not merely a function of their technical performance but is constructed through narratives, public events, and institutional framing.
In tandem, these projects contribute to the broader issue of responsible AI integration in finance. Technically, the findings show that some cloned voices can convincingly pass as authentic under the right conditions, making them difficult to detect with current systems. Socially, the research highlights that consumers are unaware of these vulnerabilities and that institutional trust often rests on assumptions rather than evidence. Addressing voice cloning in banking will therefore require more than technical upgrades: it will demand clearer communication, more robust hybrid detection systems, and regulation that prioritizes transparency and user education. While the results identify parts of the sociotechnical system with room for improvement, the recent and rapid development of voice cloning makes its full impact on banking difficult to assess. Future research could extend these findings by evaluating other biometric technologies, studying cultural and educational differences in voice perception, or testing defenses that combine machine learning with human oversight. Ultimately, this portfolio illustrates how addressing fraud in AI-based voice systems demands both technical insight and societal awareness.
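To illustrate what such a hybrid defense might look like, the sketch below blends a machine naturalness score with a mean human authenticity rating into a single screening rule; the equal weighting and the 3.5 threshold are hypothetical values chosen purely for illustration, not results from the thesis.

# Minimal sketch of a blended (human + machine) screening rule.
# The weight and threshold are hypothetical, for illustration only.
def flag_suspect(machine_score: float, human_rating: float,
                 weight_machine: float = 0.5, threshold: float = 3.5) -> bool:
    # machine_score: model naturalness score on a 1-5 scale (e.g., NISQA-style)
    # human_rating:  mean human authenticity rating on a 1-5 scale
    # Because human and machine judgments diverge, a voice is flagged
    # when the weighted blend of the two falls below the threshold.
    blended = weight_machine * machine_score + (1 - weight_machine) * human_rating
    return blended < threshold

# Example: a clone that fools humans (4.2) but scores poorly with the
# machine (2.1) is still flagged, since the blend is 3.15 < 3.5.
print(flag_suspect(machine_score=2.1, human_rating=4.2))  # True

In practice such a rule would be one layer among several; the point is only that fusing the two signals can catch clones that either judge alone would pass.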

Degree:
BS (Bachelor of Science)
Keywords:
Voice biometric authentication, Consumer trust in AI
Notes:

School of Engineering and Applied Science

Bachelor of Science in Systems and Information Engineering

Technical Advisor: Gregory Gerling

STS Advisor: Kent Wayland

Technical Team Members: Rhea Agarwal, Padma Lim, Vishnu Lakshmanan, Fahima Mysha, Baani Kaur

Language:
English
Rights:
All rights reserved (no additional license for public reuse)
Issued Date:
2025/05/17