Abstract
My technical project and STS research project both pertain to speech deepfakes and
ways to prevent fraud committed with the technology. In my STS research paper, I analyzed a case of a
cloned voice being used in a fraud attempt and the factors that allowed it to succeed. In the
technical project, my team and I built a system to help prevent speech deepfake attacks. The two
projects reinforce each other and provide technical and social grounding that can be used to
analyze the sociotechnical factors behind speech deepfake attacks.
In my technical project, I worked to address the surge in cloned voices being used to
impersonate and defraud individuals and organizations. To do this, my capstone team built a
spoofing-aware speaker recognition system and launched a web application for users to test
audio clips that are potentially spoofed or aiming to impersonate specific individuals. Generally,
speaker recognition systems perform well at discriminating between distinct voices, but struggle
to differentiate between an individual’s voice and a cloned version of their voice. Similarly, AI
voice detection systems are not fully accurate. To address this, my team combined a speaker
recognition model with an AI voice detection model in an attempt to increase the robustness of
the standalone speaker recognition model and provide multiple points of reference to an
individual for determining whether an audio clip belongs to a specific person. Furthermore, my
team built a web application to allow users to access the model, providing an accessible means to
test potential impersonation attempts.
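As a rough illustration of how a speaker recognition score and an AI voice detection score can be combined, the following Python sketch accepts a clip only when both checks agree. It is a minimal sketch under stated assumptions, not the capstone team's actual implementation: the embeddings and spoof probability are assumed to come from upstream models that are not shown, and the function names and threshold values are hypothetical.

```python
import math
from dataclasses import dataclass


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector carries no speaker information
    return dot / (norm_a * norm_b)


@dataclass
class Verdict:
    speaker_similarity: float  # how closely the clip matches the enrolled voice
    spoof_probability: float   # detector's estimate that the clip is synthetic
    accepted: bool             # True only if both checks pass


def assess_clip(enrolled_embedding, clip_embedding, spoof_probability,
                sim_threshold=0.7, spoof_threshold=0.5):
    """Combine both signals; thresholds are hypothetical placeholder values."""
    similarity = cosine_similarity(enrolled_embedding, clip_embedding)
    accepted = similarity >= sim_threshold and spoof_probability < spoof_threshold
    return Verdict(similarity, spoof_probability, accepted)


if __name__ == "__main__":
    # A clip that matches the enrolled voice but is flagged as likely synthetic.
    print(assess_clip([1.0, 0.0], [1.0, 0.0], spoof_probability=0.9))
```

Returning both scores alongside the final decision reflects the goal of giving the user multiple points of reference rather than a single opaque answer.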
In my STS research paper, I explored a specific case of AI-generated voice clones being
used to impersonate Italy’s Defence Minister, Guido Crosetto. In the report, I analyzed the
various factors aside from the use of an AI voice clone that allowed this attack to succeed. To
perform this analysis, I used actor-network theory (ANT), an STS framework developed mainly
by Bruno Latour, John Law, and Michel Callon, which explains how human and nonhuman
actors interact within networks that work toward a goal. Using ANT, I argued that the attack was
made possible not only by Crosetto’s cloned voice but also by a multitude of actors enrolled
by the fraudsters and a lack of precautions taken by the victims. Through this paper, I aimed to
show that current discourse on deepfake policy does not properly account for the factors that
make the technology threatening.
Working concurrently on the STS research paper and the technical project was beneficial
for both projects. Through the technical project, I broadened my understanding of how speech
deepfake systems function and are deployed, which gave me better insights for the case I
discussed in the research paper. Similarly, while working on the STS research paper, I saw the
sophistication of speech-deepfake-driven fraud, which influenced the way my team built and
tailored our application. Ultimately, I believe that the connection between my two projects gave me
a foundation to more thoroughly analyze the case and build a tool that is potentially useful for
others.