Online Archive of University of Virginia Scholarship
CleanSight: Detection Strategies for Label-Flipping Data Poisoning Attacks on MNIST; Mechanization Through Statistical Misuses, Past and Present
Author
Ibrahim, Ashraf, School of Engineering and Applied Science, University of Virginia
Advisors
Moore, Hunter, EN-SIE, University of Virginia
Carrigan, Coleen, EN-Engineering and Society, University of Virginia
Abstract
Statistics backs policies that range from the distribution of grant funds to programs that help parents with childcare. Proper representation is a cornerstone of valid statistical modeling. Whether applying statistical tests or training machine learning models, minimizing misclassifications is integral to drawing valid conclusions. The technical and sociotechnical portions of my capstone address the same problem: how to minimize the misuse of statistics in order to draw valid conclusions.
Image recognition models are increasingly deployed in decision-critical contexts, where reliable training data is what allows those models to inform sound decisions. One threat to that data is a label-flipping attack, in which an adversary deliberately changes the class labels of selected training images in order to shape model outputs. Such an attack could, for example, trick a machine learning model into classifying a civilian as an adversary. In my technical project, my team and I investigated outlier detection techniques for isolating and removing images whose labels had been flipped. The project thus examines how to ensure that the input data for a machine learning system is valid, in order to minimize misclassifications.
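To make the idea of a label-flipping attack and its detection concrete, the following is a minimal sketch under stated assumptions, not the method used in the project. The abstract does not name the outlier detection techniques that were investigated, so this example swaps in a simple k-nearest-neighbor label-disagreement check. It also uses synthetic two-dimensional points in place of MNIST images, and all function names (`make_clusters`, `flip_labels`, `flag_suspects`) are hypothetical.

```python
import random


def make_clusters(n_per_class, seed=0):
    """Synthetic stand-in for image features: two well-separated 2D Gaussian classes."""
    rng = random.Random(seed)
    points, labels = [], []
    for label, (cx, cy) in enumerate([(0.0, 0.0), (6.0, 6.0)]):
        for _ in range(n_per_class):
            points.append((rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)))
            labels.append(label)
    return points, labels


def flip_labels(labels, fraction, seed=1):
    """Simulate the attack: flip the binary label of a random fraction of samples."""
    rng = random.Random(seed)
    n_flip = round(fraction * len(labels))
    flipped_idx = set(rng.sample(range(len(labels)), n_flip))
    poisoned = [1 - y if i in flipped_idx else y for i, y in enumerate(labels)]
    return poisoned, flipped_idx


def flag_suspects(points, labels, k=11):
    """Flag a sample when most of its k nearest neighbors carry a different label."""
    suspects = set()
    for i, (xi, yi) in enumerate(points):
        # Squared Euclidean distance to every other point, nearest first.
        dists = sorted(
            ((xi - xj) ** 2 + (yi - yj) ** 2, j)
            for j, (xj, yj) in enumerate(points)
            if j != i
        )
        disagree = sum(1 for _, j in dists[:k] if labels[j] != labels[i])
        if disagree > k / 2:
            suspects.add(i)
    return suspects


points, clean_labels = make_clusters(200)
poisoned_labels, flipped = flip_labels(clean_labels, fraction=0.05)
suspects = flag_suspects(points, poisoned_labels)

true_pos = len(suspects & flipped)
precision = true_pos / len(suspects) if suspects else 0.0
recall = true_pos / len(flipped)
print(f"flagged={len(suspects)} precision={precision:.2f} recall={recall:.2f}")
```

The flagged samples would then be removed or sent for relabeling before training. Real image data is far less cleanly separated than these synthetic clusters, so a check like this would typically run on learned feature embeddings rather than raw inputs, and its precision and recall would need to be measured on the actual dataset.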
Statistical misuse has been rampant in both the past and the present, affecting a variety of groups. My sociotechnical paper uses Crawford’s mechanization framework to draw parallels between two assumptions of validity: that AI represents human minds and that statistical methods represent populations. It traces how eugenicists’ assumptions were accepted as correct and harmed marginalized communities in the early 1900s, and follows that pattern to the modern case of dataset representation, where assuming that a dataset is representative can marginalize already disadvantaged communities. The sociotechnical paper directly addresses how to minimize the misuse of statistics and its ethical implications.
Degree
BS (Bachelor of Science)
Keywords
ethics; statistics; machine learning; data poisoning; ai
Sponsors
Hardshell AI
Notes
School of Engineering and Applied Science
Bachelor of Science in Systems Engineering
Technical Advisor: Hunter Moore
STS Advisor: Coleen Carrigan
Technical Team Members: Devon Alexander, Eli Cook, Adam Fridley, Hunter Oakey
Ibrahim, Ashraf. CleanSight: Detection Strategies for Label-Flipping Data Poisoning Attacks on MNIST; Mechanization Through Statistical Misuses, Past and Present. University of Virginia, School of Engineering and Applied Science, BS (Bachelor of Science), 2026-05-02, https://doi.org/10.18130/bwcw-qf18.