CleanSight: Detection Strategies for Label-Flipping Data Poisoning Attacks on MNIST; Mechanization Through Statistical Misuses, Past and Present

Ibrahim, Ashraf

Author

Ibrahim, Ashraf, School of Engineering and Applied Science, University of Virginia

Advisors

Moore, Hunter, EN-SIE, University of Virginia
Carrigan, Coleen, EN-Engineering and Society, University of Virginia

Abstract

Statistics backs policies that range from how grant funds are distributed to programs that help parents with childcare. Proper representation is a cornerstone of valid statistical modeling. Whether one is running statistical tests or training machine learning models, minimizing misclassifications is integral to drawing valid conclusions. The technical and sociotechnical portions of my capstone address a shared problem: how to minimize the misuse of statistics in order to reach valid conclusions.

Image recognition models are increasingly deployed in decision-critical contexts, and those models can inform sound decisions only if their training data is reliable. One threat to that data is a label-flipping attack, in which an adversary deliberately changes the class labels of training images in order to shape model outputs. Such an attack could, for example, trick a model into classifying a civilian as an adversary. In my technical project, my team and I investigated outlier detection techniques for isolating and removing images whose labels had been flipped. The project thus examines how to ensure that the input data for a machine learning system is valid, in order to minimize misclassifications.
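
As an illustration only: the abstract does not name the specific outlier detection techniques the team evaluated, so the sketch below assumes one common stand-in, flagging a sample when its label disagrees with the majority label of its nearest neighbours. It also assumes synthetic two-dimensional clusters in place of MNIST images so that it runs with NumPy alone. The function names (`make_toy_clusters`, `flip_labels`, `knn_disagreement_flags`) and all parameter values are hypothetical and are not taken from the technical report.

```python
import numpy as np


def make_toy_clusters(rng, per_class=100):
    """Build three well-separated 2-D Gaussian clusters (a stand-in for image features)."""
    centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
    features = np.concatenate(
        [rng.normal(loc=c, scale=1.0, size=(per_class, 2)) for c in centers]
    )
    labels = np.repeat(np.arange(len(centers)), per_class)
    return features, labels


def flip_labels(labels, fraction, num_classes, rng):
    """Simulate a label-flipping attack on a random fraction of the labels."""
    poisoned = labels.copy()
    n_flip = int(round(fraction * len(labels)))
    flipped_idx = rng.choice(len(labels), size=n_flip, replace=False)
    # An offset in [1, num_classes - 1] taken modulo num_classes always
    # lands on a class different from the original one.
    offsets = rng.integers(1, num_classes, size=n_flip)
    poisoned[flipped_idx] = (labels[flipped_idx] + offsets) % num_classes
    return poisoned, flipped_idx


def knn_disagreement_flags(features, labels, num_classes, k=15):
    """Flag samples whose label differs from the majority label of their k nearest neighbours."""
    # Pairwise squared Euclidean distances; acceptable for a small toy set.
    sq_norms = np.sum(features**2, axis=1)
    dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * features @ features.T
    np.fill_diagonal(dists, np.inf)  # a sample is not its own neighbour
    neighbours = np.argsort(dists, axis=1)[:, :k]
    neighbour_labels = labels[neighbours]
    # Count neighbour votes per class, then take the most common class.
    votes = (neighbour_labels[:, :, None] == np.arange(num_classes)).sum(axis=1)
    majority = votes.argmax(axis=1)
    return majority != labels


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    features, labels = make_toy_clusters(rng)
    poisoned, flipped_idx = flip_labels(labels, fraction=0.1, num_classes=3, rng=rng)
    flags = knn_disagreement_flags(features, poisoned, num_classes=3)
    # Keep only the samples that were not flagged as suspicious.
    cleaned_features, cleaned_labels = features[~flags], poisoned[~flags]
    print("flipped:", len(flipped_idx), "flagged:", int(flags.sum()))
```

On real image data the neighbours would more plausibly be computed on learned or dimensionality-reduced features than on raw inputs, and the neighbourhood size would need tuning; the toy clusters here are deliberately easy so that the idea is visible.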

Statistical misuse has been rampant both past and present, affecting a variety of groups. My sociotechnical paper uses Crawford’s mechanization framework to draw parallels between the assumed validity of AI as a representation of human minds and of statistical methods as representations of populations. It traces how eugenicists’ assumptions were taken as correct and how they harmed marginalized communities in the early 1900s, and then turns to the modern case of dataset representation, where assumed representativeness can further marginalize already disadvantaged communities. The paper directly addresses how to minimize the misuse of statistics and its ethical implications.

Degree

BS (Bachelor of Science)

Keywords

ethics; statistics; machine learning; data poisoning; AI

Notes

School of Engineering and Applied Science

Bachelor of Science in Systems Engineering 

Technical Advisor: Hunter Moore

STS Advisor: Coleen Carrigan

Technical Team Members: Devon Alexander, Eli Cook, Adam Fridley, Hunter Oakey

Language

English

Rights

Attribution 4.0 International (CC BY)

Issued Date

2026-05-02

Persistent Link

https://doi.org/10.18130/bwcw-qf18

Suggested Citation

Ibrahim, Ashraf. CleanSight: Detection Strategies for Label-Flipping Data Poisoning Attacks on MNIST; Mechanization Through Statistical Misuses, Past and Present. University of Virginia, School of Engineering and Applied Science, BS (Bachelor of Science), 2026-05-02, https://doi.org/10.18130/bwcw-qf18.

Files

Ibrahim_Ashraf_Prospectus.pdf

Ibrahim_Ashraf_STSResearchPaper.pdf

Ibrahim_Ashraf_SociotechnicalSynthesis.pdf

Ibrahim_Ashraf_TechnicalReport.pdf

