Automated Image Annotation Pipeline; Data Gap in Our Current World

Cooper, Rachael, School of Engineering and Applied Science, University of Virginia
Jacques, Richard, EN-Engineering and Society, University of Virginia
Orrico, Elizabeth, University of Virginia

Both my STS research paper and my technical capstone project relate directly to big data and data gaps. My STS research paper aimed to identify and evaluate the societal effects of data gaps on ethnic minority groups and women in America, along with the concerns those gaps raise. Similarly, my technical capstone project aimed to identify biases in fashion-based data by applying Computer Vision and Natural Language Processing (NLP) and evaluating the output of unsupervised learning techniques.
The goal of my STS research paper was to evaluate the disparities between ethnic minorities and women on the one hand and the majority population on the other, in order to address the lack of understanding prevalent throughout society. By identifying the consequences of data gaps found throughout modern society, the paper highlights the extent of the resulting inequalities so that they can potentially be avoided in the future. I researched several examples of data gaps for both gender and race, including cases where the gap may not have been visible before it was identified. Each example was drawn from preexisting sources and was not manipulated or re-evaluated in any way. For each example, I began with background information, followed by the problem, and finished with an analysis of the role that example plays in the broader situation of those affected. Lastly, I demonstrated how this issue is viewed societally by contrasting two different STS frameworks and their applications.
In my technical capstone project, I aimed to identify how biases in data can result in undesired outcomes, and how difficult it is to apply machine learning techniques to images where race may play a role in identification. I did this by attempting to create an end-to-end pipeline that takes in clothing images and outputs a human-readable sentence describing each image. The pipeline used unsupervised learning so that the data itself drew connections between the images and grouped them together. The pipeline then created folders containing the grouped images, so the user can see how the YOLOv7 model performed. The sentence descriptions were based on all the images in a group for a more generalized approach. The end goal of the technical project was to develop a functional pipeline that produced any type of human-readable output related to the input image, and then to identify whether any faults came from the code or from the data.
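The three stages described above (group images without labels, write each group to its own folder, describe each group in one sentence) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the project's actual clustering method is not stated, so plain k-means stands in for it; the feature vectors and tags are made-up stand-ins for whatever a detector such as YOLOv7 would produce; and the function names `kmeans`, `group_into_folders`, `describe_group`, and `run_demo` are hypothetical.

```python
import shutil
import tempfile
from collections import Counter, defaultdict
from pathlib import Path


def kmeans(vectors, k, iterations=20):
    """Plain k-means (an assumed stand-in); returns one cluster index per vector."""
    centroids = [list(v) for v in vectors[:k]]  # deterministic init: first k points
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector joins its nearest centroid.
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centroids[c])))
            for v in vectors
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def group_into_folders(image_paths, labels, out_dir):
    """Copy each image into out_dir/group_<label>/ and return {label: [paths]}."""
    groups = defaultdict(list)
    for path, label in zip(image_paths, labels):
        folder = Path(out_dir) / f"group_{label}"
        folder.mkdir(parents=True, exist_ok=True)
        shutil.copy(path, folder / Path(path).name)
        groups[label].append(path)
    return dict(groups)


def describe_group(tag_lists, top_n=2):
    """Build one sentence from the most frequent tags across a whole group."""
    counts = Counter(tag for tags in tag_lists for tag in tags)
    top = [tag for tag, _ in counts.most_common(top_n)]
    return f"This group of {len(tag_lists)} images mostly shows: {', '.join(top)}."


def run_demo():
    """Run the sketch on four placeholder files with made-up features and tags."""
    features = [[0.0, 0.1], [5.0, 5.1], [0.2, 0.0], [5.2, 4.9]]
    tags = [["shirt", "blue"], ["dress", "red"],
            ["shirt", "white"], ["dress", "red"]]
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "input"
        src.mkdir()
        paths = []
        for i in range(len(features)):
            p = src / f"img_{i}.jpg"
            p.write_bytes(b"")  # empty placeholder, not a real image
            paths.append(p)
        labels = kmeans(features, k=2)
        groups = group_into_folders(paths, labels, Path(tmp) / "output")
        tag_by_path = dict(zip(paths, tags))
        return {
            label: describe_group([tag_by_path[p] for p in members])
            for label, members in groups.items()
        }
```

Describing a group from its most frequent tags mirrors the generalized, per-group descriptions mentioned above. It also shows where a data gap can surface: tags that are rare within a group never reach the sentence, so under-represented items are described by the majority's attributes.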
While working on the STS research paper and the technical capstone project, I realized how much they influenced one another. My STS research paper identified the effects of data gap biases and examples of them, which made me more aware of how the input data for my technical project would be interpreted and what biases it might contain. By completing both a technical project and a research paper, I identified these effects and saw them exemplified in my own work. In addition, working with biased data myself gave me a better understanding of how difficult such biases are to mitigate. Overall, data gaps affect every aspect of society in ways that may not be recognizable, but it is important to acknowledge the consequences and work to remedy them.
Lastly, I would like to acknowledge my STS professors, Professor Richard Jacques and Professor Catherine Baritaud, and my capstone advisor, Professor Elizabeth Orrico.

BS (Bachelor of Science)

School of Engineering and Applied Science in Computer Science
Technical Advisor: Elizabeth Orrico
STS Advisor: Richard D. Jacques

All rights reserved (no additional license for public reuse)
Issued Date: