Sentiment Analysis Using Machine Learning; Negative Impact of Technology on the Education Gap

Pathipati, Harshita, School of Engineering and Applied Science, University of Virginia
Basit, Nada, University of Virginia
Wayland, Kent, University of Virginia

Because of technology’s pervasiveness in American daily life, understanding its social implications is vital to progress. My research therefore investigated how marginalized communities, such as racial minorities and women, are negatively impacted by technology. Specifically, the technical portion of my project focused on how Natural Language Processing (NLP) systems in machine learning (ML) and artificial intelligence (AI) exhibit bias against women and minorities, a partiality these technologies inherit from the textual data on which they are trained. Similarly, my STS research delves into how technology exacerbates the education gap for marginalized communities through issues such as inaccessibility and disparities in the language used in and created for technology. Both research topics make it evident that technology worsens the divide among various social groups. Thoroughly examining these issues can therefore contribute to creating a fairer, more equal system of technology usage.
My technical research was an individual, exploratory study of how NLP technologies in modern ML and AI reflect bias against marginalized individuals. The goal was to learn by developing a sentiment analyzer into which we could feed textual data and observe its performance. My Capstone advisor for this research was Dr. Nada Basit, an assistant professor in the Computer Science Department at the University of Virginia. The first steps were understanding the inner workings of NLP technologies, selecting data for training and testing, splitting it 80% for training and 20% for testing, and cleaning it for machine processing using techniques such as tokenization. The project was developed in Python using the Natural Language Toolkit (NLTK) and Pandas. Sentiment analysis approaches fall into several categories, such as sentiment lexicons, deep learning, and classification algorithms; I chose to focus on classification algorithms. Based on analysis of prior research and individual tests, Naive Bayes, Support Vector Machines (SVM), and XGBoost appeared to perform best, so my advisor and I decided to evaluate these models with K-Fold Cross Validation. In conclusion, my overall technical research indicated that the textual data we use on a daily basis is already infused with bias, and these machines simply pick up on that bias: they are trained on that data and therefore reflect its tendencies.
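The core of such a classifier can be sketched in pure Python. This is a minimal, illustrative sketch only, not the project's actual implementation: it uses a hand-made toy dataset and a simple regex tokenizer in place of NLTK, and it omits the 80/20 corpus split, Pandas data handling, SVM, XGBoost, and K-Fold Cross Validation described above.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    # Lowercase and split into word tokens (a simple stand-in for
    # NLTK's word_tokenize used in the actual project).
    return re.findall(r"[a-z']+", text.lower())

class NaiveBayesSentiment:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)          # documents per class
        self.word_counts = defaultdict(Counter)      # token counts per class
        self.vocab = set()
        for text, label in zip(texts, labels):
            for tok in tokenize(text):
                self.word_counts[label][tok] += 1
                self.vocab.add(tok)
        self.total_docs = sum(self.class_counts.values())
        return self

    def predict(self, text):
        tokens = tokenize(text)
        best_label, best_logprob = None, float("-inf")
        for label in self.class_counts:
            # log prior + summed log likelihoods of each token
            logprob = math.log(self.class_counts[label] / self.total_docs)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for tok in tokens:
                logprob += math.log((self.word_counts[label][tok] + 1) / denom)
            if logprob > best_logprob:
                best_label, best_logprob = label, logprob
        return best_label

# Toy labelled data (illustrative only; the project used a larger
# real-world corpus split 80/20 into training and test sets).
data = [
    ("I loved this movie, it was wonderful", "pos"),
    ("What a great and enjoyable film", "pos"),
    ("Absolutely fantastic acting and story", "pos"),
    ("I hated this movie, it was terrible", "neg"),
    ("A boring and awful waste of time", "neg"),
    ("Truly dreadful, the worst film ever", "neg"),
]
texts, labels = zip(*data)
clf = NaiveBayesSentiment().fit(texts, labels)
print(clf.predict("a wonderful and enjoyable story"))  # pos
print(clf.predict("terrible, boring and awful"))       # neg
```

In a real pipeline, the same fit/predict interface would be exercised across K folds of the training data, with accuracy averaged over folds, which is the role K-Fold Cross Validation played in the project.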
My STS research examined how technology usage in the education system disadvantages certain socioeconomic groups. Looking at technology from this angle was important because the digital divide and the education gap already set those from lower socioeconomic groups, typically including low-income, Black, and Hispanic individuals, academically behind their more privileged counterparts. This in turn increases the chances of academic failure and unemployment, diminishes professional success, and creates a cyclic pattern. In examining how technology exacerbates the education gap, two main concerns emerged: inaccessibility to proper technology and linguistic inequality in the language of technology. My research drew on literature, studies, experiments, professional observations, and statistical measurements. It demonstrated that greater usage of modern technology improved reading and writing comprehension skills and heightened learning and communication, but that those without comparable access to the technology did not reap the same academic benefits; statistically speaking, minority and low-income families had lower access to technology. Furthermore, regarding linguistic inequality, my research concluded that for individuals whose first language is not English, using technology in American education was more difficult: the language used in technology is often harder to interpret and can lead to lower academic performance and discouragement from pursuing higher education.
These two perspectives on how technology usage worsens socioeconomic divisions provided thorough insight into how marginalized communities are repeatedly disadvantaged relative to their counterparts. My technical portion showed how deeply rooted bias is in our society, as the real textual data fed to these models tends to be permeated with biases against underrepresented groups. On the same note, my social topic reflected how deep-seated the issue of inaccessibility is, as simply providing resources is not the answer. Both issues showed that improvement must start in small steps and from the foundations. For my technical portion, a wise next step would be greater implementation: building data pipelines that filter biased examples out of training data and retraining models on the cleaned data. For educational systems, schools should target the issue from early schooling rather than relying on temporary fixes.

BS (Bachelor of Science)
education, machine learning, technology

School of Engineering and Applied Science

Bachelor of Science in Computer Science

Technical Advisor: Nada Basit

STS Advisor: Kent Wayland

All rights reserved (no additional license for public reuse)
Issued Date: