Building an AI-backed Internal Search Engine for Government Proposal Intellectual Property; Impact of Implicit Biases in Natural Language Processing (NLP)
Wigode, Evan, School of Engineering and Applied Science, University of Virginia
Jacques, Richard, EN-Engineering and Society, University of Virginia
Vrugtman, Rosanne, EN-Comp Science Dept, University of Virginia
Introduction
The synthesis of my technical project and STS research highlighted the intersection of technology with its ethical and societal impacts in engineering practice. My technical project involved developing a natural language processing (NLP)-backed search engine for RedMatter Solutions, the company where I interned, to retrieve information more efficiently from past government proposals. The application arose from the need to replace existing practices and technologies that could not keep up with the growing number of proposals requiring reference. My STS research, by contrast, explored the implicit biases inherent in NLP technologies, biases that can inadvertently arise in systems like the one I developed. The research also outlined existing strategies and frameworks for mitigating such bias in NLP systems. Together, the two projects pair a technical contribution with a perspective on how such advancements may affect society, underscoring the need for sociotechnical awareness in engineering practice.
Technical Paper
The technical section of my thesis outlines the development of a Question-Answer (QA) search application that leverages open-source NLP technology to improve access to archived government proposals. The system, developed during my internship with RedMatter Solutions, allows users to input natural language queries such as “What are RedMatter’s cloud capabilities?” and quickly retrieve, summarize, and display relevant documents from the company’s existing intellectual property cache. The platform runs Large Language Models (LLMs) in a pipelined execution process and is accessible to users through a full-stack application hosted on the company’s intranet. Its development demonstrates a practical application of NLP to streamlining organizational processes and improving data retrieval in a professional setting.
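The retrieve-then-summarize flow described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the system built at RedMatter Solutions: simple term-overlap scoring stands in for the LLM-based retrieval, taking the first sentence stands in for LLM summarization, and the documents, stopword list, and function names are all hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical stand-in corpus; the real proposal archive is not described here.
DOCUMENTS = {
    "proposal_a": "Our team migrates legacy systems to cloud platforms. "
                  "We manage cloud security and cost.",
    "proposal_b": "We provide staffing for help desk operations. "
                  "Our analysts resolve tickets quickly.",
    "proposal_c": "The data analytics group builds dashboards. "
                  "Reports are delivered weekly.",
}

# Assumed minimal stopword list so common words do not drive the ranking.
STOPWORDS = {"what", "are", "the", "a", "an", "of", "is",
             "our", "we", "to", "and", "for"}


def tokenize(text):
    """Lowercase the text and keep alphanumeric tokens that are not stopwords."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower())
            if t not in STOPWORDS]


def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def retrieve(query, documents, top_k=1):
    """Step 1: rank documents against the query; drop documents with no overlap."""
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(text))), doc_id)
              for doc_id, text in documents.items()]
    scored.sort(reverse=True)
    return [doc_id for score, doc_id in scored[:top_k] if score > 0]


def summarize(text):
    """Step 2: placeholder summary (first sentence); an LLM would go here."""
    return re.split(r"(?<=[.!?])\s+", text.strip())[0]


def answer(query, documents, top_k=1):
    """Step 3: return (document id, summary) pairs ready for display."""
    return [(doc_id, summarize(documents[doc_id]))
            for doc_id in retrieve(query, documents, top_k)]


if __name__ == "__main__":
    for doc_id, summary in answer("What are the cloud capabilities?", DOCUMENTS):
        print(f"{doc_id}: {summary}")
```

Keeping retrieval, summarization, and display as separate stages mirrors the pipelined structure described above, so in principle each stage could be swapped for a model-backed component without changing the others.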
STS Paper
In my STS research, I explore the growing issue of bias in NLP systems as the technology proliferates into applications that are used by, and affect, real people. The paper discusses how biases become embedded in NLP systems through the data they are trained on, the way they are coded, and their interactions with end users. It also outlines how groups defined by protected characteristics such as race, gender, and religion are disproportionately affected by these biases. Finally, the paper reviews and analyzes bias mitigation approaches proposed in the current academic literature. It concludes that bias mitigation will succeed only through a multifaceted, multidisciplinary approach, one that includes ethical frameworks embedded in the development lifecycle of these applications.
Conclusion
Exploring NLP technologies in both practical application and societal context provides a comprehensive view of their capabilities and limitations. Learning hands-on what NLP can do in the real world, while also researching its ethical dimensions and societal impacts, enriched my perspective on the broader implications of technology in society. While highly influential companies such as Facebook have in the past encouraged “moving fast and breaking things,” this exercise has underscored the current and future need for a more societally accountable approach to technological development.
BS (Bachelor of Science)
Natural Language Processing, NLP, Bias
School of Engineering and Applied Science
Bachelor of Science in Computer Science
Technical Advisor: Rosanne Vrugtman
STS Advisor: Richard Jacques
Technical Team Members: Akiva Miller, Sebastian Thasan
English
All rights reserved (no additional license for public reuse)
2024/05/04