Enhanced Document Retrieval with AI-Integrated Python Program; Automating Knowledge Access: The Benefits and Challenges of AI-Powered Document Retrieval in Corporate Environments

Author:
Andersen, Dylan, School of Engineering and Applied Science, University of Virginia
Advisors:
Morrison, Briana, EN-Comp Science Dept, University of Virginia
Earle, Joshua, EN-Engineering and Society, University of Virginia
Abstract:

At ME Engineers, inefficient access to company documentation frequently delayed employees' work. Junior employees often relied on senior staff or on time-consuming manual searches through technical books and internal documents. To address this bottleneck, I developed an AI-powered document retrieval system with a Python backend that connects company documents to a Pinecone vector database. The system uses ChatGPT for natural language processing and query formulation, enabling semantic search over the company’s technical resources.

The retrieval system includes a user-friendly frontend built in Streamlit, allowing users to ask natural language questions and receive relevant documents alongside AI-generated responses. A key feature of the system is its document cleaning pipeline, which preprocesses unstructured PDFs to ensure high-quality input for vector embedding and retrieval. The system interprets employee queries, transforms them into vector embeddings, and matches them against stored document vectors for efficient and accurate information access.
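The clean, embed, and match steps described above can be illustrated with a minimal sketch. It is not the project's implementation: the word-count `embed` function is a toy stand-in for a learned embedding model, an in-memory dictionary stands in for the Pinecone index, and the cleaning rules, file names, and snippets are hypothetical. Because the toy vectors only capture shared words, the sketch shows the mechanics of cosine-similarity matching rather than true semantic search.

```python
import math
import re
from collections import Counter


def clean_text(raw: str) -> str:
    """Assumed cleaning rules: rejoin hyphenated line breaks, collapse whitespace."""
    text = re.sub(r"(\w)-\s*\n\s*(\w)", r"\1\2", raw)
    text = re.sub(r"\s+", " ", text)
    return text.strip()


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a sparse vector of word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: str, index: dict[str, Counter], k: int = 2) -> list[tuple[str, float]]:
    """Embed the query and return the k best-matching document ids with scores."""
    q = embed(query)
    scored = [(doc_id, cosine(q, vec)) for doc_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical raw text, as it might come out of a PDF extractor.
documents = {
    "duct_sizing.pdf": "Duct siz-\ning tables for supply air\nand return air systems",
    "pipe_insulation.pdf": "Pipe insulation thickness   requirements for chilled water",
    "lighting_levels.pdf": "Recommended lighting levels for office spaces",
}

# Build the in-memory index: clean each document, then embed it.
index = {doc_id: embed(clean_text(raw)) for doc_id, raw in documents.items()}

results = top_k("How do I size a supply air duct?", index)
for doc_id, score in results:
    print(f"{doc_id}: {score:.3f}")
```

In a deployed system, `embed` would call an embedding model and `top_k` would be replaced by a query against the vector database; the ranking principle, nearest vectors first, stays the same.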

This solution significantly reduces the time employees spend retrieving information, increasing productivity and enabling more autonomous workflows. It democratizes information access for technical and non-technical employees alike and supports faster, more reliable decision-making. Future improvements will include support for additional file formats and better scalability to accommodate growing document libraries. This project demonstrates how applied machine learning and vector search can address real-world organizational inefficiencies and optimize information workflows in technical environments.

As organizations grow increasingly reliant on digital documentation, the retrieval of institutional knowledge has become a pressing challenge. AI-powered document retrieval systems, particularly those leveraging semantic search and natural language processing (NLP), promise to alleviate inefficiencies by reducing time spent manually searching for information. However, these systems also raise complex social, ethical, and organizational issues that must be critically examined.

I explored the research question: What are the anticipated benefits and challenges of implementing an AI-powered document retrieval system in corporate environments, based on existing research and case studies? Drawing on infrastructure studies (Star, 1999), I argue that AI systems are not neutral tools but sociotechnical infrastructures that reshape workplace dynamics, redefine access to knowledge, and carry embedded political values. While such systems can democratize access by reducing reliance on knowledge gatekeepers, they can also disrupt hierarchies, create resistance among employees, and reinforce biases present in training data.

Using case studies from Google, Hyntelo, and the Mayo Clinic, I illustrate how AI-powered search improves information flow and efficiency but also faces challenges related to bias, transparency, and trust. These challenges highlight the need for responsible AI deployment that emphasizes explainability, fairness, and user training. Ultimately, this research underscores that successful implementation of AI-powered document retrieval depends not just on technical performance, but on thoughtful integration into existing organizational structures and values. AI must be designed to augment, not replace, human expertise.

The technical and STS (Science, Technology, and Society) components of this project are deeply intertwined. Technically, the project addresses the immediate problem of inefficient information access in engineering environments by building an AI-powered document retrieval system. This system uses advanced NLP techniques and vector search to provide more effective access to documentation through a user-friendly interface. The system reduces time spent on repetitive tasks and minimizes dependency on senior colleagues for knowledge sharing.

The STS research contextualizes this technical solution within broader sociotechnical frameworks. It critically examines the implications of AI-powered retrieval systems beyond productivity metrics. Drawing on infrastructure studies, the research highlights how these systems become embedded in organizational routines, influence power dynamics, and shift the way knowledge is accessed and controlled.

The technical project benefits from the STS perspective by recognizing that technology does not exist in a vacuum. The design and deployment of the AI system must consider user trust, explainability, bias, and integration with existing workflows. Likewise, the STS research is grounded in the lived challenges addressed by the technical project. This provides a concrete example of how AI technologies interact with social systems in real-world settings.

Together, the technical and STS components offer a holistic view of AI-powered document retrieval. The technical work delivers a functional tool to improve productivity, while the STS analysis ensures that its implementation aligns with ethical, organizational, and social considerations. This pairing illustrates how interdisciplinary thinking is essential for the responsible development and integration of emerging technologies in the workplace.

Degree:
BS (Bachelor of Science)
Keywords:
AI-Powered Search Tools, Semantic Search, Vector Databases, Corporate Knowledge Access
Notes:

School of Engineering and Applied Science

Bachelor of Science in Computer Science

Technical Advisor: Briana Morrison

STS Advisor: Joshua Earle

Technical Team Members: N/A

Language:
English
Rights:
All rights reserved (no additional license for public reuse)
Issued Date:
2025/05/05