Abstract
My portfolio connects a technical report on my summer internship project with an STS research paper on AI voice cloning in music through a shared concern with classification and control. In my technical project, I worked on an AI-based financial document classification and data extraction pipeline for a mortgage company, where inconsistent document formats made it difficult to carry out the fraud detection and data validation the organization needed. In my STS project, I studied how U.S. legal institutions classify AI voice cloning into separate categories such as model training, authorship, and voice rights, even though the technology spans all three. I was drawn to this topic because music is a personal hobby of mine and because it connected back to a larger engineering question from my technical work: when systems sort and classify real-world information, they affect what gets recognized, what gets valued, and what gets ignored. This is why STS matters to engineering practice, especially in computer science, because technical systems do more than process data; they also organize decisions, protections, and power.
The technical portion of my portfolio is a brief report on my experience working on a financial document classification and data extraction project. At the company, much of the financial data was scattered across unstructured documents in varying formats, with no centralized location from which to access and use it. Documents were also classified manually, which created inefficiencies. In my report, I explain how these inconsistent spreadsheets and PDFs created problems for data validation and fraud detection, and how the project addressed the issue by standardizing similar documents into a unified structure for downstream analysis. The system used a modular pipeline with metadata-based routing, regex and keyword pattern matching, LLM support for ambiguous files, and external configuration, which made it more robust, flexible, and maintainable for future use cases than the previous system. The project standardized over 17 different rental income formats across more than 30,000 documents, expanded data extraction from 5 fields to more than 25, and improved document classification accuracy to 94%. The result was cleaner, more comparable data for fraud detection workflows and less manual effort for classification and future updates.
In my STS research, I analyze AI voice cloning in music through Sheila Jasanoff’s concept of sociotechnical imaginaries to understand how U.S. legal institutions are responding to this technology. I used a case and document analysis approach, focusing on sources that address model training, authorship, and voice rights. I grouped court complaints, U.S. Copyright Office reports, and legal commentary according to how they framed harm, ownership, and protection, and then compared the patterns across these sources. This process revealed that institutions do not treat AI voice cloning as one unified issue but instead divide it into separate legal categories, even though the technology spans all of them in practice. From this comparison, I show that current legal protections prioritize concerns about copying, licensing, and market interests over artist consent and vocal identity, thereby favoring corporate control and market accommodation over artist autonomy. Together, these projects demonstrated the value of treating engineering problems as sociotechnical systems rather than purely technical ones.
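The document pipeline from the technical report (metadata-based routing, regex and keyword matching, LLM support for ambiguous files, and external configuration) can be illustrated with a short Python sketch. This is not the project's actual code: the document categories, keywords, regex patterns, file-extension routes, and `min_score` threshold are all hypothetical, the inline JSON stands in for an external configuration file, and the LLM step is reduced to a `needs_llm` flag rather than a real model call. It shows the general idea of routing a file by its metadata, scoring it against configurable rules, and deferring weak or tied matches to a fallback.

```python
import json
import os
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical external configuration. In practice this would be loaded from a
# separate JSON file so rules can change without code edits. The categories,
# keywords, and patterns below are illustrative only.
CONFIG = json.loads(r"""
{
  "routes": {".pdf": "pdf", ".xlsx": "spreadsheet", ".csv": "spreadsheet"},
  "categories": {
    "rent_roll": {"keywords": ["rent roll", "tenant"], "patterns": ["unit\\s*#?\\d+"]},
    "bank_statement": {"keywords": ["statement period", "ending balance"], "patterns": ["account\\s+no"]},
    "lease": {"keywords": ["lessor", "lessee"], "patterns": ["term of (the )?lease"]}
  },
  "min_score": 2
}
""")


@dataclass
class Document:
    filename: str
    text: str


@dataclass
class Result:
    filename: str
    route: str
    category: Optional[str]
    needs_llm: bool  # True means "send to the (stubbed) LLM fallback"


def route(doc: Document, config: dict) -> str:
    """Metadata-based routing: choose a handler from the file extension."""
    ext = os.path.splitext(doc.filename)[1].lower()
    return config["routes"].get(ext, "unknown")


def score_categories(text: str, config: dict) -> dict:
    """Count keyword hits plus regex matches for each configured category."""
    lowered = text.lower()
    scores = {}
    for name, rules in config["categories"].items():
        score = sum(kw in lowered for kw in rules["keywords"])
        score += sum(1 for p in rules["patterns"] if re.search(p, lowered))
        scores[name] = score
    return scores


def classify(doc: Document, config: dict) -> Result:
    route_name = route(doc, config)
    scores = score_categories(doc.text, config)
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    best, best_score = ranked[0]
    runner_up = ranked[1][1] if len(ranked) > 1 else 0
    # Unknown file type, weak evidence, or a tie between categories counts as
    # ambiguous and is flagged for the LLM step instead of being guessed.
    if route_name == "unknown" or best_score < config["min_score"] or best_score == runner_up:
        return Result(doc.filename, route_name, None, True)
    return Result(doc.filename, route_name, best, False)


if __name__ == "__main__":
    docs = [
        Document("march_rent_roll.xlsx", "Rent Roll - Unit #12, Tenant: J. Smith"),
        Document("scan_001.pdf", "Please see attached."),
    ]
    for d in docs:
        print(classify(d, CONFIG))
```

In a design like this, keeping the rules in configuration means a new document format can be supported by editing a file rather than changing code, and reserving the LLM for ambiguous cases keeps most classifications cheap and deterministic.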
Jasanoff’s concept of sociotechnical imaginaries, one of the major STS ideas I took away from this work, showed me that technologies do not operate independently; they are shaped by institutions, values, and forms of governance. My technical project taught me that even a robust document pipeline depends on organizational definitions of accuracy, fraud risk, and usable data, because those standards determine what the system is built to recognize and extract. My STS research extended that idea by showing that legal institutions classify AI voice cloning in ways that prioritize some harms over others, especially licensing harms over harms to consent and artistic identity. Completing both projects led me to a deeper ethical lesson: classification is never neutral. Whether engineers are organizing financial documents or building AI systems that interact with creative work, the categories they rely on can protect some people while overlooking others. STS therefore strengthens ethical responsibility in engineering by pushing engineers like me to ask not only whether a system functions, but also whose interests are built into it and whose concerns are excluded from it.