Abstract
The internet has democratized information at an unprecedented scale, distributing knowledge and power to any curious individual across the world. This attribute is fundamental to my respect for software engineering, and its potential violation is my motivation for exploring the problem of data privacy. The modern web churns out user data, and whether that data serves to create better products and services or to invasively track and extort customers depends entirely on what is collected, how, and by whom. So, while my engineering education teaches me to build efficient technical systems under given constraints, sociotechnical analysis gives me the tools to ask whether those systems are ethical and effective. My technical project is a personalized feature recommendation system built during my internship at Amazon Web Services, which uses large-scale, simple browsing data to help users navigate a complex platform more easily and quickly. My STS research examines why consent-based privacy regulation fails to give users meaningful control over data collection.
My technical project produced a personalized feature recommendation system for the AWS Console. AWS offers over 200 cloud services, each with hundreds of features, and most users move through them using workflows that are slower or less effective than they need to be, not because better options do not exist, but because they do not know those options are there. My system addresses this by combining a graph knowledge base that captures navigation patterns from anonymized browsing logs, a vector knowledge base that encodes feature documentation for semantic search, and a large language model that synthesizes both with a user’s current session to generate ranked recommendations. On held-out session data, the system achieved a success rate above 80 percent, meaning at least one of five recommended features matched a feature the user actually navigated to next.
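The success criterion described above can be illustrated with a minimal sketch. It assumes each held-out session is reduced to a ranked list of recommended features and the set of features the user navigated to next; the function names, the session format, and the feature identifiers are hypothetical and are not the evaluation code used in the project.

```python
from typing import Iterable, Sequence, Set, Tuple


def hit_at_k(recommended: Sequence[str], visited_next: Set[str], k: int = 5) -> bool:
    """Return True if any of the top-k recommendations was actually visited next."""
    return any(feature in visited_next for feature in recommended[:k])


def success_rate(sessions: Iterable[Tuple[Sequence[str], Set[str]]], k: int = 5) -> float:
    """Fraction of sessions with at least one hit among the top-k recommendations."""
    sessions = list(sessions)
    if not sessions:
        return 0.0
    hits = sum(hit_at_k(recs, visited, k) for recs, visited in sessions)
    return hits / len(sessions)


# Hypothetical held-out sessions: (ranked recommendations, features visited next).
example_sessions = [
    (["feat-a", "feat-b", "feat-c", "feat-d", "feat-e"], {"feat-c"}),            # hit
    (["feat-a", "feat-b", "feat-c", "feat-d", "feat-e"], {"feat-z"}),            # miss
    (["feat-a", "feat-b", "feat-c", "feat-d", "feat-e", "feat-f"], {"feat-f"}),  # miss: match is ranked sixth
    (["feat-q", "feat-r"], {"feat-r", "feat-s"}),                                # hit
]

rate = success_rate(example_sessions, k=5)
```

Under this definition a session counts as a success only when the match falls within the first five recommendations, so the measure rewards ranking quality as well as coverage.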
My STS research asked whether the legal tools designed to give users control over their privacy actually work. Cookie banners, opt-out checkboxes, and consent dialogs are the mechanisms regulators in the United States and Europe have put in place to govern behavioral data collection. My research concluded that these mechanisms do not meaningfully protect users. They are structurally incapable of communicating what users would need to know to make an informed decision: the extent of the data being collected, every potential future consumer of that data, and the purpose of each collection. The technical reality of modern data flows is too complex to compress into a banner with an accept button, or to be read and understood by users every time they encounter one. Furthermore, placing the onus of collecting consent on the companies whose revenue depends on maximizing data collection creates a conflict of interest that repeatedly produces outcomes unfavorable to users. The result is a system designed to produce the appearance of privacy without the substance of it. Meaningful privacy protection, my research argues, requires moving away from consent entirely, toward regulation that evaluates data flows against the norms of the situations in which they occur.
While my projects take contrasting perspectives on the same issue, one as the engineer benefiting from data collection and the other as a critic of the sociotechnical impact of current data privacy methodology, together they deepen my understanding of how non-technical thinking and moral analysis play into ethical professional engineering work. My technical system respects users’ contextual privacy, using non-invasive and minimal browsing data with the intention of assisting users. But the line between a tool that serves users and one that surveils them is not drawn using the tool itself. It is drawn over and over by engineers’ design choices and by the regulatory systems that create and constrain these decisions. A steamroller driven by a gardener will never plant flowers; the tool’s functionality is imbued in its form. My STS research analyzed the effectiveness and ethicality of regulatory systems based on the technological structures they led to and their non-technical impacts, while my technical work engaged directly with real-world responsible data use.