Internship at Amazon: The Importance of Compressibility in the Cloud; Virtuous Balance: Analyzing the Ethical Intersection of Innovation and Privacy in Data Collection

Kweon, Daniel, School of Engineering and Applied Science, University of Virginia
Neeley, Kathryn, University of Virginia
Vrugtman, Rosanne, EN-Comp Science Dept, University of Virginia

Balancing Innovation in Data Compression with Ethical Privacy Concerns

“Once a new technology rolls over you, if you're not part of the steamroller, you're part of the road.”

- Stewart Brand, The Media Lab: Inventing the Future at MIT, 1987

My interest in the topic of innovation and privacy in data collection began when I worked on a data compression project during my internship at Amazon. During my time there, I was part of a team creating a new compression algorithm aimed at compressing files faster and with less memory usage. This project involved testing with large amounts of user data to analyze and optimize the performance of the algorithm. After working on such a data-centric project for a large corporation like Amazon, I began to wonder about the safety concerns of using large data sets from real people to innovate and develop new products like these compression algorithms. On one hand, I saw the necessity of using user data to innovate; on the other hand, I was concerned about the privacy issues, especially given how common data leaks have become. This led me to choose my STS research paper topic, as I wanted to conduct a more in-depth analysis of the balance between innovation and privacy in data collection.

The goal of my technical project was to produce a benchmarking tool for a new compression algorithm for Amazon's Simple Storage Service (S3), aiming for higher efficiency in data management. For a long time, Amazon relied on third-party compression algorithms to compress its files, but the aim of this project was to develop a new algorithm in-house, ultimately compressing files faster and with less memory usage than the existing third-party algorithms.

Figure 1 better explains the problem that Amazon was addressing. The compression ratio is the original size divided by the compressed size, a unitless value of 1.0 or greater, while speed is how quickly the algorithm performs the compression. For most compression algorithms, a better ratio means a slower algorithm. Initial results showed that Amazon’s algorithm would use about a quarter of the memory while compressing almost ten times as fast. However, most of the files behind these results were very large, usually around a couple of gigabytes. Amazon’s algorithm struggled with very small files, taking twice as long to compress them. Smaller files contain less repetition, and because compression relies on repetition in the file to work efficiently, they are harder to compress. As of now, this algorithm is not in production, but if tuned to work better with smaller files, it could be a major milestone, allowing users to save money as they store files in S3. Amazon could also save the resources and money it must spend to keep files in the cloud. In my STS research, I delved deeper into the ethical dilemmas posed by my technical work. I explored the complex relationship between the necessity of using user data for innovation and the rising concerns about privacy, especially in the context of frequent data leaks.
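As a rough illustration of the two metrics from the technical project described above, compression ratio and speed, the following minimal sketch measures both for inputs of different sizes. It is not Amazon's algorithm or the benchmarking tool built during the internship: Python's standard zlib module stands in for the compressor, the function names benchmark and sample_text are hypothetical, and the input is synthetic log-like text rather than user data.

```python
import time
import zlib


def benchmark(data: bytes, level: int = 6) -> dict:
    """Compress data once with zlib (a stand-in compressor) and report
    the compression ratio and the throughput of that single run."""
    start = time.perf_counter()
    compressed = zlib.compress(data, level)
    elapsed = time.perf_counter() - start
    return {
        "original_bytes": len(data),
        "compressed_bytes": len(compressed),
        # Ratio = original size / compressed size (unitless).
        "ratio": len(data) / len(compressed),
        # Speed in megabytes of input processed per second.
        "mb_per_s": (len(data) / 1e6) / elapsed if elapsed > 0 else float("inf"),
    }


def sample_text(repeats: int) -> bytes:
    """Build synthetic, repetitive input (hypothetical log lines)."""
    return b"timestamp=2024-01-01 level=INFO msg=request served\n" * repeats


if __name__ == "__main__":
    # Small inputs have little repetition to exploit, so their ratio is
    # lower than that of large, repetitive inputs.
    for repeats in (1, 100, 10_000):
        result = benchmark(sample_text(repeats))
        print(
            f"{result['original_bytes']:>8} bytes  "
            f"ratio={result['ratio']:.2f}  "
            f"speed={result['mb_per_s']:.1f} MB/s"
        )
```

With a general-purpose compressor such as zlib, a very small input can even produce a ratio slightly below 1.0 because of fixed header overhead, which is one concrete way small files are harder to compress. A single timed run is noisy; a real benchmark would repeat each measurement and also track peak memory.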

My research focused on understanding the balance between technological advancement and the protection of individual privacy in data collection. It highlighted the dual nature of innovation - as a driver for enhanced user experiences and technological progress, and as a potential threat to privacy. My research emphasized the need for ethical guidelines and corporate responsibility in handling user data, advocating for a harmonious balance between innovation and privacy in the ever-evolving digital landscape.

Reflecting on this synthesis from an STS perspective, it becomes evident that engineering is not just a technical endeavor but a sociotechnical one, deeply intertwined with organizational and cultural elements. By considering these facets simultaneously, we gain a more holistic understanding of the implications of our work. This approach underscores the importance of ethical responsibility in engineering, emphasizing that innovation should focus not only on technical efficiency but also on societal impact, particularly in terms of privacy and data security. This synthesis demonstrates how STS perspectives can guide engineers to create technology that is not only advanced but also ethically sound and socially responsible, ensuring that progress in engineering contributes positively to society as a whole. That is why I began this paper with a quote from Stewart Brand: we, as engineers, bear the responsibility of controlling the technology at our disposal, lest it steamroll over us with devastating consequences. Through this understanding, I hope to become a leader in technology who ensures that its ethical and moral standards are upheld.

BS (Bachelor of Science)
Virtue Ethics, Innovation, Privacy

School of Engineering and Applied Science

Bachelor of Science in Computer Science

Technical Advisor: Rosanne Vrugtman

STS Advisor: Kathryn A. Neeley

All rights reserved (no additional license for public reuse)