Abstract
We live in an age of abundant data, where the systems processing our information and the systems shaping our attention grow ever more sophisticated. These two realities, the engineering challenge of managing data at scale and the social challenge of governing the algorithms that threaten to govern us, are not separate conversations. They are two sides of the same technological moment, and this capstone project lives at their intersection.
Modern computing infrastructure faces a fundamental scaling problem: the memory needed to store a dataset grows with the dataset itself. Because companies increasingly rely on data to improve their products, datasets have grown faster than the hardware available to store and query them. My technical report examines solutions to this challenge by surveying probabilistic data structures such as Bloom filters, cuckoo filters, Count-Min Sketch, and HyperLogLog. These structures trade a small, bounded probability of error for dramatic gains in speed and memory efficiency. They underpin much of the invisible infrastructure of the internet, including the recommendation pipelines that power platforms like TikTok. Understanding how these data structures work shows how platforms can store and query the enormous volume of data that attention algorithms like TikTok's need in order to operate at the scale they do.
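To make the "bounded error for memory" tradeoff concrete, the following is a minimal sketch of a Bloom filter, the first structure named above. It assumes string keys and derives its hash positions from a single SHA-256 digest via double hashing; the class name, the `capacity` and `error_rate` parameters, and the hashing scheme are illustrative choices, not the report's own implementation. The sizing uses the standard formulas m = -n ln(p) / (ln 2)^2 bits and k = (m / n) ln 2 hash functions for n expected items and target false-positive rate p.

```python
import hashlib
import math


class BloomFilter:
    """Minimal Bloom filter sketch: an m-bit array probed by k hash positions."""

    def __init__(self, capacity: int, error_rate: float) -> None:
        # Standard sizing: m = -n ln(p) / (ln 2)^2 bits, k = (m / n) ln 2 hashes.
        self.m = math.ceil(-capacity * math.log(error_rate) / (math.log(2) ** 2))
        self.k = max(1, round(self.m / capacity * math.log(2)))
        # Pack the m bits into bytes to keep the memory footprint small.
        self.bits = bytearray((self.m + 7) // 8)

    def _positions(self, item: str):
        # Assumption for illustration: derive k indices from one SHA-256 digest
        # using double hashing, index_i = (h1 + i * h2) mod m.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force a nonzero step
        for i in range(self.k):
            yield (h1 + i * h2) % self.m

    def add(self, item: str) -> None:
        # Set every bit the item hashes to.
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        # False means "definitely never added"; True means "probably added".
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


bf = BloomFilter(capacity=10_000, error_rate=0.01)
bf.add("video-42")
print("video-42" in bf)  # always True once added: no false negatives
print(len(bf.bits))      # bytes used for 10,000 items at a 1% error target
```

With these parameters the filter uses 95,851 bits (11,982 bytes) and 7 hash positions to track up to 10,000 items, regardless of how long each key is. The price is that a lookup for an item that was never added can still return `True` roughly 1% of the time, which is exactly the "small, bounded probability of error" the structures above accept in exchange for memory.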
And that scale demands social scrutiny. TikTok's recommendation algorithm can process billions of behavioral signals in milliseconds, achieving something qualitatively new: a platform that does not merely deliver content but engineers each user's experience in real time. My STS research examines this through a historical lens, tracing concerns about potentially harmful entertainment, and the regulatory responses they prompted, across radio, television, and video games, before arguing that TikTok represents a departure from these older forms of media. The platform's architecture, the neurodevelopmental vulnerability of its adolescent users, and its engagement-driven business incentives form a tightly coupled network in which harm is not incidental but structural.
Together, these projects argue that technical sophistication is never neutral. The same efficiency that makes probabilistic data structures elegant also enables attention economies operating at unprecedented scale and degree of personalization. Engineers who build these systems bear a responsibility that is ethical as well as technical: they must push technology forward while ensuring that it does not encroach on the rights and development of the people who use it. Only by understanding both the technical roots of these solutions and the societal implications of the technology we create can engineers advance the field while protecting the boundary between humans and that very same technology.