Using Image Processing to Automate Data Collection from Wordle Screenshots; Working Towards Just Surveillance: Lessons and Implications from Google and Apple Exposure Notifications

Author:
Wang, Benjamin, School of Engineering and Applied Science, University of Virginia
Advisors:
Vrugtman, Rosanne, EN-Comp Science Dept, University of Virginia
Morrison, Briana, EN-Comp Science Dept, University of Virginia
Wylie, Caitlin, EN-Engineering and Society, University of Virginia
Abstract:

The overarching problem connecting my technical and STS papers is the improper integration of computer vision technologies into our societal systems. My technical report addresses the practical application of computer vision: drawing on examples from my own development process, it shows how developers can integrate publicly available tools to broaden the scope of data available for visual-text data analysis. My STS report, on the other hand, addresses the social and ethical application of surveillance, asking how governments and large corporations can integrate computer vision and related surveillance technologies ethically without violating the rights and trust of their constituents. This problem matters because innovations in the field have increasingly popularized these tools for surveillance and data collection, and continued improper integration will only exacerbate the negative social outcomes already present in these areas, such as rights abuses, privacy violations, and a general distrust of the field.
My technical report addresses the unsustainable workloads and lack of scalability in the data collection process, specifically for a data-analysis project on The New York Times' game Wordle. To tackle this problem, I used several publicly available computer vision technologies to create an automated tool that extracts structured game data from Wordle screenshots. The system's design was heavily influenced by license plate recognition systems: although the two come from different social contexts, they share nearly identical problem frames, so a solution for one can serve as a solution for the other. The technical paper produced two main outcomes for computer vision applications. First, it outlined a general pipeline for using computer vision technologies to build visual-text parsers, demonstrating common problems developers may face (such as image quality variability and accuracy optimization) along with possible solutions. Second, the tool broadens the scope of Wordle data analysis by making screenshots usable as data points. Because prior research had been limited to data gathered through web scraping alone, the tool offers a scalable, practical process for collecting data from Wordle screenshots that the field previously lacked.
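As a purely illustrative sketch of what "structured game data" from a screenshot could look like, the Python below classifies sampled tile colors from an already-located Wordle grid into the game's three feedback states and summarizes the game. The palette values, the function names (`classify_tile`, `grid_to_record`), and the nearest-color approach are hypothetical choices made for illustration, not the method described in the report; a real pipeline would also need to locate the grid, read the letters, and cope with image quality variability.

```python
from math import dist

# Hypothetical reference colors (assumed light-mode Wordle palette). A real
# pipeline would need calibration for dark mode, high-contrast mode, and
# compression artifacts in screenshots.
REFERENCE_COLORS = {
    "correct": (106, 170, 100),  # green: right letter, right position
    "present": (201, 180, 88),   # yellow: right letter, wrong position
    "absent": (120, 124, 126),   # gray: letter not in the answer
}


def classify_tile(rgb):
    """Return the state whose reference color is nearest in Euclidean RGB distance."""
    return min(REFERENCE_COLORS, key=lambda state: dist(rgb, REFERENCE_COLORS[state]))


def grid_to_record(tile_colors):
    """Convert rows of sampled tile colors into a structured game record.

    tile_colors: one list per guess, each holding five (r, g, b) tuples.
    """
    rows = [[classify_tile(rgb) for rgb in row] for row in tile_colors]
    # A game counts as solved only if the last guess is entirely green.
    solved = bool(rows) and all(state == "correct" for state in rows[-1])
    return {"guesses": len(rows), "solved": solved, "rows": rows}


if __name__ == "__main__":
    sample = [
        [(118, 122, 125), (118, 122, 125), (200, 178, 90), (121, 125, 127), (121, 125, 127)],
        [(105, 169, 101)] * 5,
    ]
    print(grid_to_record(sample))
```

Nearest-color matching is used here only because it is simple; slightly off-palette pixels from compressed screenshots still map to the closest state, which is one small way of handling image quality variability.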
My STS report addresses the struggle of corporations and governments to integrate powerful surveillance technologies ethically without violating the trust and rights of those being observed. In the report, I conducted an ethical analysis of the Google and Apple Exposure Notification (GAEN) system, scrutinizing both its design and its legal implementation to determine whether, under the framework of Just Surveillance, it was an ethical surveillance technology. Using this system as a case study, I also argued for Just Surveillance as a potential foundation for a normative ethical framework that covers consensual surveillance, even though it was designed only for non-consensual circumstances. My findings demonstrated both the strengths and the weaknesses of GAEN as an ethical surveillance system. The analysis found particular strengths in its cause, its intentions, and its attentive design, which sought to eliminate many avenues of potential abuse. However, GAEN also exhibited weaknesses, particularly in its distribution, its inherent dependence on scale for success, and its lack of accountability for how distributors should obtain consent. The report also identified where the revised Just Surveillance framework falls short, mainly in its ambiguity regarding proper authority and how consent is achieved.
While my papers do not provide concrete solutions to the general problem of integrating computer vision and surveillance technologies, they nonetheless provide stepping stones on which future research can build. My technical report, although narrow in scope because of its focus on Wordle, demonstrates a general computer vision process that future developers can draw on to automate data collection from other visual-text media. My STS report, while offering an imperfect revision of the Just Surveillance framework, makes a strong argument for Just Surveillance as the foundation of a normative framework encompassing both consensual and non-consensual surveillance; its imperfections serve as starting points for future work to refine, iterate on, and debate.
I would like to thank my professors for their guidance throughout the process of writing both the technical and STS papers; I truly could not have completed these papers without it. Specifically, I would like to thank Rosanne Vrugtman and Briana Morrison for their help and advice on the technical paper, and Caitlin Wylie for her help and insight on the STS paper.

Degree:
BS (Bachelor of Science)
Keywords:
Wordle, Image Processing, Computer Vision, Surveillance, Just Surveillance, Ethical Surveillance, Surveillance Ethics
Notes:

School of Engineering and Applied Science

Bachelor of Science in Computer Science

Technical Advisor: Rosanne Vrugtman, Briana Morrison

STS Advisor: Caitlin Wylie

Technical Team Members:

Language:
English
Issued Date:
2025/05/05