Abstract
Text-to-video (T2V) generative artificial intelligence has rapidly emerged as one of the most significant developments of the current generative AI era. By allowing users to create realistic video clips from short written prompts, these systems promise new possibilities for storytelling, design, entertainment, and media production. At the same time, they intensify a broader public problem: how society can preserve trust, accountability, and responsible media use in an environment where realistic synthetic content is increasingly easy to produce and distribute. In its current form, T2V AI walks a thin line between technical innovation and social instability. The same qualities that make these systems attractive, such as realism, speed, accessibility, and scale, also enable dangerous and malicious outcomes that affect public trust, welfare, and society. As my STS research shows, these dangers include misinformation, identity misuse, copyright conflict, labor disruption, environmental cost, and the erosion of confidence in visual evidence. My technical report and STS research paper are coupled because they address different parts of this same problem. The STS paper explains why T2V AI has become such a serious sociotechnical concern, while the technical report proposes a practical response: a web-based detector, VidGuard, designed to help users assess whether a video is likely AI-generated or authentic and to encourage stronger norms of accountability around synthetic media.
My technical report, “Designing VidGuard: A Web-Based Text-to-Video AI Detector with an Ethical Framework for Responsible AI Use,” proposes a detection and verification system aimed at one of the challenges posed by T2V AI: the growing difficulty of identifying synthetic video content once it begins circulating online. The rationale for the project is that false or misleading video content can spread quickly across digital platforms, especially when it appears realistic enough for most viewers to accept it as authentic video or evidence. VidGuard is proposed as a web-based tool to which users could upload a video or submit a link so that the system can evaluate whether the content is AI-generated. The design combines several features, including scanning for visual characteristics associated with AI generation, checking for authenticity or provenance markers such as watermarks and content credentials, and returning an assessment of how realistic or authentic the video appears to be. The report also includes an ethical framework intended to guide how such a tool should be designed and used. This framework emphasizes transparency, privacy, responsible communication of results, and awareness of the limits of automated detection technology. The report concludes that a detector alone cannot solve the larger T2V problem, but that it can serve as one accountability tool within a broader framework of verification systems, ethical standards, and platform responsibility.
My STS research paper, “Seeing is No Longer Believing: The Sociotechnical Dangers of Text-to-Video Generative AI,” investigates how T2V systems show the mutual shaping of technology and society and asks why their sociotechnical risks appear to outweigh their social benefits in their current form. The paper argues that the harms of T2V AI do not come from the technology itself, or from society alone, but rather from the interaction between powerful video generation systems, user behavior, platform incentives, weak detection norms, legal uncertainty, and a public information environment already strained by misinformation. To study this problem, I collected and synthesized evidence from academic literature, official technical and governance documents, company safety materials, industry commentary, and high-credibility public reporting. I organized this evidence around recurring themes: misinformation and trust, copyright and identity misuse, labor and creativity, environmental cost, and technical mitigation. Using Actor-Network Theory as the main conceptual framework, I analyzed T2V not as an isolated tool, but rather as a sociotechnical network involving models, developers, platforms, users, journalists, audiences, laws, and verification standards. My findings show that misinformation and reality confusion are central risks rather than side effects; that copyright, trademark, and identity misuse are deeply entangled with the rise of T2V; that creative labor faces growing pressure from the scale of these generative models; and that mitigation efforts such as watermarking and provenance standards remain incomplete because they depend on broader adoption, platform cooperation, and public awareness. The paper concludes that although T2V has real creative and commercial benefits, those benefits are currently outweighed by the combined effects of misinformation, weakened public trust, legal uncertainty, labor disruption, and identity-related harms.
Taken together, these two projects contribute to understanding and responding to the rise of T2V generative AI. The STS paper explains why T2V should be understood as a sociotechnical system whose harms arise from its interactions with society. The technical report proposes a practical tool that could intervene in that system by improving media verification and encouraging stronger accountability norms. I do not argue that a single detector or ethical framework can solve a problem of this scale. My work does show, however, that if T2V is to become socially beneficial in the future, it will require more than technical progress alone. It will require stronger systems of transparency, ethical design, provenance, public awareness, and institutional responsibility.