Abstract
Generative models for images, video, and text have achieved unprecedented capabilities, driven not only by advances in model architectures but also by training on massive, high-quality datasets. While such data and the resulting generative power underpin impressive performance, they also introduce significant privacy and security risks. This dissertation examines these risks from both the training-data and content-generation perspectives, and develops auditing and defense mechanisms across multiple modalities.
We first focus on the privacy of training data and design membership inference attacks (MIAs) tailored to diffusion models. In the white-box setting, we introduce gradient-based MIAs that can be used as auditing tools, enabling data owners such as artists to check whether their works have been used without consent in a model’s training set. We then extend this auditing capability to the black-box setting by designing score-based MIAs that operate using only model outputs, making copyright verification feasible even when internal model details are unavailable.
Next, we turn to the risks that arise from the strong generative capabilities of advanced models, which raise two questions: how to distinguish AI-generated content from real content, and how to identify and mitigate harmful generations. For video generation models (VGMs), we propose VGMShield, an integrated framework that detects AI-generated videos, traces them back to their source models, and perturbs benign inputs to prevent realistic forgeries. Complementing this, we conduct a systematic study of unsafe video generation, construct the first dataset of unsafe videos produced by state-of-the-art open-source VGMs, and propose a latent-variable defense that uses the models' intermediate outputs to block unsafe generations efficiently.
Finally, in the text domain, we address the abuse of large language models (LLMs) for generating phishing emails and present Paladin, a trigger–tag paradigm that builds instrumented LLMs to embed robust, stealthy tags into phishing emails, enabling scalable detection of AI-generated phishing content without degrading normal utility.
Together, these contributions offer a unified, cross-modal view of privacy and safety in generative AI, highlighting both the risks introduced by modern generative models and practical mechanisms to audit and mitigate their misuse.