Defending Generative Models through Privacy Auditing and Content Detection

Pang, Yan

Author

Pang, Yan, Computer Science - School of Engineering and Applied Science, University of Virginia

Advisors

Wang, Tianhao, EN-Comp Science Dept, University of Virginia

Abstract

Generative models for images, video, and text have achieved unprecedented capabilities, driven not only by advances in model architectures but also by training on massive, high-quality datasets. While such data and the resulting generative power underpin impressive performance, they also introduce significant privacy and security risks. This dissertation focuses on these risks from both the training-data and content-generation perspectives, and develops auditing and defense mechanisms across multiple modalities.

We first focus on the privacy of training data and design membership inference attacks (MIAs) tailored to diffusion models. In the white-box setting, we introduce gradient-based MIAs that can be used as auditing tools, enabling data owners such as artists to check whether their works have been used without consent in a model’s training set. We then extend this auditing capability to the black-box setting by designing score-based MIAs that operate using only model outputs, making copyright verification feasible even when internal model details are unavailable.

Next, we turn to the risks that arise from the strong generative capabilities of advanced models: how to distinguish AI-generated content from real content, and how to identify and mitigate harmful generations. For video generation models (VGMs), we propose VGMShield, an integrated framework that detects AI-generated videos, traces them back to their source models, and perturbs benign inputs to prevent realistic forgeries. Complementing this, we conduct a systematic study of unsafe video generation, construct the first dataset of unsafe videos produced by state-of-the-art open-source VGMs, and propose a latent-variable defense that uses the model's intermediate outputs to block unsafe generations efficiently.

Finally, in the text domain, we focus on the abuse of large language models (LLMs) to generate phishing emails and present Paladin, a trigger–tag paradigm that builds instrumented LLMs to embed robust, stealthy tags into phishing emails, enabling scalable detection of AI-generated phishing content without degrading normal utility.

Together, these contributions offer a unified, cross-modal view of privacy and safety in generative AI, highlighting both the risks introduced by modern generative models and practical mechanisms to audit and mitigate their misuse.

Degree

PHD (Doctor of Philosophy)

Keywords

Generative AI Security; Membership Inference Attack; Diffusion Models; AI-Generated Content Detection; Video Generation Model Defense; LLM Phishing Detection; Trustworthy Machine Learning; Copyright Auditing; Trigger–Tag Watermarking

Language

English

Issued Date

2026-05-22

Persistent Link

https://doi.org/10.18130/yqd4-5408

Suggested Citation

Pang, Yan. Defending Generative Models through Privacy Auditing and Content Detection. University of Virginia, Computer Science - School of Engineering and Applied Science, PHD (Doctor of Philosophy), 2026-05-22, https://doi.org/10.18130/yqd4-5408.

Files

2_Pang_Yan_2026_PHD.pdf