Tool Vulnerabilities: Flagging LLM Injection Vulnerabilities; Black-Box: A Deep Dive Into Governance and Data Privacy with LLMs

Shek, Winston

Tool Vulnerabilities: Flagging LLM Injection Vulnerabilities; Black-Box: A Deep Dive Into Governance and Data Privacy with LLMs 4 views

Author

Shek, Winston, School of Engineering and Applied Science, University of Virginia

Advisors

Murray, Sean , EN-Engineering and Society , University of Virginia
Vrugtman, Rosanne , EN-Comp Science Dept , University of Virginia

Abstract

What if ChatGPT gave you access on how to create a nuclear bomb? The rise of artificial intelligence (A.I.) in all facets creates challenges from a technical and governance standpoint. As large language models (LLMs) advance in capability, models are forced to ward off attacks to manipulate its functionality. These capabilities warrant a deep dive into whether current governance can handle the increased responsibilities of LLMs. 

My capstone research addresses increasingly skilled large language models (LLMs) that have access to more data via tools that help them access the internet, sensitive data, or make decisions for users. LLMs can be manipulated via prompt injection attacks that hide malicious content in user input or external context.  Prompt injection attacks happen because most large language models (LLMs) fail to distinguish between trusted commands and untrusted data. 

I propose a unified framework that singles out key success metrics and enables consistent evaluation capabilities for evaluating model vulnerability across model-tool combinations. This benchmark specifically focuses on which attack vectors will be the most likely to fail under an attack. Three tool categories form the basis of testing: API access, code execution, and web search. Each of these tool categories represent different levels of permission access, and a wide range of model-agnostic attack templates test how permissions affect prompt injection success. Attack templates are specific pre-formatted prompts or multi-turn prompts that try and prompt inject the model. Templates focus on three levels of sophistication to test the vulnerability of a model. These attacks will be repeated to ensure effectiveness. To evaluate the effectiveness of the framework, the primary metric is attack success rate (ASR) to determine how successful the attacks are. Secondary metrics include the degradation rate which determines how quickly the model degrades after repeated attempts to jailbreak it. 

These concerns into how large language models (LLMs) might be manipulated place into focus how LLMs handle data. 

My STS paper directly analyzes data privacy gaps in two primary areas: model development and model deployment. I interweave the current regulatory framework across several countries to provide a comprehensive outlook into where gaps in governance appear in these two areas. This paper primarily utilizes discourse analysis to use literature as a tool to examine the current regulatory framework and how key stakeholders view artificial intelligence. I build upon this with actor network theory (ANT) to visualize where gaps materialize within the complex web of stakeholders, and use an A.I. ethical framework to show how these gaps should be framed. My paper argues for a shift in governance by employing more transparent and detailed disclosure measures across the board in deployment and training, in addition to transitioning regulation to trusted third-party auditors to ensure interoperability between countries and model providers.

Both of these papers aim to strengthen the ecosystem surrounding privacy and A.I. As models improve, it becomes imperative to ensure that models are safe to use for the public, while ensuring that an individual’s right to data privacy is respected.

Degree

BS (Bachelor of Science)

Keywords

LLMs; Artificial Intelligence; Prompt Injection; AI Governance; AI Policy

Notes

School of Engineering and Applied Science

Bachelor of Science in Computer Science

Technical Advisor: Rosanne Vrugtman

STS Advisor: Sean Murray

Language

English

Rights

Issued Date

2026-05-12

Persistent Link

https://doi.org/10.18130/0y6c-pd55

Suggested Citation

Shek, Winston. Tool Vulnerabilities: Flagging LLM Injection Vulnerabilities; Black-Box: A Deep Dive Into Governance and Data Privacy with LLMs. University of Virginia, School of Engineering and Applied Science, BS (Bachelor of Science), 2026-05-12, https://doi.org/10.18130/0y6c-pd55.

Files

This item is restricted to UVA until 2031-05-12.