Structured Interpretable Manipulation of Policies; Framing Public Policy for Adversarial Machine Learning

Author:
Lee, Jihyeong, School of Engineering and Applied Science, University of Virginia
Foley, Rider, EN-Engineering and Society, University of Virginia
Wang, Hongning, EN-Comp Science Dept, University of Virginia

With the increased prevalence of machine learning and artificial intelligence, new questions about the security and reliability of these technologies arise. In particular, adversarial machine learning exploits our nascent understanding of these algorithms to manipulate them into performing undesirable tasks. This project explores the interpretability of adversarial attacks in reinforcement learning (RL) environments. A key feature of adversarial attacks is that a perturbation of limited magnitude can have a significant impact on the output of the target. This is closely tied to model interpretability, which involves understanding model outputs and why such a small change in the input can completely alter the output of a deep neural network (DNN). In this research, we adapt a technique for improving the group sparsity of adversarial attacks, originally developed in a classification setting, to an RL setting. Additionally, we adapt methods for measuring attack interpretability from the classification setting to the RL setting, where the objective is to maximize a cumulative reward.
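The group-sparsity idea above can be illustrated with a minimal sketch. This is not the thesis's method; it assumes a toy linear score in place of a trained policy, and the function names, groupings, and parameters are hypothetical. Gradient ascent crafts a bounded perturbation while a proximal group-lasso step zeroes out weakly contributing groups of input features, so the attack concentrates on a few interpretable regions:

```python
import numpy as np

def group_soft_threshold(delta, groups, lam):
    # Proximal operator of the group-lasso (L2,1) penalty:
    # shrinks each group's L2 norm toward zero, zeroing weak groups entirely.
    out = delta.copy()
    for g in groups:
        norm = np.linalg.norm(delta[g])
        scale = max(0.0, 1.0 - lam / norm) if norm > 0 else 0.0
        out[g] = delta[g] * scale
    return out

def group_sparse_attack(x, w, groups, eps=0.5, lam=0.2, lr=0.1, steps=50):
    # Toy attack: maximize the linear score w . (x + delta) subject to a
    # bounded, group-sparse perturbation, via proximal gradient ascent.
    delta = np.zeros_like(x)
    for _ in range(steps):
        delta += lr * w                       # gradient of w.(x+delta) wrt delta
        delta = group_soft_threshold(delta, groups, lr * lam)
        delta = np.clip(delta, -eps, eps)     # keep attack magnitude limited
    return delta

# Hypothetical example: two feature groups, only the first strongly
# influences the score, so the attack leaves the second group untouched.
x = np.zeros(6)
w = np.array([1.0, 1.0, 1.0, 0.01, 0.01, 0.01])
groups = [[0, 1, 2], [3, 4, 5]]
delta = group_sparse_attack(x, w, groups)
```

The resulting perturbation saturates the influential group at the magnitude bound and is exactly zero on the uninfluential one, which is the group-sparsity property the research aims to study for interpretability.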
It is also important to consider the social implications of adversarial attacks for the adoption of machine learning. Although most of these attacks are currently theoretical, they may pose a major threat to machine learning systems in the future. Public sentiment about machine learning will be adversely impacted by adversarial attacks unless regulatory measures are put in place to guarantee a minimum bar of safety. However, because this technology is so new, there are many questions about whether current cybersecurity or liability legislation can specifically address this issue. Thus, I will employ the Actor Network Theory framework to analyze how effective current regulatory measures are in dealing with adversarial machine learning, as well as the Anticipatory Governance framework to analyze how governments should proactively address the oncoming threat. Specifically, I will analyze current United States legislation regarding cybersecurity and its ability to address the issue at hand. Furthermore, I will analyze the future plans of three major governments that are increasing their focus on machine learning, the United States, the European Union, and South Korea, to examine their emphasis on this specific issue as well as the reasoning behind their focus or lack thereof. I will also conduct interviews with industry professionals in machine learning to identify potential areas for policy research. I expect to gain insight into the differing perspectives on regulating machine learning and to identify what policy changes should be made for a public that is looking to embrace it. Together, our technical and sociological research will allow us to better understand how to adopt a completely novel technology that has the potential to greatly support human productivity.

BS (Bachelor of Science)
machine learning, adversarial attack, reinforcement learning, anticipatory governance, actor network theory

School of Engineering and Applied Science

Bachelor of Science in Computer Science

Technical Advisor: Hongning Wang

STS Advisor: Rider Foley

Technical Team Members: Quinlan Dawkins

Issued Date: