Machine Learning Approaches to Multi-Agent Inverse Learning Problems

Lin, Xiaomin, Systems Engineering - School of Engineering and Applied Science, University of Virginia
Beling, Peter, Department of Systems and Information Engineering, University of Virginia

The problem of inferring the goals of an agent from observations of its actions has been framed as inverse reinforcement learning (IRL) and has been studied extensively in recent decades. This model is valid, however, only when no other adaptive agents exist or their interference can be neglected; otherwise, a model that takes other agents into account is needed in place of IRL. To this end, this dissertation proposes a multi-agent inverse reinforcement learning (MIRL) model based on the framework of stochastic games, which generalize Markov decision processes to game-theoretic settings. We develop algorithms for two fundamental classes of MIRL problems: two-agent zero-sum and two-agent general-sum. For the first class, we develop a Bayesian solution approach whose generative model assumes that the two agents follow a minimax bi-policy. For the second, we consider five variants, each distinguished by its solution concept: uCS-MIRL, advE-MIRL, cooE-MIRL, uCE-MIRL, and uNE-MIRL. Problem uCS-MIRL is a cooperative game in which the agents employ cooperative strategies that maximize the total game value. In problem uCE-MIRL, the agents are assumed to follow strategies that constitute a correlated equilibrium while maximizing the total game value. Problem uNE-MIRL is similar to uCE-MIRL, except that the agents are assumed to employ a Nash equilibrium. The advE-MIRL and cooE-MIRL problems assume that the agents' policies constitute an adversarial equilibrium and a coordination equilibrium, respectively. We propose novel approaches to these five problems under the assumption that the observer either knows or can accurately estimate the players' policies and solution concepts. For uCS-MIRL, we first derive a characteristic set of solutions ensuring that the observed bi-policy is a uCS and then apply a Bayesian inverse learning method.
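To make the minimax solution concept underlying the zero-sum generative model concrete, the sketch below computes a minimax mixed strategy for a single-stage zero-sum game by linear programming. This is an illustrative aside, not the dissertation's algorithm; the function name and the SciPy-based formulation are our own assumptions.

```python
import numpy as np
from scipy.optimize import linprog

def minimax_strategy(A):
    """Row player's minimax mixed strategy for zero-sum payoff matrix A.

    Solves max_x min_j sum_i x_i A[i, j] as a linear program.
    """
    m, n = A.shape
    # Decision variables: x (m action probabilities) and v (the game value).
    c = np.zeros(m + 1)
    c[-1] = -1.0                      # linprog minimizes, so minimize -v
    # For each opponent column j: v - sum_i x_i A[i, j] <= 0
    A_ub = np.hstack([-A.T, np.ones((n, 1))])
    b_ub = np.zeros(n)
    A_eq = np.zeros((1, m + 1))
    A_eq[0, :m] = 1.0                 # probabilities sum to one
    b_eq = np.array([1.0])
    bounds = [(0, 1)] * m + [(None, None)]   # v is unbounded
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[:m], res.x[-1]

# Matching pennies: the minimax strategy is uniform and the game value is 0.
x, v = minimax_strategy(np.array([[1.0, -1.0], [-1.0, 1.0]]))
```

A minimax bi-policy for a stochastic game applies this reasoning at every state, with continuation values folded into the stage payoffs.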
For uCE-MIRL, we formulate a linear program whose constraints are necessary and sufficient conditions for the observed policies to constitute a correlated equilibrium. The objective is to choose a solution that minimizes the difference in total game value between the observed bi-policy and a local uCS while maximizing the scale of the solution. We apply a similar treatment to uNE-MIRL. We demonstrate these algorithms in multiple grid-world experiments and conclude that: 1) all of them recover high-quality rewards comparable to the ground truth; and 2) they outperform other methods, such as decentralized MIRL and IRL.
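The linear constraints that characterize a correlated equilibrium can be illustrated on a single-stage bimatrix game: the joint action distribution must leave neither player an incentive to deviate from a recommended action. The sketch below, with an assumed utilitarian (total-payoff-maximizing) objective, is illustrative only and is not the dissertation's uCE-MIRL formulation.

```python
import numpy as np
from scipy.optimize import linprog

def utilitarian_correlated_eq(U1, U2):
    """Correlated equilibrium of a bimatrix game maximizing total payoff.

    Variables: a joint distribution p over action pairs (i, j).
    CE constraints: for each recommended action, the expected gain from
    deviating to any other action is nonpositive.
    """
    m, n = U1.shape
    idx = lambda i, j: i * n + j
    cons = []
    # Row player: recommended action i, deviation to i2.
    for i in range(m):
        for i2 in range(m):
            if i2 == i:
                continue
            row = np.zeros(m * n)
            for j in range(n):
                row[idx(i, j)] = U1[i2, j] - U1[i, j]   # deviation gain
            cons.append(row)
    # Column player: recommended action j, deviation to j2.
    for j in range(n):
        for j2 in range(n):
            if j2 == j:
                continue
            row = np.zeros(m * n)
            for i in range(m):
                row[idx(i, j)] = U2[i, j2] - U2[i, j]
            cons.append(row)
    A_ub, b_ub = np.array(cons), np.zeros(len(cons))
    A_eq, b_eq = np.ones((1, m * n)), np.array([1.0])
    c = -(U1 + U2).reshape(-1)          # maximize expected total payoff
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0, 1)] * (m * n))
    return res.x.reshape(m, n), -res.fun

# Pure coordination game: the utilitarian CE puts all mass on the better equilibrium.
p, total = utilitarian_correlated_eq(np.array([[2.0, 0.0], [0.0, 1.0]]),
                                     np.array([[2.0, 0.0], [0.0, 1.0]]))
```

In the inverse direction, the same constraint structure is reused with the rewards as unknowns and the observed joint policy held fixed.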

PHD (Doctor of Philosophy)
Artificial Intelligence, Multi-agent Systems, Inverse Reinforcement Learning, Game Theory, Machine Learning
Issued Date: