Machine Approaches to Epistemic Learning with Application to Decision Strategy Recognition
Qiao, Qifeng, Systems Engineering - School of Engineering and Applied Science, University of Virginia
Beling, Peter, Department of Systems and Information Engineering, University of Virginia
In sequential decision problems, the forward planning problem is one in which an agent seeks a decision strategy that yields a sequence of actions achieving a given goal. The inverse process, sometimes called plan or goal recognition, is to infer the goals of an agent from observations of its actions. One of the more general goal recognition problems may be framed in the context of reinforcement learning. The inverse reinforcement learning (IRL) problem is to determine the reward structure perceived by an agent on the basis of observations of its actions. A number of approaches have been proposed for IRL, but many rely on the assumption that the reward function can be linearly approximated, which may not be reasonable for many problems of practical interest. The ill-posed nature of the inverse learning problem also presents difficulties: multiple reward functions may yield the same optimal policy, and multiple actions may be observed at a state even under the true reward function.
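To make the ill-posedness concrete, the following minimal sketch (not drawn from the dissertation; the MDP, rewards, and discount factor are hypothetical) runs value iteration on a toy three-state, two-action Markov decision process and shows that two different reward functions, one a positive affine transformation of the other, induce the same optimal policy.

```python
# Minimal sketch of IRL's ill-posedness: two distinct reward vectors
# yield the same optimal policy in a hypothetical 3-state, 2-action MDP.
import numpy as np

n_states, n_actions, gamma = 3, 2, 0.9

# P[a][s, s'] = transition probability under action a; each row sums to 1.
P = [
    np.array([[0.8, 0.2, 0.0],
              [0.0, 0.8, 0.2],
              [0.1, 0.0, 0.9]]),   # action 0
    np.array([[0.1, 0.9, 0.0],
              [0.0, 0.1, 0.9],
              [0.9, 0.0, 0.1]]),   # action 1
]

def optimal_policy(r, n_iter=500):
    """Greedy policy from value iteration under state reward vector r."""
    v = np.zeros(n_states)
    for _ in range(n_iter):
        q = np.stack([r + gamma * (P[a] @ v) for a in range(n_actions)], axis=1)
        v = q.max(axis=1)
    return q.argmax(axis=1)

r1 = np.array([0.0, 0.0, 1.0])   # reward only in the third state
r2 = 5.0 * r1 + 2.0              # positive affine transformation of r1

print(optimal_policy(r1), optimal_policy(r2))  # identical policies
```

Because a positive affine transformation of the reward preserves the ranking of action values in every state, observed demonstrations alone cannot distinguish between such rewards.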
The focus of this dissertation is the design and analysis of IRL algorithms that address several of the principal limitations of existing methods and extend the domains of application for IRL. Specific areas of contribution include: (1) development of a non-parametric probabilistic model in a Bayesian framework that quantifies inference uncertainty and provides a prediction model in closed form; (2) introduction of the idea of using reward functions recovered through IRL as signatures for decision agents in supervised or unsupervised learning of agent identity and decision strategy; and (3) development of algorithms that can learn jointly from supervised information at observed states and unsupervised information at unobserved states. IRL algorithms can present many computational difficulties. We prove that the principal computational procedures in IRL under a Gaussian process model can be formulated as convex optimization problems and thus solved with efficient algorithms. We also develop a minorization-maximization approach to the computation of likelihood functions. For a variety of standard test problems, we present empirical results comparing the proposed methods with algorithms from the recent literature.
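As a rough illustration of the ingredients of such a Bayesian model, the following self-contained sketch combines a Gaussian-process prior over state rewards with a softmax (Boltzmann) likelihood of observed state-action pairs. The state features, RBF kernel, temperature parameter, one-step-lookahead action values, and demonstration data are all hypothetical; this is an illustration of the general Bayesian IRL ingredients, not the dissertation's exact Gaussian process formulation.

```python
# Sketch of a Bayesian IRL-style objective: GP prior over rewards plus a
# softmax (Boltzmann) observation model. All quantities are hypothetical.
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 5, 2
X = rng.normal(size=(n_states, 2))     # hypothetical 2-D state features

# GP prior over the reward vector, r ~ N(0, K), with an RBF kernel on X.
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-0.5 * sq_dists) + 1e-6 * np.eye(n_states)

def log_prior(r):
    """Log density of the GP prior, up to an additive constant."""
    return -0.5 * r @ np.linalg.solve(K, r)

def log_likelihood(Q, demos, beta=2.0):
    """Softmax likelihood of observed (state, action) pairs given values Q."""
    logZ = np.log(np.exp(beta * Q).sum(axis=1))
    return sum(beta * Q[s, a] - logZ[s] for s, a in demos)

# Hypothetical action values from a candidate reward (one-step lookahead,
# purely for illustration; a full model would solve the underlying MDP).
P = rng.dirichlet(np.ones(n_states), size=(n_actions, n_states))
def q_values(r, gamma=0.9):
    return np.stack([r + gamma * (P[a] @ r) for a in range(n_actions)], axis=1)

r_sample = rng.multivariate_normal(np.zeros(n_states), K)   # draw from prior
demos = [(0, 1), (2, 0), (4, 1)]                            # observed pairs
print(log_prior(r_sample) + log_likelihood(q_values(r_sample), demos))
```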
The study of behavior recognition based on inverse reinforcement learning has considerable significance in many potential application areas that involve human or machine behavior in an interactive environment.
PhD (Doctor of Philosophy)
inverse reinforcement learning, decision making, Markov decision process, behavior recognition, optimization, machine learning, Gaussian process
English
All rights reserved (no additional license for public reuse)
2012/04/23