Expectation-maximization for Bayes-adaptive POMDPs

Vargo, Erik, Systems Engineering - School of Engineering and Applied Science, University of Virginia
Cogill, Randy, Department of Systems and Information Engineering, University of Virginia

Partially observable Markov decision processes, or POMDPs, are used extensively to model the complex interactions between an agent and a dynamic, stochastic environment. When all model parameters are known, near-optimal solutions to the reward maximization problem can be obtained through approximate value iteration. Unfortunately, in many real-world applications a POMDP formulation may not be justified due to uncertainty in the underlying hidden Markov model parameters. However, if model uncertainty can be characterized by a prior distribution over the state-transition and observation-emission probabilities, it is natural to seek Bayes-optimal policies that maximize the expected reward under this distribution. The coupling of a POMDP with a model prior was recently formalized as the Bayes-adaptive POMDP (BAPOMDP), and various online and offline algorithms have since been proposed for this class of problems, the most popular of which are inspired by approximate POMDP value iteration. Despite the success of value iteration on small benchmark BAPOMDPs, empirical results suggest that it may be inadequate as the degree of model uncertainty increases. As an alternative, in this dissertation we explore expectation-maximization approaches to solving BAPOMDPs, which have the potential to scale more gracefully with both the number of uncertain model parameters and their assumed variability.
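The belief update underlying the methods discussed above can be sketched briefly. The following is a minimal illustration, not code from the dissertation: it implements the standard Bayes filter for a POMDP with known parameters, the quantity that BAPOMDP methods generalize by additionally maintaining uncertainty over the transition and observation probabilities themselves. All array names and shapes here are illustrative assumptions.

```python
import numpy as np

def belief_update(b, a, o, T, O):
    """Bayes filter: posterior over hidden states after taking action a
    and receiving observation o.

    b : (n_states,) current belief over states
    T : (n_states, n_actions, n_states) transition probabilities T[s, a, s']
    O : (n_states, n_actions, n_obs) observation probabilities O[s', a, o]
    """
    # Predict: propagate the belief through the transition model.
    predicted = b @ T[:, a, :]            # shape (n_states,)
    # Correct: weight each successor state by the observation likelihood.
    unnormalized = predicted * O[:, a, o]
    return unnormalized / unnormalized.sum()

# Toy two-state, one-action example with a noisy sensor.
T = np.zeros((2, 1, 2)); T[:, 0, :] = [[0.9, 0.1], [0.2, 0.8]]
O = np.zeros((2, 1, 2)); O[:, 0, :] = [[0.8, 0.2], [0.3, 0.7]]
b = np.array([0.5, 0.5])
b_next = belief_update(b, a=0, o=0, T=T, O=O)
```

In a BAPOMDP, `T` and `O` are themselves uncertain, so the agent's belief is maintained jointly over the hidden state and the model parameters (e.g., via Dirichlet counts), which is what makes exact planning intractable and motivates approximate methods such as value iteration and the expectation-maximization approaches studied here.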

PHD (Doctor of Philosophy)
Markov processes, stochastic control, model uncertainty, Bayesian statistics, partially observable Markov decision processes, expectation-maximization, variational Bayes
All rights reserved (no additional license for public reuse)
Issued Date: