Safe Sequential Decision Making in Uncertain Environments

Author:
ElSayed-Aly, Ingy, Computer Science - School of Engineering and Applied Science, University of Virginia
Feng, Lu, EN-Comp Science Dept, University of Virginia

Sequential decision making is a collection of techniques for automating decisions given a system model. In general, exact models are unrealistic because real-world behavior may be influenced by external factors and modeling errors. We focus on methods that model uncertain environments featuring both stochasticity and non-determinism. This dissertation develops methods for safe sequential decision making in different types of uncertain environments.
We concentrate our efforts on two types of decision making: multi-agent reinforcement learning (MARL), where agents seek to learn optimal policies, and probabilistic model checking, where the model's transitions and states are known. To address safe sequential decision making, we develop a suite of novel techniques that bring logic-guided learning to MARL algorithms and informative distributional algorithms to probabilistic model checking.

In the first chapter, we introduce two novel shielding approaches for safe MARL, each synthesized from a temporal logic safety specification. In the centralized approach, we synthesize a single shield that monitors all agents' joint actions and corrects any unsafe actions. We also propose a factored approach, in which we synthesize multiple shields based on a factorization of the joint state space; the set of shields monitors the agents concurrently. The factored approach is more scalable, while the centralized shield is closer to optimal.
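To make the idea of a centralized shield concrete, the following is a minimal sketch and not the dissertation's implementation. It assumes a hypothetical toy setting (two agents in a five-cell corridor) and replaces the shield synthesized from a temporal logic specification with a simple one-step collision check. The names `step`, `is_safe`, and `centralized_shield`, and the rule of changing as few individual actions as possible, are illustrative assumptions.

```python
from itertools import product

# Hypothetical toy setting: two agents on a 1-D corridor of 5 cells.
# Each action moves an agent by -1, 0, or +1.
ACTIONS = (-1, 0, 1)
N_CELLS = 5


def step(pos, action):
    """Move one agent, clamping its position to the corridor."""
    return min(max(pos + action, 0), N_CELLS - 1)


def is_safe(state, joint_action):
    """One-step collision check, standing in for a synthesized shield.

    Assumption: a joint action is unsafe if both agents would end up
    in the same cell after moving.
    """
    nxt = tuple(step(p, a) for p, a in zip(state, joint_action))
    return nxt[0] != nxt[1]


def centralized_shield(state, joint_action):
    """Pass a safe joint action through; otherwise substitute a safe one.

    The substitute is a safe joint action that changes the fewest
    individual actions (an assumed correction rule for illustration).
    """
    if is_safe(state, joint_action):
        return joint_action
    candidates = [
        ja for ja in product(ACTIONS, repeat=len(state)) if is_safe(state, ja)
    ]
    return min(
        candidates,
        key=lambda ja: sum(a != b for a, b in zip(ja, joint_action)),
    )


# Agents at cells 1 and 3 both propose moving into cell 2.
print(centralized_shield((1, 3), (1, -1)))
```

In this sketch the shield sits between the learners and the environment: agents propose a joint action, and only the shield's output is executed. A factored variant would run several such checks, each over a part of the state space, rather than one check over the joint state.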

The shielding methods require general information about the underlying model, which is not always available. Therefore, in the second chapter, we propose a novel framework for designing reward functions so that agents learn the desired behaviors while avoiding unsafe situations. We use temporal logic to define which behaviors to encourage and which to discourage. We present a semi-centralized logic-based learning algorithm for reward shaping that is scalable in the number of agents.

Next, in the third chapter, we extend safe sequential decision making to probabilistic model checking. Specifically, we design a distributional extension to probabilistic model checking. We reason about a variety of distributional measures and propose a method to precisely compute the full distribution. We also approximate the optimal policy using distributional value iteration for varying levels of risk-sensitivity.
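As an illustration of why the full distribution can be more informative than the expected value alone, here is a small sketch that computes three measures of a discrete distribution over accumulated cost. The abstract does not name the measures studied; the choice of mean, value-at-risk (VaR), and conditional value-at-risk (CVaR) is an assumption, as are the function names and the example distribution.

```python
def mean(dist):
    """Expected value of a discrete distribution {value: probability}."""
    return sum(x * p for x, p in dist.items())


def value_at_risk(dist, alpha):
    """Smallest value x such that P(X <= x) >= alpha."""
    cdf = 0.0
    for x in sorted(dist):
        cdf += dist[x]
        # Small tolerance guards against floating-point rounding in the CDF.
        if cdf >= alpha - 1e-12:
            return x
    return max(dist)


def cvar(dist, alpha):
    """Conditional value-at-risk for costs, with 0 <= alpha < 1.

    Uses the identity CVaR = VaR + E[max(X - VaR, 0)] / (1 - alpha),
    which also holds for discrete distributions.
    """
    var = value_at_risk(dist, alpha)
    excess = sum(p * max(x - var, 0.0) for x, p in dist.items())
    return var + excess / (1.0 - alpha)


# Hypothetical cost distribution: usually cheap, occasionally very costly.
costs = {0.0: 0.5, 10.0: 0.25, 100.0: 0.25}
print(mean(costs), value_at_risk(costs, 0.75), cvar(costs, 0.75))
```

Two policies with the same expected cost can differ sharply in the tail measures, which is the kind of distinction a risk-sensitive analysis is meant to expose.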

Finally, in the fourth chapter, we extend this work and formulate an algorithm to optimize the weighted expected value of accumulated rewards in uncertain parametric models. We leverage the joint distribution over the uncertain parameters using tractable yet expressive distributional representations to provide less conservative policies.

We evaluate the approaches implemented in this dissertation across a diverse range of relevant case studies. Experiments demonstrate significant advances in safe yet scalable methods for multi-agent reinforcement learning and informative safety policy analysis for probabilistic model checking. Moreover, to facilitate collaboration and future innovation, all frameworks developed are made publicly available.

PHD (Doctor of Philosophy)
Multi-agent Reinforcement Learning, Safe Learning, Probabilistic Model Checking, Distributional Value Iteration
Issued Date: