Online Archive of University of Virginia Scholarship
Policy Optimization in Robust Markov Decision Processes with Transition Gradient Theorem
Author
Luo, Licheng, Computer Science - School of Engineering and Applied Science, University of Virginia (ORCID: 0009-0003-8662-7671)
Advisors
Zhang, Shangtong, EN-Comp Science Dept, University of Virginia
Abstract
Reinforcement Learning (RL) is a powerful framework for sequential decision making. However, standard RL methods often struggle when the environment dynamics are uncertain, leading to poor performance in real-world applications such as autonomous navigation, financial portfolio management, and robotic control. This limitation is a significant factor contributing to the lack of widespread adoption of RL-based control systems in industry.
To address this challenge, researchers introduced the robust Markov Decision Process (MDP), a sequential decision-making framework that explicitly models uncertainty in transition functions. Solving a robust MDP means finding a policy that performs consistently well across a set of possible transition functions. The framework has great potential for application in domains where the environment dynamics are uncertain or changing.
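In symbols, one common way to write the robust MDP objective is the max-min problem below. The notation is an assumption made for illustration and is not taken from the thesis: \mathcal{P} denotes the set of admissible transition functions, r the reward function, and \gamma a discount factor.

```latex
\[
  \pi^{\star} \in \arg\max_{\pi} \; \min_{P \in \mathcal{P}} \; J(\pi, P),
  \qquad
  J(\pi, P) = \mathbb{E}_{\pi, P}\!\left[ \sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, a_t) \right]
\]
```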
In this thesis, we model a robust MDP as a two-player game. The first player represents the policy, trained via standard policy optimization methods. The second player is an adversary that selects transition functions to degrade the policy's performance. A key contribution of this work is the transition gradient theorem, which enables effective training of the adversary by providing a structured way to optimize the transition functions. The two players are updated in an alternating fashion.
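The alternating scheme described above can be sketched on a toy tabular problem. This is a minimal illustration under stated assumptions, not the thesis's method: the adversary is restricted to a softmax mixture over a small, hand-written set of candidate transition kernels, and both players use finite-difference gradients of the exact return in place of a policy gradient or the transition gradient theorem. All names (`expected_return`, `fd_grad`, `train`, `kernels`) are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def expected_return(theta, w, kernels, reward, gamma, mu):
    """Exact discounted return J(pi, P) of a tabular MDP.

    theta:   (S, A) policy logits (softmax policy, an assumption)
    w:       (K,)   adversary logits over K candidate kernels (an assumption)
    kernels: (K, S, A, S) candidate transition functions
    """
    pi = softmax(theta, axis=1)                 # (S, A)
    mix = softmax(w)                            # (K,)
    P = np.tensordot(mix, kernels, axes=1)      # (S, A, S) mixed kernel
    P_pi = np.einsum("sa,sat->st", pi, P)       # state-to-state under pi
    r_pi = (pi * reward).sum(axis=1)            # expected reward per state
    v = np.linalg.solve(np.eye(len(mu)) - gamma * P_pi, r_pi)
    return float(mu @ v)

def fd_grad(f, x, eps=1e-5):
    """Central finite-difference gradient (stand-in for an analytic gradient)."""
    g = np.zeros_like(x)
    for idx in np.ndindex(x.shape):
        d = np.zeros_like(x)
        d[idx] = eps
        g[idx] = (f(x + d) - f(x - d)) / (2 * eps)
    return g

def train(kernels, reward, gamma=0.9, steps=200, lr=0.5):
    K, S, A, _ = kernels.shape
    mu = np.full(S, 1.0 / S)                    # uniform start distribution
    theta, w = np.zeros((S, A)), np.zeros(K)
    for _ in range(steps):
        # Policy player: gradient ascent on the return, adversary held fixed.
        theta = theta + lr * fd_grad(
            lambda t: expected_return(t, w, kernels, reward, gamma, mu), theta)
        # Adversary: gradient descent on the return, policy held fixed.
        w = w - lr * fd_grad(
            lambda u: expected_return(theta, u, kernels, reward, gamma, mu), w)
    return theta, w

# Toy problem: 2 states, 2 actions, 2 candidate kernels.
nominal = np.zeros((2, 2, 2))
nominal[:, 0, 0] = 1.0                          # action 0 always reaches state 0
nominal[:, 1, 1] = 1.0                          # action 1 always reaches state 1
perturbed = np.zeros((2, 2, 2))
perturbed[:, 0, :] = [0.2, 0.8]                 # action 0 now mostly fails
perturbed[:, 1, 1] = 1.0
kernels = np.stack([nominal, perturbed])
reward = np.array([[1.0, 0.0], [0.0, 0.0]])     # only (state 0, action 0) pays

theta, w = train(kernels, reward)
print("policy:", softmax(theta, axis=1))
print("adversary mixture:", softmax(w))
```

In this toy setup the policy should come to prefer action 0 in both states while the adversary shifts weight toward the perturbed kernel. A richer adversary would parameterize the transition function directly rather than mixing fixed candidates.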
We validate the proposed approach in simple environments to demonstrate robustness and then scale up to complex robotic manipulation tasks. Our findings showcase the scalability and efficacy of robust MDP methods in handling real-world uncertainties, highlighting their potential for practical applications.
Degree
MS (Master of Science)
Keywords
Reinforcement Learning; Robustness; State Perturbation
Language
English
Rights
All rights reserved (no additional license for public reuse)
Luo, Licheng. Policy Optimization in Robust Markov Decision Processes with Transition Gradient Theorem. University of Virginia, Computer Science - School of Engineering and Applied Science, MS (Master of Science), 2024-12-09, https://doi.org/10.18130/1xd3-zs45.