Policy Optimization in Robust Markov Decision Processes with Transition Gradient Theorem

Luo, Licheng

Author

Luo, Licheng, Computer Science - School of Engineering and Applied Science, University of Virginia 0009-0003-8662-7671

Advisors

Zhang, Shangtong, EN-Comp Science Dept, University of Virginia

Abstract

Reinforcement Learning (RL) is a powerful framework for sequential decision making. However, standard RL methods often struggle when the environment dynamics are uncertain, leading to poor performance in real-world applications such as autonomous navigation, financial portfolio management, and robotic control. This limitation is a significant factor contributing to the lack of widespread adoption of RL-based control systems in industry.

To address this challenge, researchers introduced the robust Markov Decision Process (MDP), a sequential decision-making framework that explicitly models uncertainty in transition functions. Solving a robust MDP requires finding a policy that consistently performs well across a set of possible transition functions. The framework has great potential in domains where the environment dynamics are uncertain or changing.
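
One common way to write this goal is as a max-min problem. The notation below is assumed for illustration and is not taken from the thesis: $\pi$ is the policy, $\mathcal{P}$ is the set of admissible transition functions, $r$ is the reward function, and $\gamma \in [0, 1)$ is a discount factor.

```latex
\max_{\pi} \; \min_{P \in \mathcal{P}} \;
\mathbb{E}_{\pi, P}\!\left[ \sum_{t=0}^{\infty} \gamma^{t} \, r(s_t, a_t) \right]
```

The inner minimum picks the worst transition function in the set for the given policy, and the outer maximum picks the policy whose worst case is best.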

In this thesis, we model a robust MDP as a two-player game. The first player represents the policy, trained via standard policy optimization methods. The second player is an adversary that selects transition functions aimed at deteriorating the performance of the policy. A key contribution of this work is the transition gradient theorem, which enables effective training of the adversary by providing a structured way to optimize the transition functions. The two players are updated in an alternating fashion.
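
The alternating scheme described above can be sketched on a small tabular problem. This is an illustrative sketch only, not the thesis's implementation. The toy MDP, the softmax parameterizations, the uncertainty set (a mixture of a nominal kernel and an adversarial kernel with radius `rho`), and all function names are assumptions. The adversary's gradient is approximated by finite differences as a stand-in, because the statement of the transition gradient theorem is not given in this abstract.

```python
import numpy as np

def softmax(x):
    """Softmax over the last axis, shifted for numerical stability."""
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def adversary_kernel(phi, nominal, rho):
    """Assumed uncertainty set: mix the nominal kernel with an adversarial one."""
    return (1.0 - rho) * nominal + rho * softmax(phi)

def expected_return(theta, kernel, reward, mu, gamma):
    """Exact discounted return of the softmax policy under a given kernel."""
    pi = softmax(theta)                          # shape (S, A)
    p_pi = np.einsum("sa,sat->st", pi, kernel)   # state-to-state matrix
    r_pi = (pi * reward).sum(axis=1)             # expected reward per state
    v = np.linalg.solve(np.eye(len(mu)) - gamma * p_pi, r_pi)
    return float(mu @ v)

def numerical_grad(f, x, eps=1e-5):
    """Central finite differences; a stand-in for an analytic gradient."""
    grad = np.zeros_like(x)
    for idx in np.ndindex(*x.shape):
        step = np.zeros_like(x)
        step[idx] = eps
        grad[idx] = (f(x + step) - f(x - step)) / (2.0 * eps)
    return grad

def train(n_rounds=100, lr=0.5, rho=0.2, gamma=0.9, seed=0):
    """Alternate one policy ascent step with one adversary descent step."""
    rng = np.random.default_rng(seed)
    n_states, n_actions = 3, 2                   # hypothetical toy sizes
    reward = rng.uniform(size=(n_states, n_actions))
    nominal = softmax(rng.normal(size=(n_states, n_actions, n_states)))
    mu = np.full(n_states, 1.0 / n_states)       # uniform start distribution
    theta = np.zeros((n_states, n_actions))      # policy logits
    phi = np.zeros((n_states, n_actions, n_states))  # adversary logits

    def objective(th, ph):
        kernel = adversary_kernel(ph, nominal, rho)
        return expected_return(th, kernel, reward, mu, gamma)

    for _ in range(n_rounds):
        # Policy player: gradient ascent on the return, adversary held fixed.
        theta = theta + lr * numerical_grad(lambda th: objective(th, phi), theta)
        # Adversary player: gradient descent on the return, policy held fixed.
        phi = phi - lr * numerical_grad(lambda ph: objective(theta, ph), phi)
    return theta, phi, nominal, reward, mu
```

Calling `train()` returns the final policy logits and adversary logits along with the toy problem data, so the resulting policy can be evaluated under both the nominal kernel and the adversarial one with `expected_return`. Finite differences only scale to tiny problems like this one; an analytic gradient for the adversary is what would make the approach practical at larger scale.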

We validate the proposed approach in simple environments to demonstrate robustness and then scale up to complex robotic manipulation tasks. Our findings showcase the scalability and efficacy of robust MDP methods in handling real-world uncertainties, highlighting their potential for practical applications.

Degree

MS (Master of Science)

Keywords

Reinforcement Learning; Robustness; State Perturbation

Language

English

Rights

Issued Date

2024-12-09

Suggested Citation

Luo, Licheng. Policy Optimization in Robust Markov Decision Processes with Transition Gradient Theorem. University of Virginia, Computer Science - School of Engineering and Applied Science, MS (Master of Science), 2024-12-09, https://doi.org/10.18130/1xd3-zs45.