Policy-Based Reward Shaping for Accelerated and Robust Reinforcement Learning
Wang, Cheng, Systems Engineering - School of Engineering and Applied Science, University of Virginia
Beling, Peter, Engineering Systems and Environment, University of Virginia
Reinforcement learning (RL) has recently achieved great success in areas such as video games, robotics, and the game of Go. However, a number of challenges remain when applying RL to real-world sequential decision problems. Reward signals from real systems are often sparse, delayed, or noisy, which can significantly slow learning. Compounded by the limited data available for training, these issues make it very difficult to learn effective and robust control policies in real-world settings. A common approach to speeding up learning is reward shaping, in which additional rewards are provided to the RL agent to guide its learning process. Traditional reward shaping methods typically require an expert to supply a potential function that estimates the value of a state or state-action pair. In practice, however, potential functions are usually unknown and difficult to design or learn. On the other hand, many real-world tasks already have established methods that are well studied and understood. In this dissertation, we propose a new reward shaping method that is based directly on existing policies. We provide both theoretical guarantees and empirical evidence that policies learned with this approach perform at least as well as any given baseline policy. We investigate the potential of RL with the proposed method in two distinct real-world application areas: pairs trading and flood mitigation. The results demonstrate that our approach can greatly accelerate the learning process and produce better-performing, more robust policies compared to RL without reward shaping.
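The traditional potential-based shaping that the abstract contrasts against can be sketched as follows. This is an illustrative example, not code from the dissertation: the potential function `phi` and the toy goal-distance MDP are hypothetical, but the shaping term F(s, s') = γΦ(s') − Φ(s) follows the classical formulation of Ng, Harada, and Russell (1999), which provably preserves the optimal policy of the original MDP.

```python
# Illustrative sketch of potential-based reward shaping (hypothetical example).

GAMMA = 0.99  # discount factor

def phi(state):
    # Hypothetical potential: negative distance to a goal located at state 10.
    # In practice, designing such a function is the hard part the
    # dissertation's policy-based method aims to avoid.
    return -abs(10 - state)

def shaped_reward(reward, state, next_state, gamma=GAMMA):
    # Shaping term F(s, s') = gamma * phi(s') - phi(s); adding F to the
    # environment reward leaves the set of optimal policies unchanged.
    return reward + gamma * phi(next_state) - phi(state)

# Moving from state 3 to state 4 (one step closer to the goal) earns a
# positive bonus even though the sparse environment reward is zero here.
bonus = shaped_reward(0.0, 3, 4)
```

Here the agent receives a dense learning signal (the bonus) on every transition toward the goal, rather than only at the sparse terminal reward.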
PHD (Doctor of Philosophy)
Reinforcement Learning, Reward Shaping, Pairs Trading, Flood Mitigation