Deep Reinforcement Learning on Optimal Trade Execution Problems

Lin, Siyu, Systems Engineering - School of Engineering and Applied Science, University of Virginia
Beling, Peter, EN-Eng Sys and Environment, University of Virginia

Algorithmic trading involves the use of computer programs, algorithms, and advanced mathematical models to make trading decisions and transactions in the financial markets. It has become prevalent in major financial markets since the late 1990s. Optimal trade execution is one of the most crucial problems in algorithmic trading, as the profitability of many trading strategies depends on the effectiveness of the trade execution. The optimal trade execution problem concerns how best to trade a set of stock shares at minimal cost. As trading volume grows, the costs associated with executing the trade typically become large enough to weaken the profitability of the trading strategy. The goal of the proposed research is to develop a deep reinforcement learning (DRL) based methodology for solving optimal trade execution problems in algorithmic trading.

In this thesis, we propose a deep reinforcement learning based framework that learns to minimize trade execution costs by splitting a sell order into child orders and executing them sequentially over a fixed period. The framework is based on a variant of the Deep Q-Network (DQN) algorithm that integrates Double DQN, Dueling Networks, and Noisy Nets. In contrast to previous research, which uses implementation shortfall as the immediate reward, we use a shaped reward structure, and we incorporate the zero-ending inventory constraint into the DQN algorithm by slightly modifying the Q-function update, relative to standard Q-learning, at the final step.
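The modified final-step update can be sketched as follows. This is an illustrative reconstruction, not the thesis's exact implementation: the function name `dqn_target` and the `sell_all_action` index are hypothetical. The idea is that standard Q-learning bootstraps with the maximum over next-state action values, but when the next step is the final one, the zero-ending inventory constraint leaves only one feasible action, liquidating all remaining shares, so the target bootstraps with that single action's value instead.

```python
import numpy as np

def dqn_target(reward, next_q, next_step_is_final, sell_all_action=0, gamma=0.99):
    """Q-learning target with a zero-ending-inventory constraint (sketch).

    reward:           immediate (shaped) reward for the transition
    next_q:           array of Q-values for all actions in the next state
    next_step_is_final: True if the next step is the last one of the episode
    sell_all_action:  hypothetical index of the "liquidate everything" action
    """
    if next_step_is_final:
        # Only the forced liquidation action is feasible at the final step,
        # so bootstrap with its Q-value rather than the max over actions.
        return reward + gamma * next_q[sell_all_action]
    # Standard Q-learning bootstrap for all earlier steps.
    return reward + gamma * np.max(next_q)
```

Under this formulation the constraint enters only through the target, so the rest of the DQN training loop is unchanged.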

We demonstrate that the DQN-based optimal trade execution framework (1) converges quickly during the training phase, (2) outperforms TWAP, VWAP, AC, and two DQN algorithms during backtesting on 14 US equities, and (3) improves stability by incorporating the zero-ending inventory constraint.

Additionally, we propose an end-to-end adaptive framework for optimal trade execution based on Proximal Policy Optimization (PPO). We use two methods to account for the time dependencies in the market data, based on two different neural network architectures: 1) long short-term memory (LSTM) networks, and 2) fully-connected networks (FCN) that stack the most recent limit order book (LOB) snapshots as model inputs. The proposed framework can make trade execution decisions directly from level-2 LOB information, such as bid/ask prices and volumes, without the manually designed attributes used in previous research. Furthermore, rather than implementation shortfall (IS) or a shaped reward function, we use a sparse reward function, which gives the agent a reward signal at the end of each episode indicating its relative performance against the baseline model.
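The FCN input construction described above can be sketched as follows. This is a minimal illustration under assumed conventions (the function name and the layout of each snapshot as one flat row of bid/ask prices and volumes are hypothetical): the most recent snapshots are taken from the LOB history and flattened into a single feature vector, letting a fully-connected network see a short window of market history without recurrent connections.

```python
import numpy as np

def stack_lob_features(lob_history, n_recent=5):
    """Stack the n_recent most recent level-2 LOB snapshots into one flat
    input vector for a fully-connected network (illustrative sketch).

    lob_history: array of shape (T, F), where each row holds the bid/ask
    prices and volumes for all book levels at one time step.
    Returns a vector of shape (n_recent * F,).
    """
    recent = lob_history[-n_recent:]   # keep only the most recent snapshots
    return recent.reshape(-1)          # flatten time x features into one vector
```

An LSTM, by contrast, would consume the unflattened (n_recent, F) sequence directly, which is the trade-off between the two architectures considered here.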

The experimental results demonstrate advantages over IS and the shaped reward function in terms of both performance and simplicity. The proposed framework outperforms baseline models commonly used in industry, such as TWAP, VWAP, and AC, as well as several deep reinforcement learning (DRL) models, on most of the 14 US equities in our experiments.

Finally, we explore the robustness of the proposed DRL algorithms using three approaches. First, we apply policies trained on FB historical LOB data directly to the other stocks and demonstrate that the learned policies are robust and able to adapt to stock data on which they were never trained. To further investigate the robustness of the learned policies, we build DRL agents based on them and have the agents trade in the ABIDES simulation environment, in which trading agents can interact with each other. The results suggest that the DRL agents still perform well in a simulation environment, even though they were trained on historical LOB data. Finally, we train and test the proposed DRL algorithms on additional stock data and in different periods. The algorithms continue to perform well, indicating that their performance is not an artifact of being trained and tested on specific stocks or in specific periods.

PHD (Doctor of Philosophy)
deep reinforcement learning, algorithmic trading, optimal trade execution, limit order book, neural network
All rights reserved (no additional license for public reuse)
Issued Date: