Learning and Control in Multi-Agent Systems with Applications on Cyber-Physical Systems

Su, Jianyu, Systems Engineering - School of Engineering and Applied Science, University of Virginia
Beling, Peter, EN-Eng Sys and Environment, University of Virginia

Recent years have seen machine learning (ML) applied to a wide range of domains. Coupled with deep neural networks, ML methods have become cornerstones of adaptive learning and decision-making frameworks. However, adopting ML techniques for convenience, without carefully examining the system's structure and the techniques' assumptions, often leads to sub-optimal system performance. For instance, reinforcement learning (RL), a machine learning technique that estimates optimal policies through interactions between an agent and its environment, is designed for single-agent systems. In practice, a centralized RL framework is often used to coordinate agents in multi-agent settings. As a result, the action space of such a centralized framework grows exponentially with the number of agents in the system, which can lead to poor performance. In this dissertation, we present a multi-agent RL (MARL) framework that is applicable to various cooperative multi-agent systems, and we propose new methods to bridge gaps in the current literature for tasks such as vehicle trajectory prediction and decision making in large-scale manufacturing systems. Our contribution is three-fold:
We propose a graph-based vehicle acceleration prediction framework, the Traffic Graph Framework (TGF), which captures the hierarchical interactions and chains of interactions that might affect the predicted vehicle's future state. The proposed framework uses new variants of graph convolution that are specifically adapted for modeling traffic. By exploiting the flexibility of the graph data structure, TGF treats traffic as a multi-agent system and can be applied to various traffic configurations with high prediction quality.
We propose a novel MARL algorithm, value-decomposition multi-agent actor-critic (VDAC). VDAC is an on-policy actor-critic method that is compatible with the parallel training paradigm of advantage actor-critic (A2C). As a result, VDAC offers a reasonable trade-off between training efficiency and algorithm performance. In our competitive evaluation, VDAC achieves higher win rates than other multi-agent actor-critic methods on a set of complex multi-agent coordination tasks, the StarCraft II micromanagement games.
We propose an adaptive preventive maintenance (PM) scheduling framework based on VDAC for large-scale manufacturing systems. In the simulation study, the proposed framework demonstrates its effectiveness by outperforming the baselines, including RL-based methods and traditional maintenance models, on a comprehensive set of metrics. Our analysis further shows that our MARL-based method learns effective PM policies without any prior knowledge of the environment or of maintenance strategies.
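
The exponential growth of the centralized action space noted in the abstract can be made concrete with a small counting sketch. It is only an illustration under stated assumptions: the function names and the choice of 5 actions per agent are hypothetical, and the per-agent comparison reflects factored policies in general, not the specific architecture of this dissertation.

```python
from itertools import product


def joint_action_count(n_agents: int, actions_per_agent: int) -> int:
    """Number of joint actions a single centralized controller must consider."""
    return actions_per_agent ** n_agents


def per_agent_output_count(n_agents: int, actions_per_agent: int) -> int:
    """Total outputs when each agent picks from its own action set.

    Generic comparison point (assumption), not the dissertation's design.
    """
    return n_agents * actions_per_agent


def enumerate_joint_actions(n_agents: int, actions_per_agent: int) -> list:
    """Explicitly list every joint action; only feasible for small systems."""
    return list(product(range(actions_per_agent), repeat=n_agents))


if __name__ == "__main__":
    actions = 5  # hypothetical number of actions available to each agent
    for n in (1, 2, 4, 8):
        print(n, joint_action_count(n, actions), per_agent_output_count(n, actions))
```

With 5 actions per agent, 8 agents already yield 5^8 = 390,625 joint actions for the centralized controller, compared with 8 × 5 = 40 outputs when the choice is factored per agent.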

PHD (Doctor of Philosophy)
Multi-Agent Reinforcement Learning, Graph Learning, Vehicle Acceleration Prediction, Preventive Maintenance
Issued Date: