Online Archive of University of Virginia Scholarship
Enhancing Communication Efficiency in Distributed Training and Inference of Graph-Based and Mixture-of-Experts-Based Machine Learning Architectures
Author
Mahmud, Shohaib, Computer Engineering - School of Engineering and Applied Science, University of Virginia (ORCID: 0009-0000-5989-078X)
Advisors
Shen, Haiying, University of Virginia
Abstract
Training Graph Neural Networks (GNNs) and Dynamic Graph Neural Networks (DGNNs) on large-scale real-world graphs often requires distributed computation, but is hampered by communication-intensive data access patterns. Existing partitioning and caching strategies fail to capture these patterns accurately, leading to excessive communication overhead and suboptimal utilization of computational resources. In parallel, Mixture-of-Experts (MoE) models, now prevalent in large language models, face significant host-to-device communication bottlenecks in resource-constrained settings, where only a subset of experts can reside in device memory and the others must be dynamically reloaded. State-of-the-art resource-constrained MoE systems partially mitigate this issue by predicting and preloading the next layer’s experts, but their performance deteriorates under heavy workloads.
This dissertation introduces three systems that address these challenges. Pacer improves GNN training by jointly modeling graph topology and sampling behavior to estimate data access frequency, enabling more effective partitioning and caching. It further hides communication latency through CPU-side pipelining. DGT targets DGNN training with a window-based partitioning scheme that captures both temporal and static structures, enabling communication-aware pipelining for full-batch training. Finally, LiteMoE accelerates MoE inference by leveraging the observation that some tasks tolerate imprecise expert routing. It selectively loads experts to reduce host-to-device data transfers and predicts expert usage across multiple future layers. Additionally, it exploits generation-length profiles to reorder input batches and minimize communication overhead. Together, these systems substantially improve communication efficiency and system throughput in distributed training and inference of modern graph and language models.
Degree
PHD (Doctor of Philosophy)
Language
English
Rights
All rights reserved (no additional license for public reuse)
Mahmud, Shohaib. Enhancing Communication Efficiency in Distributed Training and Inference of Graph-Based and Mixture-of-Experts-Based Machine Learning Architectures. University of Virginia, Computer Engineering - School of Engineering and Applied Science, PHD (Doctor of Philosophy), 2025-07-29, https://doi.org/10.18130/s7v6-qv65.