Enhancing Communication Efficiency in Distributed Training and Inference of Graph-Based and Mixture-of-Experts-Based Machine Learning Architectures

Mahmud, Shohaib

Author

Mahmud, Shohaib, Computer Engineering - School of Engineering and Applied Science, University of Virginia 0009-0000-5989-078X

Advisors

Shen, Haiying, University of Virginia

Abstract

Training Graph Neural Networks (GNNs) and Dynamic Graph Neural Networks (DGNNs) on large-scale real-world graphs often requires distributed computation, but it is challenged by communication-intensive data access patterns. Existing partitioning and caching strategies fail to capture these patterns accurately, leading to excessive communication overhead and suboptimal utilization of computational resources. In parallel, Mixture-of-Experts (MoE) models, now prevalent in large language models, face significant host-to-device communication bottlenecks in resource-constrained settings, where only a subset of experts can reside in device memory and the others must be dynamically reloaded. State-of-the-art resource-constrained MoE systems partially mitigate this issue by predicting and preloading the next layer’s experts, but their performance deteriorates under heavy workloads.
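To make the host-to-device bottleneck concrete, the following is a minimal sketch, not taken from the dissertation, that counts how many expert reloads a routing trace would trigger when only `capacity` experts fit in device memory. The LRU eviction policy, the function name `count_expert_loads`, and the toy trace are all illustrative assumptions.

```python
from collections import OrderedDict


def count_expert_loads(routing_trace, capacity):
    """Count host-to-device loads for a trace of expert IDs.

    Assumption: resident experts are managed with an LRU policy; real
    systems may use a different policy or add prediction-based preloading.
    """
    resident = OrderedDict()  # expert id -> None, ordered oldest to newest use
    loads = 0
    for expert in routing_trace:
        if expert in resident:
            resident.move_to_end(expert)  # hit: mark as most recently used
            continue
        loads += 1  # miss: expert weights must be copied from host memory
        if len(resident) >= capacity:
            resident.popitem(last=False)  # evict the least recently used expert
        resident[expert] = None
    return loads


# Hypothetical routing trace: the expert chosen at each step.
trace = [0, 1, 2, 0, 3, 0, 1]
loads_small = count_expert_loads(trace, capacity=2)
loads_large = count_expert_loads(trace, capacity=4)
```

Under these assumptions, shrinking the device-resident set from four experts to two raises the number of transfers for the same trace, which is the cost that preloading and selective loading aim to reduce.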
This dissertation introduces three systems that address these challenges. Pacer improves GNN training by jointly modeling graph topology and sampling behavior to estimate data access frequency, enabling more effective partitioning and caching. It further hides communication latency through CPU-side pipelining. DGT targets DGNN training with a window-based partitioning scheme that captures both temporal and static structures, enabling communication-aware pipelining for full-batch training. Finally, LiteMoE accelerates MoE inference by leveraging the observation that some tasks tolerate imprecise expert routing. It selectively loads experts to reduce host-to-device data transfers and predicts expert usage across multiple future layers. Additionally, it exploits generation-length profiles to reorder input batches and minimize communication overhead. Together, these systems substantially improve communication efficiency and system throughput in distributed training and inference of modern graph and language models.
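As an illustration of how topology and sampling behavior can be combined into an access-frequency estimate, the sketch below computes the expected number of feature reads per node for one epoch of 1-hop neighbor sampling and caches the most frequently read nodes. It is a simplified stand-in, not Pacer's actual model: the assumptions that every node is a seed exactly once per epoch and that each seed draws up to `fanout` neighbors uniformly without replacement, as well as the names `expected_access_frequency` and `pick_cache`, are hypothetical.

```python
from collections import defaultdict


def expected_access_frequency(adj, fanout):
    """Expected feature reads per node in one epoch of 1-hop sampling.

    Assumptions: each node in `adj` is a seed exactly once per epoch, and a
    seed samples up to `fanout` neighbors uniformly without replacement.
    """
    freq = defaultdict(float)
    for seed, neighbors in adj.items():
        freq[seed] += 1.0  # the seed's own features are always read
        if not neighbors:
            continue
        # Probability that any one neighbor of this seed is drawn.
        p = min(1.0, fanout / len(neighbors))
        for v in neighbors:
            freq[v] += p
    return dict(freq)


def pick_cache(freq, capacity):
    """Return the `capacity` nodes with the highest expected frequency."""
    return sorted(freq, key=lambda n: (-freq[n], n))[:capacity]


# Toy graph (adjacency lists): node 0 is a neighbor of every other node.
adj = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0, 1]}
freq = expected_access_frequency(adj, fanout=2)
cache = pick_cache(freq, capacity=2)
```

In this toy graph the hub node 0 receives the highest estimate, so it is the first candidate for caching; the same scores could also serve as weights when balancing partitions.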

Degree

PHD (Doctor of Philosophy)

Language

English

Rights

Issued Date

2025-07-29

Suggested Citation

Mahmud, Shohaib. Enhancing Communication Efficiency in Distributed Training and Inference of Graph-Based and Mixture-of-Experts-Based Machine Learning Architectures. University of Virginia, Computer Engineering - School of Engineering and Applied Science, PHD (Doctor of Philosophy), 2025-07-29, https://doi.org/10.18130/s7v6-qv65.