Accelerating Graph Processing with Near-Memory Accelerator Architectures

Jaiyeoba, Oluwole, Computer Engineering - School of Engineering and Applied Science, University of Virginia
Skadron, Kevin, Computer Science, University of Virginia

For the past fifty years, Moore’s Law and Dennard Scaling have been playing important roles in both performance and energy efficiency of computer systems. Unfortunately, they are not likely to continue, and computers no longer benefit from technology scaling as much as they did in the past. Recently, specialized hardware accelerators have emerged as a promising alternative to general-purpose computing for their potential to achieve orders of magnitude speedup and energy efficiency improvements on data-intensive applications. Graph analytics is an important data-intensive application that has gained prominence over the past years. With the rapid rise in data volumes in important applications such as social media, network analysis, genomics, knowledge graphs, etc, graph analytics allows us extract meaning from vast datasets and understand relationships between entities. However, achieving the full potential of accelerators to run graph analytics remain a challenge since the bottlenecks of such applications do not lie on computation, but data movement. To address this limitation, this thesis presents hardware and software techniques to effectively accelerate both static and dynamic graph processing workloads.

This thesis makes algorithmic (software) and architectural (hardware) innovations to accelerate graph analytics on static and evolving graphs for single- and multi-FPGA settings. First, we propose GraphTinker to tackle the long probe distances incurred when following edges to update dynamic graphs (i.e., edge insertions, deletions and modifications). GraphTinker combines a tree-based and open-address hashing scheme to skip entire regions within an edgelist when updating a graph. Second, we propose ACTS to tackle the poor scaling challenge associated with FPGA graph accelerators. ACTS embodies a set of architectural innovations that optimize off-chip HBM bandwidth usage and increase on-chip UltraRAM (URAM) parallelism in HBM-enabled FPGAs. Both contributions allow significant improvement over prior art. Third, we propose Swift to tackle the challenges from frequent movement of graph data between FPGAs in a multi-FPGA setup. Swift decouples the fundamental primitives in graph processing into a partly-synchronous-partly-asynchronous model, to keep all available bandwidth (PCIe, HBM, on-chip memory) of the FPGA busy. This also results in superior performance over prior art. Fourth, we address the redundancy associated with current models for updating dynamic graphs. We propose a technique that cultivates abundant parallelism and hides graph updating tasks within graph analytics. With a trend of exponentially increasing demand for data-intensive computing, the techniques presented in this thesis will work as useful tools for the acceleration of such important workloads.

PHD (Doctor of Philosophy)
FPGA; High Level Synthesis; Graph Processing; Dynamic Graph Data Structure
Issued Date: