Rethinking Computational Architectures at the Edge Through Asynchronous Temporal Streams
Sreekumar, Rahul, Electrical Engineering - School of Engineering and Applied Science, University of Virginia
Stan, Mircea, EN-Elec & Comp Engr Dept, University of Virginia
The escalating trend toward processing information and abstract features at the edge has compelled researchers to reconsider how these tasks can be performed with minimal energy consumption. Edge accelerators confront unique challenges that demand rethinking the fundamental approach to algorithm execution. Notably, the sporadic nature of peak workload demands at the edge calls for a solution that synchronizes the workload execution rate with the actual occurrence of events. This thesis explores the potential of architectures that compute information and process signals in the temporal domain, introducing the concept of stimulus-driven workload execution by leveraging Asynchronous Stream Computation (ASC) principles to accelerate edge Machine Learning workloads. A comprehensive analysis is conducted on the fundamental computational elements required to develop Neural Network (NN) models, evaluating the impact of an ASC-based computational paradigm.
A primary objective of this research is to investigate mechanisms that address the disparity between conventional processing power and the high memory bandwidth required for storing high-dimensional data vectors. This goal is realized through the development of Compute-in-Memory (CiM) architectures that use asynchronous streams to regulate the execution rate of data-intensive operations such as Vector-Matrix Multiplications (VMMs), Multiply-and-Accumulate (MAC) operations, and dot products. To gauge the impact of these stream-based CiM tiles in practical Machine Learning scenarios, the thesis implements an end-to-end streaming Convolutional Neural Network image classifier aligned with ASC principles. The implemented classifier architecture scales between 28 and 249 frames per second (FPS) while maintaining an energy efficiency between 81.54 and 247.08 TOPS/W. Furthermore, the research encompasses the careful design of the essential computational elements needed to facilitate and optimize this implementation.
The thesis also reimagines the neural encoding scheme within Spiking Neural Networks (SNNs) through the proposed Sigma-Delta-Sigma encoding. This encoding technique enables dual-purpose neural connections, facilitating feature extraction while suppressing noise-like features from propagating through the network. The resulting neurons possess unique noise-filtering characteristics, making them suitable for integration into models that must robustly handle input-feature-dependent noise and random temporal perturbations in the physical signal. We implemented reservoir computing networks in the spiking domain and demonstrated their effectiveness through liquid-state machine models for time-series predictive engines, also showing the network's robustness to low-quality audio.
PhD (Doctor of Philosophy)
English
2024/12/04