Expert Driven and Automated Time Series Data Augmentation with Physiological Applications
Jablonski, James, Systems Engineering - School of Engineering and Applied Science, University of Virginia
Brown, Don, DS-Data Science School, University of Virginia
Most deep learning research in the past decade has focused on developing new model architectures, while improvements associated with the data used to train these models have stagnated. Although there seems to be an abundance of available data for tasks such as image recognition, recent research demonstrates that data shortcomings still inhibit model performance and limit the effectiveness of architectural improvements. One way to address this problem and improve model generalization is through data augmentation. This process generates additional synthetic training samples through transformations of the original data.
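As an illustration of what such transformations can look like for time series, the following is a minimal sketch and is not drawn from the dissertation itself. It applies two commonly used generic augmentations, additive Gaussian jitter and random magnitude scaling, to a batch of univariate series. The function names (`jitter`, `scale`, `augment`) and the default noise levels are assumptions chosen for this example.

```python
import numpy as np

def jitter(x, sigma=0.03, rng=None):
    """Add zero-mean Gaussian noise to every time step (illustrative default sigma)."""
    rng = np.random.default_rng() if rng is None else rng
    return x + rng.normal(0.0, sigma, size=x.shape)

def scale(x, sigma=0.1, rng=None):
    """Multiply the whole series by one random factor drawn around 1.0."""
    rng = np.random.default_rng() if rng is None else rng
    return x * rng.normal(1.0, sigma)

def augment(X, y, n_copies=2, seed=0):
    """Return the original samples plus n_copies transformed copies of each.

    X has shape (n_samples, n_timesteps); y has shape (n_samples,).
    Labels are copied unchanged because both transformations are assumed
    to be label-preserving, which must be checked for each application.
    """
    rng = np.random.default_rng(seed)
    X_parts, y_parts = [X], [y]
    for _ in range(n_copies):
        transformed = np.stack([scale(jitter(s, rng=rng), rng=rng) for s in X])
        X_parts.append(transformed)
        y_parts.append(y)
    return np.concatenate(X_parts), np.concatenate(y_parts)

# Example: four sine-wave series of length 50 with two class labels.
t = np.linspace(0.0, 2.0 * np.pi, 50)
X = np.stack([np.sin((k + 1) * t) for k in range(4)])
y = np.array([0, 0, 1, 1])
X_aug, y_aug = augment(X, y, n_copies=2)
print(X_aug.shape, y_aug.shape)
```

Whether a given transformation preserves the class label depends on the domain, which is one reason the choice of augmentation policy matters.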
Data augmentation is not well explored for Time Series Classification (TSC), a challenging and important machine learning task with wide applicability across multiple domains. In fact, any classification problem whose data has some notion of ordering can be cast as a TSC problem. In this dissertation, we advance research in time series augmentation policy design for deep learning in two areas. First, we develop an automated approach to time series data augmentation policy design using population-based training (PBA-T), which we evaluate against the University of California Riverside time series classification archive. On this key benchmark, our method trains models that significantly outperform all existing time series classification techniques. Second, we demonstrate the incorporation of expert knowledge into augmentation policy design for three different physiological time series classification problems and introduce two novel time series augmentations based on expert input.
These advances will enable the development of deep learning-based time series models in unexplored areas. With PBA-T, researchers will be able to experiment with deeper and more complex TSC models on data from multiple domains. With our GradMix method, models can be successfully trained on critical problems in healthcare that often involve severe data imbalance. Finally, our nerve jitter augmentation will enable better modeling of microneurography data, which can enhance our understanding of the senses and the nervous system.
PHD (Doctor of Philosophy)
time series, deep learning, data augmentation, population based