Application of Time Series Testing and Clustering Framework to Detecting Abnormal Personal Weather Stations
Yang, Bo, Statistics - Graduate School of Arts and Sciences, University of Virginia
Spitzner, Dan, AS-Statistics, University of Virginia
Personal weather stations (PWSs) let people monitor real-time temperature, humidity, wind speed, wind direction and rainfall at locations of their choosing. Currently, there are more than 250,000 personal weather stations across the globe, providing rich, hyperlocal weather data. With rainfall and temperature measurements available at higher spatial and temporal resolution, the question arises: can we rely on them without reservation? Although PWSs are user-friendly and affordable, they are owned by amateurs who may lack professional knowledge or guidance on installation and maintenance. As a consequence, some personal weather stations are 'untrustworthy', and we want to determine which stations can be trusted and which are unreliable. In particular, we want to detect abnormal behavior in the data, especially rainfall and temperature data.
In this thesis, we present comprehensive methods to identify 'untrustworthy' PWSs effectively. Our methods take two perspectives. One approach is to conduct hypothesis testing in a time series context, in which the data from each PWS are tested against ground truth.
Weather measurements reported by a PWS can have different statistical properties. For example, daily maximum temperature has a seasonal pattern and can be converted to a stationary time series, so we propose a theory for testing periodograms by smoothing, which turns out to outperform all the existing methods. Precipitation measurements, however, have the distinctive properties of a zero-inflated distribution and non-stationarity. The Tweedie distribution is introduced for modeling purposes, and we also convert the rainfall data to a truncated stationary process in order to apply our proposed testing procedure. Because a PWS reports multiple weather measurements, we also propose applying the tapered test statistic in the multivariate time series setting.
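The idea of comparing smoothed periodograms can be illustrated with a minimal sketch. This is not the test statistic developed in the thesis: the function names, the moving-average smoother, the `span` parameter and the mean squared log-ratio summary are all assumptions chosen for illustration, and the two series are assumed to be stationary and of equal length.

```python
import numpy as np

def smoothed_periodogram(x, span=5):
    """Periodogram of a mean-centered series, smoothed by a moving average.

    `span` (hypothetical parameter) is the number of neighboring Fourier
    frequencies averaged together.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    dft = np.fft.rfft(x - x.mean())
    # Raw periodogram; drop the zero frequency, which is ~0 after centering.
    pgram = (np.abs(dft) ** 2 / n)[1:]
    freqs = np.fft.rfftfreq(n)[1:]
    kernel = np.ones(span) / span
    return freqs, np.convolve(pgram, kernel, mode="same")

def log_spectral_distance(x, y, span=5):
    """Mean squared difference of log smoothed periodograms (illustrative only)."""
    _, sx = smoothed_periodogram(x, span)
    _, sy = smoothed_periodogram(y, span)
    return float(np.mean((np.log(sx) - np.log(sy)) ** 2))

# Simulated data only: a reference series and a copy with inflated variance.
rng = np.random.default_rng(0)
reference = rng.normal(size=365)
suspect = 3.0 * reference
print(log_spectral_distance(reference, reference))
print(log_spectral_distance(reference, suspect))
```

A larger value indicates that the spectral shapes or scales of the two series differ more. Turning such a summary into a calibrated test requires the distributional theory that the thesis develops.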
In addition, we introduce a clustering-based time series anomaly detection method for the setting in which no outside data source is available as ground truth. We use the proposed tapered test statistic as a modified dissimilarity measure in hierarchical clustering.
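As a rough sketch of how a custom dissimilarity can be plugged into hierarchical clustering, the example below uses SciPy's `linkage` and `fcluster`. The dissimilarity shown is a simple stand-in based on raw log periodograms, not the tapered test statistic of the thesis; the function names, the average-linkage choice and the simulated series are assumptions made for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

def log_periodogram_dissimilarity(x, y):
    """Stand-in dissimilarity: mean absolute difference of log periodograms."""
    def log_pgram(z):
        z = np.asarray(z, dtype=float)
        pgram = np.abs(np.fft.rfft(z - z.mean()))[1:] ** 2 / z.size
        return np.log(pgram + 1e-12)  # small offset guards against log(0)
    return float(np.mean(np.abs(log_pgram(x) - log_pgram(y))))

def cluster_stations(series, n_clusters=2,
                     dissimilarity=log_periodogram_dissimilarity):
    """Cluster equal-length series using a pairwise dissimilarity matrix."""
    k = len(series)
    d = np.zeros((k, k))
    for i in range(k):
        for j in range(i + 1, k):
            d[i, j] = d[j, i] = dissimilarity(series[i], series[j])
    # linkage expects the condensed (upper-triangular) form of the matrix.
    tree = linkage(squareform(d, checks=False), method="average")
    return fcluster(tree, t=n_clusters, criterion="maxclust")

# Simulated data only: five similar stations and one with a very different scale.
rng = np.random.default_rng(1)
stations = [rng.normal(size=256) for _ in range(5)]
stations.append(50.0 * rng.normal(size=256))
labels = cluster_stations(stations, n_clusters=2)
print(labels)
```

In a setting like this one, stations that fall into a small cluster separate from the majority would be candidates for closer inspection. How many clusters to use and how to interpret them are modeling decisions that this sketch does not address.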
With the proposed methods, because PWSs produce real-time climatological data daily, Bayes factors can be generated from the incoming data and used consistently to quantify the 'trustworthiness' of a PWS, and the clustering results can be used in situations where surrounding NCDC stations are not available.
PHD (Doctor of Philosophy)
hypothesis testing, time series analysis, time series classification, rates of testing
All rights reserved (no additional license for public reuse)