Feature-Based Spatio-Temporal Modeling
Wang, Xiaofeng, Department of Systems Engineering, University of Virginia
Brown, Donald, Department of Systems and Information Engineering, University of Virginia
Data are growing in dimensionality. Spatio-temporal datasets are increasingly available with numerous features, including ordinary numerical and categorical features as well as unstructured features such as text. Although these high-dimensional data can help improve predictions, efficient methods for processing spatio-temporal data with many different types of features are limited.
This dissertation formalized an important class of problems related to spatio-temporal data. In the dissertation, an effective mathematical model, the local spatio-temporal generalized additive model (LSTGAM), was developed to predict and classify spatio-temporal data. This model can make full use of many different types of data, such as spatial, temporal, geographic, demographic, and textual data. The model can be estimated with existing algorithms and is readily interpretable. To support building the LSTGAM, a randomized least angle regression (RLAR) method was used to select features for nonlinear regression models. Tests on simulated and real data showed that RLAR performed well. In addition, a new method, the semantic role labeling-based latent Dirichlet allocation (SRL-LDA) model, was developed to extract key information from text. This method combines automatic semantic analysis and understanding of natural language with dimensionality reduction via latent Dirichlet allocation. The two models, LSTGAM and SRL-LDA, can be used together in applications where unstructured textual data contain indicators relevant to the spatio-temporal properties of events.
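The abstract does not describe how RLAR randomizes least angle regression. As a rough illustration only, one common way to randomize a selection path is to rerun it on random subsamples of the data and count how often each feature enters early. The sketch below assumes that reading: `lars_entry_order` follows the standard linear least angle regression (LAR) steps, row subsampling is the assumed source of randomness, and the function names and the parameters `k`, `n_runs`, `frac`, and `seed` are hypothetical. It is not the dissertation's RLAR, which targets nonlinear models and may differ substantially.

```python
import numpy as np


def lars_entry_order(X, y, k):
    """Return the first k column indices in the order plain LAR adds them.

    Linear LAR only (no lasso modification). Features are centered and
    scaled to unit norm, and the response is centered, before the path starts.
    """
    X = X - X.mean(axis=0)
    X = X / np.linalg.norm(X, axis=0)
    y = y - y.mean()
    n, p = X.shape
    k = min(k, p)
    mu = np.zeros(n)  # current fit along the LAR path
    active = [int(np.argmax(np.abs(X.T @ y)))]
    while len(active) < k:
        c = X.T @ (y - mu)  # correlations with the current residual
        C = np.max(np.abs(c[active]))
        XA = X[:, active] * np.sign(c[active])  # sign-adjusted active columns
        G_inv_1 = np.linalg.solve(XA.T @ XA, np.ones(len(active)))
        A_A = 1.0 / np.sqrt(G_inv_1.sum())
        u = XA @ (A_A * G_inv_1)  # unit-length equiangular direction
        a = X.T @ u
        best_gamma, best_j = np.inf, None
        for j in range(p):
            if j in active:
                continue
            # Step length at which feature j's correlation ties the active ones.
            for gamma in ((C - c[j]) / (A_A - a[j]), (C + c[j]) / (A_A + a[j])):
                if 1e-12 < gamma < best_gamma:
                    best_gamma, best_j = gamma, j
        if best_j is None:
            break
        mu = mu + best_gamma * u
        active.append(best_j)
    return active


def randomized_lars_frequencies(X, y, k, n_runs=100, frac=0.5, seed=0):
    """Fraction of random row subsamples in which each feature enters LAR within k steps.

    Hypothetical randomization (an assumption): rows are subsampled without replacement.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    m = int(frac * n)
    counts = np.zeros(p)
    for _ in range(n_runs):
        rows = rng.choice(n, size=m, replace=False)
        for j in lars_entry_order(X[rows], y[rows], k):
            counts[j] += 1
    return counts / n_runs


# Synthetic check: only features 0 and 1 drive the response.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 10))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.normal(size=200)
freq = randomized_lars_frequencies(X, y, k=2)
print(np.round(freq, 2))
```

Features that enter early in most subsamples would be kept; the frequency threshold is a further tuning choice. Extending this idea to the nonlinear regression models mentioned above would need additional steps (for example, basis expansions of each feature), which are not shown here.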
The newly developed models have been applied to four real-world problems, including the prediction of criminal incidents and the analysis of train accidents. Results showed that the LSTGAM outperformed several previous models, such as spatial generalized linear models and hot spot models, on the spatio-temporal classification problem. They also showed that SRL-LDA can effectively extract useful information from unstructured textual data such as Twitter posts, and that the information it extracts improved prediction performance in several cases. These applications also revealed a promising source of data for crime prediction: social media services such as Twitter. As discussed at the end of the dissertation, a large-scale text analysis system built on the modeling techniques developed in this dissertation can provide solutions in many areas where prediction is important.
PHD (Doctor of Philosophy)
spatio-temporal modeling, text mining, predictive model, feature selection, generalized additive models, crime prediction
All rights reserved (no additional license for public reuse)