Informative structures in complex networks
Miao, Ruizhong, Statistics - Graduate School of Arts and Sciences, University of Virginia
Li, Tianxi, Statistics, University of Virginia
Rodu, Jordan, Statistics, University of Virginia
Networks, or graphs, represent the relationships or interactions among entities and provide valuable information about the underlying data-generating systems. Network data can be observed alone or accompanied by other forms of data, and in both cases they can be leveraged to learn the underlying structure of the data. In this thesis, we consider both cases. First, when only network data are observed, an open question in statistical network analysis is how researchers should identify the informative component of the network and filter out the noise. We address this problem in Chapter 2. Second, in Chapter 3, we consider the problem of integrating network data with other data modalities in the context of topic modeling.
In statistical network analysis, an important task is to describe the underlying structure with statistical models. In practice, however, the structure of interest is usually hidden in a larger network in which most structures are not informative. The noise and bias introduced by the non-informative component can obscure the salient structure and limit the effectiveness of many network modeling procedures. In Chapter 2, we introduce a novel core-periphery model for the non-informative periphery structure of networks without imposing a specific form on the informative core structure. Based on the model, we propose spectral algorithms for core identification as a data preprocessing step for general downstream network analysis tasks. The algorithm enjoys a strong theoretical guarantee of accuracy and is scalable to large networks. We evaluate the proposed method in extensive simulation studies, which demonstrate various advantages over many traditional core-periphery methods. Applied to a citation network, the method extracts the informative core structure and gives more informative results in downstream hierarchical community detection.
Next, in Chapter 3, we consider the problem of incorporating network data into topic models. We develop a topic model that incorporates document-level features and citation networks. To the best of our knowledge, in contrast to existing topic models that also incorporate document-level features, our model takes into account two different types of causal relations between the document-level features and the topic distributions. In addition, no existing topic model has been able to incorporate both network data and document-level features. We compare our proposed model with existing topic models on the same data set using several automated topic model evaluation metrics. We show that our proposed model can simultaneously achieve high held-out likelihood, coherence, and stability. In particular, the inclusion of network data improves topic stability.
PHD (Doctor of Philosophy)
Statistical network analysis, Topic model, Statistical learning