When Causal Inference Meets Graph Machine Learning: Unleashing the Potential of Mutual Benefit
Ma, Jing, Computer Science - School of Engineering and Applied Science, University of Virginia
Li, Jundong, EN-Elec & Comp Engr Dept, University of Virginia
Zhang, Aidong, EN-Comp Science Dept, University of Virginia
Recent years have witnessed rapid development of graph-based machine learning (ML) in various high-impact domains (e.g., healthcare, recommendation, and security), especially methods powered by graph neural networks (GNNs). Currently, mainstream graph ML methods are based on statistical learning, e.g., exploiting the statistical correlations among node features, graph structure, and labels for node classification. However, statistical learning has been widely criticized for capturing only superficial relations between variables in the data system, which undermines its trustworthiness in real-world applications. For example, ML models often make biased predictions against underrepresented groups. In addition, these models often lack explanations that humans can understand. Therefore, it is crucial to understand the causality in the data system and the learning process. Causal inference is the discipline that investigates causality inside a system, for example, by identifying and estimating the causal effect of a certain treatment (e.g., wearing a face mask) on an important outcome (e.g., COVID-19 infection). Incorporating the concepts and philosophy of causal inference into ML methods is often considered a significant component of human-level intelligence and can serve as a foundation of artificial intelligence (AI). However, most traditional causal inference studies rely on strong assumptions and focus on independent and identically distributed (i.i.d.) data, so most of them cannot be directly applied to graphs. As a result, causal inference on graphs still faces many unique barriers to effectiveness.
Fortunately, the interplay between causal inference and graph ML has the potential to benefit both areas. In this thesis, we present the challenges and our research contributions in bridging the gap between causal inference and graph ML. Our research aims to unleash the mutual benefit between these two areas from two key research perspectives: Q1) How can graph ML methods be leveraged to make causal inference more effective? Q2) How can causality be leveraged to make graph ML models more trustworthy (e.g., in terms of model fairness and explanation)? In Part I, we introduce the background, challenges, and related work. In Part II, we introduce our detailed research problems and methodologies for causal inference on graph data powered by graph ML technologies (Q1). In Part III, we present our work on causality-involved trustworthy graph ML methods (Q2). In Part IV, we discuss future research directions in causal machine learning, trustworthy AI, and graph mining, offering insights grounded in real-world scenarios to facilitate future high-stakes applications.
PhD (Doctor of Philosophy)
Causal Inference, Graph Learning, Trustworthy AI, Fairness, Explanation, Causal Effect Estimation