Abstract
In the era of big data, health data from diverse sources have become increasingly prevalent in healthcare. These data include clinical records, genomic sequences, imaging and text data, and wearable device outputs, and they present both substantial challenges and opportunities for data-driven decision-making. The COVID-19 pandemic and the emergence of Long COVID have raised pressing questions about disease progression, healthcare disparities, and long-term health outcomes. This dissertation applies Bayesian risk analysis to large-scale COVID-19-related health data, using probabilistic frameworks to quantify uncertainty and improve decision-making in healthcare. A key aspect of this research is the use of electronic health record (EHR) data, particularly from the National COVID Cohort Collaborative (N3C) and the integrated Translational Health Research Institute of Virginia (iTHRIV), to characterize the clinical and epidemiological patterns of COVID-19 and Long COVID. By analyzing extensive patient data, this dissertation aims to identify risk factors, disease trajectories, and treatment responses, contributing to a more comprehensive understanding of these conditions. It also seeks to promote healthcare equity by identifying disparities in diagnosis, treatment, and access to care among underrepresented populations, so that interventions can be directed toward vulnerable groups. By integrating Bayesian methods with diverse data sources, including EHR data, patient-reported outcomes, and telehealth utilization patterns, this dissertation provides novel insights into the long-term health impacts of COVID-19 and offers evidence-based recommendations to support equitable and effective healthcare in a post-pandemic world.
Bayesian risk analysis is a probabilistic framework that quantifies uncertainty in decision-making by combining prior knowledge with observed data. Unlike frequentist approaches, which base inference on the sample data alone, Bayesian methods update prior beliefs as new evidence emerges, making them particularly useful in complex and dynamic healthcare settings. In the context of EHR data, Bayesian models can be employed to assess patient risk factors, predict disease progression, and evaluate treatment effectiveness by integrating diverse data sources such as demographics, clinical notes, laboratory results, medical history, and social determinants of health. Hierarchical modeling and probabilistic inference also help these methods accommodate missing or noisy data, because estimates for sparsely observed groups can borrow strength from related groups. For COVID-19 and Long COVID research, Bayesian models can be used to identify high-risk populations, estimate the likelihood of developing long-term complications, uncover survival trends, and optimize healthcare interventions. By applying Bayesian risk analysis to large COVID-19-focused EHR datasets, such as those from N3C, researchers can improve clinical decision-making, support resource allocation, and inform public health policies with greater precision and adaptability.
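To make the idea of updating prior beliefs with observed data concrete, the following is a minimal sketch of a conjugate Beta-Binomial update for a single risk probability. It is an illustration only, not the modeling approach used in this dissertation. The class name `BetaBelief`, the prior parameters, and the patient counts are all hypothetical values chosen for the example and are not drawn from N3C or iTHRIV.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BetaBelief:
    """Beta(alpha, beta) belief about a risk probability (hypothetical helper)."""

    alpha: float
    beta: float

    @property
    def mean(self) -> float:
        # Expected value of a Beta(alpha, beta) distribution.
        return self.alpha / (self.alpha + self.beta)

    def update(self, events: int, trials: int) -> "BetaBelief":
        # Conjugate update for binomial data: add observed events to alpha
        # and observed non-events to beta.
        if not 0 <= events <= trials:
            raise ValueError("events must satisfy 0 <= events <= trials")
        return BetaBelief(self.alpha + events, self.beta + (trials - events))


# Illustrative prior: roughly a 10% risk, worth about 20 pseudo-observations.
prior = BetaBelief(alpha=2.0, beta=18.0)

# Illustrative data: 30 of 200 patients develop the outcome of interest.
posterior = prior.update(events=30, trials=200)

# The same data arriving in two batches yields the same posterior.
sequential = prior.update(events=10, trials=100).update(events=20, trials=100)

print(f"prior mean:     {prior.mean:.3f}")
print(f"posterior mean: {posterior.mean:.3f}")
```

The posterior mean falls between the prior mean and the observed proportion, and it moves closer to the data as the sample grows. Processing the data in batches gives the same result as processing it at once, which is the property that lets a Bayesian analysis be revised as new records accumulate.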