Abstract
The rapid expansion of biomedical literature presents a dual challenge: it holds immense potential for discovery, yet researchers can no longer manually synthesize the vast, ever-growing body of information. Computational hypothesis generation (HG) aims to address this by automatically identifying novel, plausible connections between biomedical concepts in large-scale text corpora. However, most existing methods have two limitations. They rely on pairwise models, which cannot capture interactions among more than two concepts, and they do not account in a unified manner for the spatial, temporal, and semantic dimensions along which biological research evolves, as reflected in the literature. This thesis proposes two novel methods to address these limitations. First, we move beyond pairwise models by reframing hypothesis generation as the task of predicting future multi-concept relationships. We introduce HyHG, a temporal hypergraph contrastive learning framework that models each publication as a hyperedge, that is, a coherent set of co-occurring concepts. By learning the evolutionary patterns of these higher-order structures, HyHG predicts which groups of concepts are likely to form novel and plausible associations. Second, we model the fine-grained dynamics of conceptual meaning itself. We propose ConceptDrift, a unified framework that jointly models the spatial, temporal, and semantic evolution of biomedical concepts. This approach is motivated by conceptual drift, the principle that a concept's meaning shifts as it co-occurs with new information. By tracking the evolving semantic states of concepts, ConceptDrift captures a more nuanced understanding of why new relationships emerge.
We retrospectively validate both frameworks on large-scale temporal datasets derived from PubMed, demonstrating that they significantly outperform state-of-the-art methods on their respective tasks: future hyperedge prediction for HyHG and future link prediction for ConceptDrift. Taken together, this work provides a more powerful and precise set of tools for automated scientific discovery, paving the way for systems that navigate the vast biomedical literature more effectively and thereby accelerate research.