Abstract
The implementation of data science methods in nursing practice is a rapidly evolving catalyst for clinical decision support and care standardization, ultimately fostering an environment of data-driven nursing care and communication. With the rise of machine learning research and improved accessibility of large time series data from inpatient hospital records, there has been recent interest in clinical phenotyping using time series clustering algorithms to capture dynamic heterogeneity that is often lost in cross-sectional analyses. Phenotypes derived from dynamic features (e.g., multiple time points) rather than static features (e.g., a single time point) have been shown to better predict patient outcomes such as illness severity, treatment response, and mortality. Modeling fluctuations in illness severity over time allows clinicians to make meaning from time series data collected routinely throughout a patient’s hospitalization. Methods for clustering longitudinal trajectories to assess patient subpopulations require researchers to consider many factors (i.e., data representation, distance measures, and clustering algorithms) that ultimately affect the clinical phenotypes identified. This research contributes to current knowledge by (a) increasing our understanding of dynamic illness severity states during a patient’s hospitalization and (b) examining how different clustering approaches produce similar or distinct patient trajectories. The aims of this dissertation research are to explore various preprocessing and modeling approaches for analyzing inpatient time series data and to evaluate the impact of methodological decisions made during model construction on illness severity trajectory identification. The analyses use an existing dataset of acute care cardiac patients from the control arm (N=5,184) of a larger clinical trial, in which illness severity states were collected every 15 minutes (n=2,174,117).
Aim 1 – Understanding time series clustering methods of inpatient data. The purpose of Aim 1 is to understand the clinical relevance of time series clustering analysis in the inpatient setting and the methodological decisions involved in applying time series clustering to inpatient data. This first paper describes how methodological decisions in time series clustering vary widely and emphasizes that many articles do not follow clear guidelines for conducting and transparently reporting these analyses. The studies reviewed demonstrate a diverse range of clinical relevance, attesting to the clinical utility and widespread application of these techniques. Methodological decisions can affect the clinical phenotypes identified through these techniques, and their impact must be fully explored before this type of technology is integrated into clinical decision-making. Further investigation into the impact of methodological decisions and the clinical relevance of time series clustering on inpatient data is essential to support clinicians in leveraging routinely collected data for clinical decision support.
Aim 2 – Understanding illness severity trajectories of hospitalized patients through time series clustering. The purpose of Aim 2 is to explore time series clustering methods for identifying novel clinical phenotypes of patients in acute care cardiac populations. In this second paper, we show that illness severity trajectory clusters differed across methodological approaches (k-means, k-means with dynamic time warping [DTW], and k-shape) and had statistically distinct clinical phenotypes in terms of demographics, admitting diagnoses, and clinical outcomes. K-shape proved to be the most clinically relevant clustering method for predicting emergent intubation and cardiac arrest later in hospitalization. This suggests that (1) k-shape identifies clusters that are more clinically relevant than those derived using other distance measures, and (2) illness trajectory shape provides additional predictive value beyond mean illness severity for certain clinical outcomes. Future studies should more carefully compare methodological choices in time series clustering, as inappropriate analytic decisions may lead to loss of clinically meaningful information and reduced robustness of conclusions.
Aim 3 – Understanding the impact of including demographic features within time series clustering of illness severity scores. The purpose of Aim 3 is to investigate whether components of risk calculation that may be modeled over time or used in time series clustering introduce representation and measurement bias. This third paper demonstrates how racism, sexism, and ageism are represented in the data used to train clinical algorithms that serve as decision-support tools. Because of this, algorithm developers, including nurse scientists, must ensure transparency in their decision-making processes, clearly articulating the rationale for variable inclusion to avoid unintentional harm caused by ignorance and complacency. By acknowledging and addressing these harms, clinicians and developers can mitigate them and uphold the ethical principles of justice and non-maleficence in their medical and nursing practice. Finally, more studies are needed to explore the impact of structural discrimination on observed differences between demographic groups in order to mitigate this attribution error.
Time series clustering of inpatient physiological markers has the potential to identify meaningful groupings of hospitalized patients, capturing how they respond to treatment, progress in their condition, or experience care delays that ultimately affect their outcomes. Rigorously and ethically evaluating the utility of these methods contributes to the broader effort of leveraging routinely collected hospital data for clinical decision-making. The ultimate goal of this dissertation is to use data to foster earlier intervention and better communication about patient status, both among clinicians and between clinicians and families.