Information Extraction and Fusion for Improving Health Safety

Preum, Sarah Masud, Computer Science - School of Engineering and Applied Science, University of Virginia
Stankovic, John, EN-Comp Science Dept, University of Virginia

With the rapid digitization of healthcare, a plethora of applications for health management, monitoring, and decision support are now used by both healthcare recipients and healthcare providers. These applications generate and collect multimodal data. Several research directions aim to increase the efficiency, capability, and accuracy of such digital health applications. However, little research focuses on health safety. For instance, when multiple smart health applications are in use, one app might suggest an intervention that conflicts with an intervention originating from another app. Adhering to conflicting information can result in long-term and short-term adverse health effects, ineffective treatments, and a rise in healthcare costs. Detecting potential safety violations in smart health applications is challenging because it requires (i) deep semantic inference and (ii) combining heterogeneous data streams that vary in modality, format, and content. These tasks are particularly difficult because the domain is low-resource: training data is scarce and annotation is expensive.

This dissertation attempts to bridge this knowledge gap by developing information extraction and fusion solutions that infer and combine the semantics of multimodal data originating from digital health applications under resource and domain constraints. Its goal is to improve the health safety of individuals and provide decision support to users. Specifically, it presents PreCluDe, which formulates and solves the problem of automatic conflict detection and categorization in textual advice generated by health websites, drug usage guidelines, and smart health apps. PreCluDe combines linguistic features of textual health and medical advice with multiple external knowledge bases to generate semantic rules and detect conflicts in an interpretable, personalized, and context-aware manner. In an extensive evaluation on multiple real advice datasets covering eight general health topics and thirty-four chronic diseases, PreCluDe achieves an overall recall of 0.88, outperforming several machine learning and deep learning based text classifiers in detecting five types of conflicts in health advice. PreCluDe also addresses personalization and context-awareness in detecting conflicts between health and medical advice statements and activities of daily living. Next, ActSafe is developed to detect and predict multimodal conflicts in medication adherence for chronic disease management. Potential violations of medical temporal constraints (MTCs) specified by drug usage guidelines are defined as multimodal conflicts because they occur between textual medical advice and the time series of an individual's activities of daily living. ActSafe relies on a novel taxonomy of potential MTCs and a context-free grammar based model to normalize MTCs from unstructured free-format descriptions. Another critical challenge is predicting the relevant short-span activities that are central to predicting violations of MTCs with a robust prediction horizon. We develop an activity prediction model using a multi-step, multivariate stacked long short-term memory (MM-LSTM) network that addresses this challenge. ActSafe can predict violations of six types of MTCs within a week's prediction horizon with an F1-score as high as 0.9.

In addition to improving the personalized health safety of patient-centric applications, this thesis also develops an information extraction and fusion solution, namely EMSContExt, to assist emergency care providers or first responders. EMSContExt lies at the center of a cognitive assistant for emergency response and extracts EMS protocol-specific concepts in real time from the spoken language at the scene (e.g., transcriptions of conversations among responders, patients, and bystanders recorded by responder-worn microphones). EMSContExt achieves 0.84 recall and 0.83 F1-score on average for protocol-specific concept extraction and outperforms the state-of-the-art supervised medical concept annotation tool, MetaMap, with a threefold increase in F1-score and a 22% increase in recall on average. We also evaluate the applicability of EMSContExt for recommending EMS protocol-specific interventions by the cognitive assistant. Compared to MetaMap, EMSContExt achieves a 4% increase in weighted F1-score, a 6% increase in weighted recall, and a sixfold speedup in execution time for recommending EMS interventions.

This thesis can improve health safety in the critical scenarios of personalized health applications, chronic disease management, and emergency response. The techniques and findings of this thesis can also enhance personalized recommendation systems, intelligent assistants, and decision support tools for healthcare.

PHD (Doctor of Philosophy)
Conflict Detection, Information Extraction, Semantic Inference, Low Resource NLP, Temporal Modeling of Human Behavior, Intelligent Assistants for Healthcare, Smart Health, Knowledge Integration
All rights reserved (no additional license for public reuse)
Issued Date: