A Query-Driven Interface for Exploring Biomedical Knowledge and Social Determinants of Health; Designing Ethical Biomedical Knowledge Graphs: Addressing Bias, Privacy, and Equity in AI-Driven Healthcare

Author:
Bresnick, Jaren, School of Engineering and Applied Science, University of Virginia
Advisors:
Zhang, Aidong, Computer Science, University of Virginia
Francisco, Pedro, Department of Engineering and Society, University of Virginia
Abstract:

What happens when powerful medical tools unintentionally reinforce the very disparities they aim to reduce? My Capstone Project builds a biomedical knowledge graph that integrates clinical data with social determinants of health (SDoH) data, paired with a user-friendly interface that translates natural language queries into Cypher and generates interpretable visualizations. I undertook this research to improve equitable access to healthcare insights and to make complex medical data more accessible to both clinicians and researchers. My STS paper explores the ethical implications of biomedical knowledge graphs, particularly how they can either reinforce or mitigate bias, privacy risks, and unequal access to care. That research arose from the need to critically evaluate the sociotechnical systems behind AI-driven healthcare tools and to propose frameworks for ethical implementation. Together, the Capstone and STS research offer both a technical solution and a critical reflection, working in tandem toward healthcare technologies that are not only smarter but also fairer.

The goal of my Capstone is to build a biomedical knowledge graph that integrates structured medical data together with SDoH (factors such as income, education, and housing stability) to provide a more complete view of patient health. A core feature of the project is a tool that lets users enter natural language queries, which are parsed and translated into Cypher, the query language used by graph databases such as Neo4j. Users can thereby explore connections among, for example, chronic disease, medication, and environmental exposure without any programming knowledge. The design addresses both data integration challenges and usability gaps in current medical graph systems, offering a more holistic and accessible platform for healthcare analytics. Preliminary results show that integrating SDoH into the knowledge graph reveals complex, non-obvious relationships, such as links between asthma incidence and geographic indicators of housing quality. The natural language interface, powered by a transformer-based NLP model fine-tuned for biomedical contexts, converts user queries into Cypher with high accuracy. Visualizations generated through the interface present interpretable graphs that make hidden patterns more visible to clinicians and public health professionals. The project concludes that combining intuitive user interfaces with ethically aware data modeling can enhance the practical and equitable application of biomedical knowledge graphs in real-world healthcare environments.
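To make the query-translation step concrete, the following is a minimal sketch in Python and not the project's implementation. The project uses a transformer-based model, whereas this sketch substitutes simple regular-expression templates for illustration. The node labels (`Disease`, `SDoHFactor`, `Medication`), the relationship types (`ASSOCIATED_WITH`, `TREATS`), and the `translate` function are all hypothetical names assumed for the example.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical schema: labels and relationship types are illustrative only
# and are not taken from the project's actual knowledge graph.
TEMPLATES = [
    (
        re.compile(
            r"what (?:factors|conditions) are (?:linked|associated) "
            r"(?:to|with) (?P<term>.+?)\??$",
            re.IGNORECASE,
        ),
        "MATCH (d:Disease {name: $term})-[:ASSOCIATED_WITH]-(f:SDoHFactor) "
        "RETURN f.name AS factor",
    ),
    (
        re.compile(
            r"which (?:drugs|medications) treat (?P<term>.+?)\??$",
            re.IGNORECASE,
        ),
        "MATCH (m:Medication)-[:TREATS]->(d:Disease {name: $term}) "
        "RETURN m.name AS medication",
    ),
]


@dataclass
class CypherQuery:
    text: str     # parameterized Cypher statement
    params: dict  # values bound to the $placeholders


def translate(question: str) -> Optional[CypherQuery]:
    """Map a natural language question to a parameterized Cypher query.

    Returns None when no template matches. A template lookup stands in
    here for the fine-tuned model described in the abstract.
    """
    cleaned = question.strip()
    for pattern, cypher in TEMPLATES:
        match = pattern.match(cleaned)
        if match:
            term = match.group("term").strip().lower()
            return CypherQuery(text=cypher, params={"term": term})
    return None


if __name__ == "__main__":
    query = translate("Which medications treat asthma?")
    if query is not None:
        print(query.text)
        print(query.params)
```

The user's term is passed as a query parameter (`$term`) and is never spliced into the Cypher string, which keeps free-text input from altering the structure of the query. The same separation of statement and parameters would apply if a learned model generated the Cypher instead.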

My STS paper asks: How can biomedical knowledge graphs be designed to support ethical, accurate, and equitable medical treatment while mitigating risks related to privacy, bias, and access? This question is essential because the same tools that promise more personalized care can also amplify disparities if built on biased or incomplete data. I use a qualitative methodology that draws on Science, Technology, and Society (STS) frameworks, case study analysis, and literature review. The research focuses on three central themes: bias in training data, privacy vulnerabilities, and equitable representation of populations, especially marginalized groups, in medical AI systems. Using evidence from case studies such as Hetionet and scholarly analyses of bias, privacy, and interpretability in healthcare AI, the paper finds that biomedical knowledge graphs can either exacerbate or help correct structural inequities, depending on how they are designed and governed. Studies such as Obermeyer et al. (2019) reveal how biased training data can distort diagnosis and treatment, while research by Price and Cohen (2019) shows that de-identified medical records can still be re-identified, compromising patient privacy. The paper concludes that ethical design must incorporate explainable AI, differential privacy, and diverse data sources, along with adherence to frameworks such as FAIR, GDPR, and HIPAA. Most importantly, it calls for active stakeholder participation, particularly from patients and clinicians, to ensure that these systems are not only technically powerful but also socially just.

Degree:
BS (Bachelor of Science)
Keywords:
Knowledge Graph, Social Determinants of Health, Query-Driven
Notes:

School of Engineering and Applied Science

Bachelor of Science in Computer Science

Technical Advisor: Aidong Zhang

STS Advisor: Pedro Francisco

Language:
English
Rights:
All rights reserved (no additional license for public reuse)
Issued Date:
2025/05/10