Wisdom of the crowd: mapping continuous cell states with hyperparameter-randomized ensemble clustering
Goggin, Sarah, Neuroscience - School of Medicine, University of Virginia
Zunder, Eli, EN-Biomed Engr Dept, University of Virginia
Recent technological advances have enabled high-throughput single-cell molecular profiling, generating unprecedented volumes of data that promise to revolutionize our understanding of cellular heterogeneity and its role in health and disease. However, our ability to extract meaningful biological insights from these complex datasets has not kept pace with our capacity to generate them. A fundamental challenge lies in the computational methods used to characterize cell populations, which often lack scalability and generalizability and struggle to capture the biological reality of both discrete and continuous cellular variation.
This thesis addresses these challenges through the development of novel computational approaches for analyzing and organizing single-cell data. Central to this work is ESCHR, an ensemble clustering method that eliminates the need for manual parameter tuning while providing superior accuracy and robustness compared to existing approaches. ESCHR's hyperparameter-randomized ensemble approach not only generates high-quality discrete clustering results but also enables soft clustering to characterize regions of biological continuity and quantifies clustering uncertainty at the single-cell level.
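As a rough illustration of the general idea behind hyperparameter-randomized ensemble clustering, the Python sketch below runs a base clusterer many times with a randomly drawn hyperparameter, combines the runs into a consensus, and derives soft memberships and a per-point uncertainty score. It is not ESCHR's implementation. Everything in it is an assumption chosen for brevity: k-means as the base clusterer, the number of clusters as the only randomized hyperparameter, a co-association matrix with a 0.5 threshold as the consensus step, and all function names.

```python
import numpy as np

def kmeans_labels(X, k, rng, n_iter=20):
    """Plain Lloyd's k-means (assumed base clusterer); one label per row."""
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # leave empty clusters where they are
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def coassociation(X, n_runs=50, k_range=(2, 6), seed=0):
    """Fraction of runs in which each pair of points shares a cluster."""
    rng = np.random.default_rng(seed)
    co = np.zeros((len(X), len(X)))
    for _ in range(n_runs):
        # Randomize the hyperparameter instead of tuning it by hand.
        k = int(rng.integers(k_range[0], k_range[1] + 1))
        labels = kmeans_labels(X, k, rng)
        co += labels[:, None] == labels[None, :]
    return co / n_runs

def consensus_labels(co, threshold=0.5):
    """Hard consensus: connected components of the thresholded matrix."""
    labels = -np.ones(len(co), dtype=int)
    current = 0
    for start in range(len(co)):
        if labels[start] >= 0:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in np.flatnonzero((co[i] >= threshold) & (labels < 0)):
                labels[j] = current
                stack.append(j)
        current += 1
    return labels

def soft_memberships(co, hard):
    """Mean co-association of each point with each consensus cluster."""
    n_clusters = hard.max() + 1
    m = np.stack(
        [co[:, hard == c].mean(axis=1) for c in range(n_clusters)], axis=1
    )
    return m / m.sum(axis=1, keepdims=True)  # each row sums to 1

# Toy data: two Gaussian blobs standing in for two cell populations.
data_rng = np.random.default_rng(1)
X = np.vstack([
    data_rng.normal(0.0, 1.0, size=(30, 2)),
    data_rng.normal(6.0, 1.0, size=(30, 2)),
])

co = coassociation(X)
hard = consensus_labels(co)
memberships = soft_memberships(co, hard)
uncertainty = 1.0 - memberships.max(axis=1)  # 0 means fully confident
```

In this sketch, points whose membership is split across clusters receive a higher `uncertainty` value, which is one simple way to flag candidate transitional regions between otherwise discrete groups.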
Comprehensive evaluation across a large collection of diverse single-cell datasets demonstrates ESCHR's superior performance and generalizability compared to existing methods. We further showcase the method's capabilities through in-depth analysis of two distinct applications: mapping the connectivity and intermediate transitions between handwritten digits (MNIST) and between hypothalamic tanycyte subpopulations. In both cases, ESCHR successfully identified canonical discrete groups while revealing meaningful continuous structure between them. This unified approach to capturing both discrete and continuous aspects of data structure, combined with transparent uncertainty quantification, represents a significant advance in our ability to generate hypotheses from single-cell data.
By emphasizing generalizability, robust performance, and interpretability while eliminating the need for manual parameter tuning, ESCHR provides a powerful framework for extracting biological insights from the growing volume of single-cell data. This work not only contributes practical tools to the single-cell research community but also advances our conceptual approach to understanding cellular heterogeneity.
PHD (Doctor of Philosophy)
Single Cell, Clustering
English
2025/04/28