Building Computer-Aided Diagnostic Models for Biopsy Images Using Minimally Curated Datasets

Pulido, Joseph Vincent, Systems Engineering - School of Engineering and Applied Science, University of Virginia
Brown, Don, DS-Data Science School, University of Virginia

Convolutional neural networks (CNNs) perform well on many biopsy datasets and show promise as tools that augment physicians' diagnostic workflows. Training these models typically requires curating large-scale medical datasets, a tedious process of collecting, storing, and annotating data. These curation tasks are often the most time-consuming and resource-intensive part of model development. The cost is higher still in medicine, where (1) expert annotation labor is expensive and (2) disease samples are scarce. This work aims to reduce data curation costs by examining and developing methods for training CNNs on partially annotated or class-imbalanced datasets. First, I examine the performance of semi-supervised methods that improve prediction performance by leveraging a large unlabeled dataset alongside a smaller labeled one. I assess the impact of noisy samples in the unlabeled data, which are common in biopsy tissue datasets. To mitigate the effects of this noise, I examine semi-supervised co-teaching methods. Next, I analyze the performance of class imbalance methods on the task of grading cancerous biopsies. I show that state-of-the-art class imbalance methods perform suboptimally because of rare 'polarized features' inherent in many biopsy cancer grading tasks, where cancer patterns manifest only at the tail end of cancer progression. By advancing these two areas of research, this work aims to lower the cost of curating biopsy datasets and to promote the use of CNNs for medical tasks in resource-constrained settings.
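For readers unfamiliar with co-teaching, the following is a minimal sketch of the sample-selection step from the original co-teaching method (Han et al., 2018), not the semi-supervised variant studied in this dissertation, whose details are not given here. Two networks each rank a mini-batch by their own per-sample loss, keep the small-loss fraction as "probably clean", and hand that selection to the peer for its update. The function names, the plain-list inputs, and the parameters `forget_rate` and `warmup_epochs` are illustrative assumptions; the actual gradient update is omitted.

```python
def keep_rate(epoch, forget_rate, warmup_epochs):
    """Fraction of each mini-batch a network keeps at a given epoch.

    Follows the schedule R(T) = 1 - min(T / T_k * tau, tau): keep everything
    at first, then linearly drop up to `forget_rate` of the samples over
    `warmup_epochs` epochs (hypothetical parameter names).
    """
    return 1.0 - min(epoch / warmup_epochs * forget_rate, forget_rate)


def small_loss_indices(losses, keep):
    """Return (sorted) indices of the `keep` fraction of samples with the smallest loss."""
    n_keep = max(1, int(round(keep * len(losses))))
    order = sorted(range(len(losses)), key=lambda i: losses[i])
    return sorted(order[:n_keep])


def co_teaching_exchange(losses_a, losses_b, keep):
    """Cross-update selection: each network picks its own small-loss samples,
    and the picks are swapped so A trains on B's choice and B on A's."""
    picked_by_a = small_loss_indices(losses_a, keep)
    picked_by_b = small_loss_indices(losses_b, keep)
    return picked_by_b, picked_by_a  # (indices to train A on, indices to train B on)


if __name__ == "__main__":
    # Per-sample losses from two hypothetical networks on a 4-sample mini-batch.
    losses_a = [0.1, 2.5, 0.3, 0.2]
    losses_b = [0.4, 0.1, 3.0, 0.2]
    train_a_on, train_b_on = co_teaching_exchange(losses_a, losses_b, keep=0.5)
    print(train_a_on, train_b_on)
```

Swapping the selections, rather than letting each network train on its own small-loss samples, is meant to keep the two networks from reinforcing their own mistakes on mislabeled or noisy samples.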

PHD (Doctor of Philosophy)
Biomedical Imaging, Semi-supervision, Class Imbalance
All rights reserved (no additional license for public reuse)
Issued Date: