Understanding Dataset Vulnerability to Membership Inference Attacks

Author ORCID: orcid.org/0000-0001-7707-6466
Moore, Hunter, Systems Engineering - School of Engineering and Applied Science, University of Virginia
Scherer, William

Recent efforts have shown that training data is not protected by the generalization and abstraction performed by classification algorithms. Early studies of membership inference attacks extracted information about training datasets by exploiting overfit models, comparing the confidence the model assigns to candidate records. Successful attacks have been demonstrated in both black-box and white-box settings, even against well-generalized models. More recently, several research groups have shown that attack accuracy increases under pragmatic approaches that target more vulnerable sub-groups rather than indiscriminately attacking the overall training dataset.

The present dissertation accomplishes two overarching goals. First, it advances the understanding of which datasets are vulnerable. Prior work has shown that minority sub-populations of data are susceptible to attack and that imbalanced, low-entropy datasets yield higher membership inference attack success; the imbalance may lie in observations, features, or labels. This work evaluates that imbalance to quantify its contribution to a dataset's underlying vulnerability to membership inference attack. Building on this understanding, the second part of the dissertation explores the intelligent selection of training observations and features in a manner that reduces the population of vulnerable records while preserving the model's ability to generalize to the classification population and capture the required variance. This was accomplished through NearMiss undersampling, oversampling based on a conditional tabular generative adversarial network (CTGAN), correlation-based feature selection, and manifold-theory-based feature selection. This study provides a vulnerability metric for classifying datasets, and their class-based subsets, as vulnerable or not to membership inference attack, and explores hardening strategies based on these findings.
This work not only adds to the current understanding of security and vulnerability to these types of attacks but also offers insight into how machine learning algorithms develop their mappings between observation/feature sets and their labels.
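The confidence-gap phenomenon described above can be illustrated with a minimal sketch: an intentionally overfit classifier tends to assign higher confidence to its own training records, and a simple attack thresholds on that confidence. The dataset, model configuration, and threshold below are illustrative assumptions, not the dissertation's experimental setup.

```python
# Minimal sketch of a confidence-threshold membership inference attack.
# The synthetic data, overfit model, and threshold are all illustrative
# assumptions, not the dissertation's exact setup.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0)

# Deliberately overfit: fully grown trees memorize training records.
model = RandomForestClassifier(n_estimators=50, max_depth=None, random_state=0)
model.fit(X_train, y_train)

def max_confidence(model, X):
    """Highest predicted class probability for each record."""
    return model.predict_proba(X).max(axis=1)

# Members (training records) tend to receive higher confidence than
# non-members; the attack flags any record above a confidence threshold.
member_conf = max_confidence(model, X_train)
nonmember_conf = max_confidence(model, X_test)

threshold = 0.9  # illustrative; a real attack would tune this
tpr = (member_conf >= threshold).mean()     # members correctly flagged
fpr = (nonmember_conf >= threshold).mean()  # non-members wrongly flagged
print(f"member rate above threshold: {tpr:.2f}, non-member rate: {fpr:.2f}")
```

The gap between the two rates is exactly the signal an indiscriminate attack exploits; the sub-group-focused attacks the abstract mentions compute it per class or per cluster instead of over the whole population.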
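NearMiss undersampling, one of the hardening strategies listed above, can be sketched in a few lines (imbalanced-learn's `NearMiss` transformer implements it directly). The NearMiss-1 variant keeps the majority-class records with the smallest average distance to their k nearest minority-class neighbors. This is a hedged sketch of that idea, not the dissertation's implementation.

```python
# Hedged sketch of NearMiss-1 undersampling using only scikit-learn.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def near_miss_1(X, y, minority_label, k=3):
    """Downsample the majority class to the minority-class size, keeping
    the majority records closest (on average) to their k nearest
    minority neighbors."""
    X, y = np.asarray(X), np.asarray(y)
    X_min = X[y == minority_label]
    X_maj = X[y != minority_label]
    # Average distance from each majority record to its k nearest
    # minority records.
    nn = NearestNeighbors(n_neighbors=k).fit(X_min)
    dist, _ = nn.kneighbors(X_maj)
    order = np.argsort(dist.mean(axis=1))  # closest majority records first
    keep = order[: len(X_min)]             # match the minority-class count
    X_bal = np.vstack([X_min, X_maj[keep]])
    y_bal = np.concatenate([np.full(len(X_min), minority_label),
                            y[y != minority_label][keep]])
    return X_bal, y_bal

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(120, 4))
y_demo = np.array([0] * 100 + [1] * 20)  # imbalanced: 100 vs 20
X_bal, y_bal = near_miss_1(X_demo, y_demo, minority_label=1, k=3)
```

Balancing the classes this way shrinks the low-density majority fringe that sub-group-focused membership inference attacks tend to exploit, at the cost of discarding majority-class observations.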

PhD (Doctor of Philosophy)
All rights reserved (no additional license for public reuse)
Issued Date: