Online Archive of University of Virginia Scholarship
Variable Selection on Compositional Data Based on Variable Deletion
Author
Perez-Suarez, David, Statistics - Graduate School of Arts and Sciences, University of Virginia
Advisors
Zhou, Jianhui, Statistics, University of Virginia
Abstract
Compositional data consist of proportions or percentages of the parts of a whole. Observations are usually vectors of positive values, and the relevant information lies in the ratios between their components. The defining feature of compositional data is that the observed values of the compositional variables sum to 1 for each subject. This constraint makes selecting informative variables challenging when the dimensionality is high, since many existing variable selection methods cannot accommodate this data structure. Compositional data appear in a wide range of applications, including geology, consumer demand analysis, and forensic science, so an effective variable selection method for such data is highly desirable. In this work, we develop a variable selection method for compositional data in a linear regression model. The method is based on deleting subsets of the variables and measuring the corresponding changes in the coefficient of determination. The deletion procedure is computed efficiently, and the numerical performance of the method is satisfactory in simulation studies. The method can also be generalized to more complicated models.
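The idea of scoring variables by the change in the coefficient of determination after deletion can be illustrated with a minimal sketch. This is not the algorithm developed in the thesis, which the abstract does not specify. It assumes an additive log-ratio (alr) transform, ordinary least squares, and deletion of one component at a time rather than subsets; the function names `r_squared`, `alr`, and `r2_drop_per_component` are hypothetical.

```python
import numpy as np


def r_squared(Z, y):
    """R^2 of an ordinary least-squares fit of y on Z with an intercept."""
    A = np.column_stack([np.ones(len(y)), Z])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)


def alr(X):
    """Additive log-ratio transform, using the last component as reference."""
    return np.log(X[:, :-1] / X[:, -1:])


def r2_drop_per_component(X, y):
    """Drop in R^2 when each component is deleted in turn (illustrative only).

    X holds strictly positive compositions whose rows sum to 1.
    After deleting a component, the remaining parts are rescaled to sum to 1
    again, so the reduced data are still compositional.
    """
    full = r_squared(alr(X), y)
    drops = []
    for j in range(X.shape[1]):
        sub = np.delete(X, j, axis=1)
        sub = sub / sub.sum(axis=1, keepdims=True)  # re-close the subcomposition
        drops.append(full - r_squared(alr(sub), y))
    return np.array(drops)


# Simulated example (assumed setup): the response depends only on the
# log-ratio of the first two components.
rng = np.random.default_rng(0)
X = rng.dirichlet(np.full(5, 2.0), size=200)
y = 2.0 * np.log(X[:, 0] / X[:, 1]) + rng.normal(scale=0.1, size=200)
print(np.round(r2_drop_per_component(X, y), 3))
```

In this sketch the reduced log-ratio model is nested in the full one, so each drop is nonnegative, and components whose deletion produces a large drop are the candidates to keep. Deleting subsets, as the abstract describes, would generalize the loop over single components.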
Degree
PHD (Doctor of Philosophy)
Language
English
Rights
All rights reserved (no additional license for public reuse)
Perez-Suarez, David. Variable Selection on Compositional Data Based on Variable Deletion. University of Virginia, Statistics - Graduate School of Arts and Sciences, PHD (Doctor of Philosophy), 2023-11-27, https://doi.org/10.18130/f49s-6q78.