Relationship Material: Using Machine Learning to Identify Variables of Importance that Best Predict Lifestyle Choice in the Add Health Longitudinal Dataset

Author:
Domiteaux, Matthew, Psychology - Graduate School of Arts and Sciences, University of Virginia

Machine learning has been increasing in popularity due to its potential to provide major insights into a variety of complex topics. However, the application of these techniques to the study of psychology is not yet widespread. This study used a specific type of supervised machine learning, multi-class classification, to predict who marries, cohabitates, or remains single by young adulthood (i.e., ages 24 to 32). It applied machine learning to an extensive dataset, the National Longitudinal Study of Adolescent to Adult Health (Add Health), a rich, longitudinal survey that includes a diverse sample of over ten thousand participants and several thousand variables collected in five waves spanning two decades. Variables within Add Health tap dozens of psychological and behavioral constructs that may serve as predictors of lifestyle choice. Broadly stated, this study examined three questions: 1) How well can marriage, cohabitation, and singlehood be predicted within the Add Health dataset? 2) Do certain topic constructs in Add Health, such as substance use or personality, influence these predictions more than others among the widest possible range of predictors? 3) Are there variables from earlier in the lifespan that can accurately predict outcomes that occur later in life, up to young adulthood? To answer these questions, this study applied and compared the results of multiple machine learning models using a sophisticated, multi-model, cross-validation approach. The major implications of this study are twofold: 1) uncovering which variables are most important and predictive of lifestyle choice in young adulthood; and 2) providing a template for using machine learning with large datasets that can be applied to other research questions and outcomes (e.g., body mass index [BMI], intelligence).

PHD (Doctor of Philosophy)
relationships, marriage, cohabitation, singlehood, machine learning, predictive analysis, Add Health
Issued Date: