Efficiently Exploring Multilevel Data with Recursive Partitioning

Author:
Martin, Daniel, Psychology - Graduate School of Arts and Sciences, University of Virginia
Advisor:
Von Oertzen, Timo, Department of Psychology, University of Virginia
Abstract:

Datasets with many participants, many variables, or both are increasingly common in education and in other fields where data often have complex, multilevel structures. Once the initial confirmatory hypotheses have been exhausted, it can be difficult to decide how best to explore these datasets for hidden relationships that could inform future research. Exploratory data analysis in these areas is typically performed with the same statistical tools used for confirmatory data analysis. This can lead to methodological problems such as inflated rates of false-positive findings. In this dissertation, I argue that recursive partitioning, a data mining framework, offers a more efficient way to perform exploratory data analysis and to identify variables that may have been overlooked during the confirmatory phase. With this non-parametric approach, researchers can readily gauge the extent to which each variable is related to an outcome. They need not rely on null hypothesis significance tests to sort every variable strictly into "important" or "unimportant." This dissertation evaluates the feasibility of these methods in the multilevel contexts common in social science research, using both Monte Carlo simulations and three applied datasets. Based on these results, I developed a set of best practices and presented it in a small workshop for applied researchers. Feedback from these researchers informed a publicly available tutorial and R package to help others add this technique to their own statistical toolbox.

Degree:
PhD (Doctor of Philosophy)
Keywords:
random forests, data mining, exploratory data analysis
Rights:
All rights reserved (no additional license for public reuse)
Issued Date:
2015/06/12