An Evaluation of Fit Statistics in the Identification of Spurious Classes in Finite Mixture Models
Hull, Michael, Education - Curry School of Education, University of Virginia
Meyer, Joseph, Curry School of Education, University of Virginia
Finite mixture modeling is a popular tool for model-based clustering. Research has shown that some data conditions or model misspecification can lead to the identification of spurious classes. However, most of this research has focused on identifying the correct number of classes when the true number of classes is two or greater, rather than on the most basic hypothesis of a true one-class distribution. The purpose of this study is to explore more fully the extent to which finite mixture models identify spurious classes when the true number of classes is one.
Data were simulated to form single-component normal and nonnormal distributions. Mixture models with one to four components were fit, and log-likelihood-based, classification-based, and likelihood-ratio-based fit statistics were employed to identify the best-fitting model. The eleven fit statistics were evaluated across 72 analysis-by-data conditions, with 250 replications when using multivariate normal and multivariate skew normal component distributions and 125 replications when using the more computationally intensive multivariate skew t component distributions.
The results showed that type of fit statistic, degree of data nonnormality, and type of component distribution accounted for the most variance in identifying the correct model. The ICL-BIC outperformed all other fit statistics. As data nonnormality increased, so did the identification of spurious classes. However, allowing the shape of component distributions to vary reduced spurious class identification and, when paired with the best-performing fit statistics, eliminated the identification of spurious classes to within a reasonable statistical probability.
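As an illustration of the kind of fit statistic compared here, the following is a minimal sketch of the ICL-BIC under its commonly used definition, BIC plus twice the classification entropy of the posterior class probabilities. This definition is assumed for illustration; the abstract does not state the exact formulas used in the study. The function names and the numbers in `example` are hypothetical and are not results from the study.

```python
import math


def bic(log_likelihood, n_params, n_obs):
    """BIC on the 'smaller is better' scale: -2 logL + p * ln(n)."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)


def classification_entropy(posteriors):
    """Entropy of posterior class probabilities: -sum_i sum_k p_ik * ln(p_ik).

    `posteriors` is a list of rows, one per case, each row holding that
    case's posterior probabilities of class membership. Terms with p = 0
    contribute nothing and are skipped.
    """
    total = 0.0
    for row in posteriors:
        for p in row:
            if p > 0.0:
                total -= p * math.log(p)
    return total


def icl_bic(log_likelihood, n_params, n_obs, posteriors):
    """ICL-BIC (assumed definition): BIC + 2 * classification entropy."""
    return bic(log_likelihood, n_params, n_obs) + 2.0 * classification_entropy(posteriors)


def select_model(candidates, n_obs):
    """Return the number of classes whose model has the smallest ICL-BIC.

    `candidates` maps a class count to (log_likelihood, n_params, posteriors).
    """
    return min(
        candidates,
        key=lambda k: icl_bic(candidates[k][0], candidates[k][1], n_obs, candidates[k][2]),
    )


# Hypothetical toy values for four cases (not data from the study).
example = {
    1: (-10.0, 2, [[1.0]] * 4),       # one class: every posterior is 1
    2: (-9.5, 5, [[0.5, 0.5]] * 4),   # two poorly separated classes
}
best = select_model(example, n_obs=4)
```

In this toy case the two-class model has a slightly higher log likelihood, but its additional parameters and its poorly separated classes (high entropy) are both penalized, so the one-class model is preferred.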
This study did not examine the degree of inaccuracy in model selection; that is, it recorded whether the correct model was identified rather than which model was preferred. Additionally, this study did not examine conditions in which the correct model had more than one class and nonnormal components were used to fit the models. Further, this study made no attempt to evaluate other statistical considerations in the identification of the correct model, such as the separation of class means and the proportion or number of cases in the classes.
PHD (Doctor of Philosophy)
finite mixture model, spurious classes, fit statistics, nonnormal component distribution, simulation
This research was partially funded by a Curry Doctoral Student Dissertation IDEAs (Innovative, Developmental, Exploratory Awards) Grant sponsored by the Curry School Research and Development Fund.
All rights reserved (no additional license for public reuse)