Generalized Mixed Models with Mixture Links for Multivariate Zero-Inflated Count Data

Wang, Lijuan, Department of Psychology, University of Virginia
Nesselroade, John, Department of Psychology, University of Virginia

Count data with excessive zeros are often observed in substance use and problem behavior research. When multiple items that can each produce zero-inflated count data are used to measure a construct (e.g., substance use), the traditional way to estimate individuals' trait levels on the construct is to form composite scores of the items. The main disadvantages of this method are that the distribution of the composite scores is positively skewed and that the weight of each item is usually simply set to 1. In this study, I introduce a generalized mixed model with mixture links, such as a logit link and a log link, to estimate individuals' trait levels and to investigate the psychometric properties of the multiple items for multivariate zero-inflated count data. Simulation studies are conducted to assess the influence of factors such as sample size, number of items, proportion of zeros, and estimation method on the estimation of the proposed model, and to compare the performance of the proposed model with that of previously employed alternative methods. Application of the model is illustrated by analyzing the substance use data from the NLSY study. The simulation results showed that the proposed model recovers the true trait levels more accurately than the selected alternative methods, and that the estimation of the person trait levels is more accurate with more items and lower proportions of zeros. Regarding the accuracy of the item parameter estimates, moderate proportions of zeros, larger sample sizes, and more items provide more accurate estimates under the tested conditions. When the sample size was larger than 2000, the item parameters were estimated accurately in most conditions. The simulation results also showed that both the marginal maximum likelihood estimation (MMLE) and Bayesian estimation (BE) methods can provide accurate item parameter estimates with large enough sample sizes. Each estimation method had its own advantages and disadvantages in computation time and convergence rate. The empirical results included many outcomes that were not obtained using previous methods, especially in investigating the psychometric properties of the multiple substance use items from both propensity and level perspectives. Limitations and future directions of this study are discussed.
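
To make the mixture-link idea concrete, the following is a minimal simulation sketch, not the model as specified in the dissertation. It assumes a zero-inflated Poisson form in which a logit link governs whether a person engages in each behavior at all (propensity) and a log link governs the expected count among those who do (level). The two person traits, the item intercepts and slopes, and the function name simulate_zero_inflated_items are all hypothetical choices made for illustration.

```python
import numpy as np


def simulate_zero_inflated_items(n_persons, n_items, seed=0):
    """Simulate multivariate zero-inflated counts (illustrative assumptions only)."""
    rng = np.random.default_rng(seed)

    # Hypothetical person traits: one for propensity, one for level.
    theta_propensity = rng.normal(0.0, 1.0, n_persons)
    theta_level = rng.normal(0.0, 1.0, n_persons)

    # Hypothetical item parameters (intercepts and slopes for each link).
    a0 = rng.uniform(-1.0, 1.0, n_items)  # logit-part intercepts
    a1 = rng.uniform(0.5, 1.5, n_items)   # logit-part slopes
    b0 = rng.uniform(0.0, 1.0, n_items)   # log-part intercepts
    b1 = rng.uniform(0.3, 0.8, n_items)   # log-part slopes

    # Logit link: probability that a person is a "user" on each item.
    p_use = 1.0 / (1.0 + np.exp(-(a0 + np.outer(theta_propensity, a1))))

    # Log link: expected count for a person on each item, given use.
    mu = np.exp(b0 + np.outer(theta_level, b1))

    # Non-users contribute structural zeros; users contribute Poisson counts.
    use = rng.random((n_persons, n_items)) < p_use
    y = np.where(use, rng.poisson(mu), 0)
    return y, use, mu


y, use, mu = simulate_zero_inflated_items(n_persons=500, n_items=4)
print("proportion of zeros per item:", (y == 0).mean(axis=0))
```

Under these assumptions, zeros arise both structurally (the person is not a user on that item) and by chance from the Poisson part, which is the kind of excess-zero pattern the abstract describes.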


PHD (Doctor of Philosophy)
All rights reserved (no additional license for public reuse)
Issued Date: