Informative missingness in recommender system
Jin, Haiyun, Statistics - Graduate School of Arts and Sciences, University of Virginia
Tang, Xiwei, AS-Statistics, University of Virginia
Recommender systems have been widely adopted for individualized prediction and recommendation in a variety of areas, such as electronic commerce, social media platforms, and content platforms. Data sparsity is one of the main challenges in this area: usually only a very limited number of user-item interactions are observed, so a large proportion of the data is missing. Users' ratings of items, and whether those ratings are observed at all, may depend on underlying user-specific preferences or item-specific characteristics, so the missing-data pattern in recommender systems is typically informative. Most existing recommender systems ignore this crucial information by assuming that the data are missing completely at random (MCAR). To address this challenge, this thesis develops new recommender system models that exploit the informative missing-data pattern in two directions. In the first, we propose a multi-layer matrix factorization scheme with extra layers that incorporate the informative missingness through embedding techniques. The new model combines the strengths of matrix factorization and collaborative filtering along both the user and item dimensions. Furthermore, to improve the algorithm's scalability, we present effective random-walk-based sampling strategies for obtaining user and item embeddings in high-dimensional settings. Both simulation studies and real-data applications show that the proposed model outperforms existing approaches, with significantly better predictive power and strong computational scalability. In the second direction, we incorporate the missingness information into estimated propensity scores and construct adjusted predictions of user-item ratings based on the association between the missing-data mechanism and the observed ratings. Numerical studies indicate a reasonable local improvement from this missingness-based adjustment, especially for users with few observed ratings.
PHD (Doctor of Philosophy)
Missing Data, Embedding, Latent Factor Models, Propensity Score, Recommender System