Non-Stationary Contextual Multi-Armed Bandit with Application in Online Recommendations

Hassan, Muhammad, Systems Engineering - School of Engineering and Applied Science, University of Virginia
Beling, Peter, Department of Systems and Information Engineering, University of Virginia

The only constant in the online social behavior is forever changing user intent and preferences. These changes could be inspired by myriad of factors but still have an overall trend e.g. todays popular news will be stale tomorrow etc. Such patterns are especially noticeable in viral trends where an immediate gain in popularity is followed by gradual lost of holistic interest. In this study we focus on design of a recommendation system which accounts for this non-stationary behavior of declining popularity.
Contextual multi-armed bandit (contextual MAB) is a popular framework for learning user behavior and personalized recommendations based on the past behavior. Fundamentally MAB solves the exploitation-exploration dilemma which aims at minimal guided experimentation required to gain certain level of confidence in its recommendation. Traditionally contextual MABs (e.g. LinUCB) have been used to model stationary user behavior that is not appropriate for the target environment where LinUCB can accumulate linear regret. Here we extend this LinUCB type algorithm to model a decaying environment. We present three algorithms with variable levels of specificity in the assumptions they make about the non-stationary environment. We show by simulation the effectiveness of our methods which illustrates the usefulness of modeling meta-trends in user behavior.

MS (Master of Science)
Bandit, Machine Learning, Multi Armed Bandit, Temporal trend, Non-stationary, Contextual Bandit, Reinforcement Learning
All rights reserved (no additional license for public reuse)
Issued Date: