Interactive Online Learning with Incomplete Knowledge
Wu, Qingyun, Computer Science - School of Engineering and Applied Science, University of Virginia
Wang, Hongning, EN-Comp Science Dept, University of Virginia
The past decades have witnessed a prominent trend of adopting intelligent systems, such as recommendation systems and smart homes, into ordinary people's daily lives. One key characteristic of such systems is the need
for online sequential decision making: decisions have to be made while the learning agent has only incomplete knowledge about the world/environment. The consequences of such decisions will, in turn, contribute to the
data the agent can collect, forming an interactive feedback loop between the agent and the world/environment. This renders conventional offline-training-based machine learning methods inadequate and urges a move from the passive learning paradigm to a more interactive and proactive one. It motivates the research of
developing interactive online learning solutions, such as contextual bandits and, more generally, reinforcement learning, for real-world intelligent systems.
Interactive online learning studies how an agent can interact with an environment to learn a policy that maximizes expected cumulative rewards for a task. In a real-world intelligent system, the learning agent faces environments that consist of human users. This brings at least two significant challenges to developing
interactive online learning solutions. First, to capture user heterogeneity, personalized learning solutions are needed. However, the sparsity of each individual user's observations, especially for new users, makes the learning process very slow. Second, many real-world systems are highly dynamic: users' preferences
change over time due to various internal or external factors, and item popularity varies due to fast-emerging events and content. Failing to model such dynamics may lead to sub-optimal decisions.
These fundamental challenges motivate the research in this dissertation. We address the first challenge by leveraging the existence of dependency among users. Specifically, we develop a series of collaborative contextual bandit learning solutions in which information can be propagated through explicit or implicit
dependencies among users. This information propagation helps alleviate the data sparsity issue and accelerates the personalized learning process. Rigorous theoretical guarantees are developed, which reveal the benefit of collaboration in the learning process when user dependencies do exist. We address the second challenge by moving beyond the commonly used but restrictive stationary environment assumption to a more realistic non-stationary one. We develop a suite of novel and theoretically sound contextual bandit solutions that
automatically detect potential changes in the environment and adapt their decision-making strategies accordingly. Solutions developed in this dissertation have been applied to a broad spectrum of recommendation
systems, showing their great practical potential.
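The collaborative idea behind the first challenge can be illustrated with a minimal sketch: per-user linear bandits (in the style of LinUCB) whose observations are propagated to neighboring users through a known dependency graph. This is an illustrative toy, not the dissertation's exact algorithms; the class name, the row-normalized propagation weights, and all parameters are assumptions made for the example.

```python
import numpy as np

class CollaborativeLinUCB:
    """Illustrative sketch: per-user linear UCB estimators that share each
    observation with graph neighbors, so sparse users still learn quickly."""

    def __init__(self, n_users, dim, adjacency, alpha=1.0, lam=1.0):
        self.alpha = alpha
        # Row-normalize the user dependency graph so propagated updates
        # from a user's neighbors carry weights summing to one.
        W = np.asarray(adjacency, dtype=float)
        self.W = W / W.sum(axis=1, keepdims=True)
        # One ridge-regression state (A, b) per user.
        self.A = np.stack([lam * np.eye(dim) for _ in range(n_users)])
        self.b = np.zeros((n_users, dim))

    def select(self, user, arms):
        """Pick the arm (feature vector) with the highest UCB score for `user`."""
        A_inv = np.linalg.inv(self.A[user])
        theta = A_inv @ self.b[user]
        scores = [x @ theta + self.alpha * np.sqrt(x @ A_inv @ x) for x in arms]
        return int(np.argmax(scores))

    def update(self, user, x, reward):
        """Propagate the observation to the acting user and, down-weighted,
        to every user connected to it in the dependency graph."""
        for v in range(len(self.W)):
            w = self.W[v, user]
            if w > 0:
                self.A[v] += w * np.outer(x, x)
                self.b[v] += w * reward * x

# Toy usage: two connected users sharing one underlying preference vector.
np.random.seed(0)
adj = np.array([[1.0, 0.5], [0.5, 1.0]])
bandit = CollaborativeLinUCB(n_users=2, dim=3, adjacency=adj)
theta_true = np.array([1.0, 0.0, -1.0])
arms = [np.eye(3)[i] for i in range(3)]
for t in range(300):
    u = t % 2
    a = bandit.select(u, arms)
    bandit.update(u, arms[a], arms[a] @ theta_true + 0.05 * np.random.randn())
```

Because every observation also shrinks the neighbors' confidence ellipsoids, a new user connected to well-observed users starts with a much tighter estimate than it could obtain from its own data alone, which is the intuition behind the collaboration benefit the abstract mentions.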
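The second challenge — detecting environment changes and adapting the decision strategy — can likewise be sketched in a piecewise-stationary toy: a standard UCB1 learner paired with a simple mean-shift detector that restarts learning when recent rewards drift far from the long-run estimate. The window size, threshold, and restart-everything policy are illustrative assumptions, not the dissertation's actual change-detection solutions.

```python
import numpy as np

class ChangeDetectingUCB:
    """Illustrative sketch: UCB1 plus a sliding-window change detector.
    When an arm's recent average deviates from its long-run average by
    more than `threshold`, the learner assumes the environment changed
    and resets its statistics."""

    def __init__(self, n_arms, window=30, threshold=0.5):
        self.n_arms = n_arms
        self.window = window
        self.threshold = threshold
        self.n_resets = 0
        self.recent = {a: [] for a in range(n_arms)}  # sliding reward windows
        self._reset_stats()

    def _reset_stats(self):
        self.counts = np.zeros(self.n_arms)
        self.sums = np.zeros(self.n_arms)
        self.t = 0

    def select(self):
        self.t += 1
        # Pull each arm once before trusting the UCB scores.
        for a in range(self.n_arms):
            if self.counts[a] == 0:
                return a
        means = self.sums / self.counts
        bonus = np.sqrt(2.0 * np.log(self.t) / self.counts)
        return int(np.argmax(means + bonus))

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.sums[arm] += reward
        buf = self.recent[arm]
        buf.append(reward)
        if len(buf) > self.window:
            buf.pop(0)
        # Flag a change when the window mean drifts far from the
        # long-run mean; then restart learning from scratch.
        if len(buf) == self.window:
            long_run = self.sums[arm] / self.counts[arm]
            if abs(np.mean(buf) - long_run) > self.threshold:
                self.n_resets += 1
                self._reset_stats()
                for b in self.recent.values():
                    b.clear()

# Toy usage: arm 0 is best until its mean abruptly drops at t = 150.
bandit = ChangeDetectingUCB(n_arms=2, window=30, threshold=0.5)
def reward(arm, t):
    if arm == 1:
        return 0.8
    return 1.0 if t < 150 else 0.0
for t in range(500):
    a = bandit.select()
    bandit.update(a, reward(a, t))
```

A stationary learner would keep crediting arm 0 for its long stale history; the detector instead discards the outdated statistics, after which the learner re-converges to the newly optimal arm — the adaptive behavior the abstract describes.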
PHD (Doctor of Philosophy)
Interactive online learning