Learning by Exploration with Information Advantage

Author: ORCID orcid.org/0000-0003-3918-6925
Wang, Huazheng, Computer Science - School of Engineering and Applied Science, University of Virginia
Wang, Hongning, EN-Comp Science Dept, University of Virginia

Learning is a central theme for any intelligent system, human or machine. Moving beyond the classical paradigm of learning from past experience, e.g., offline supervised learning from given labels, an intelligent learner needs to actively collect human feedback to learn about the unknown, i.e., to learn through exploration. The growing need for interactive intelligent systems in practice, such as recommender systems, smart homes, conversational systems, and self-driving cars, calls for research on the learning-by-exploration paradigm. This thesis focuses on this key ingredient of interactive online learning problems, with the goal of designing algorithms that efficiently interact with and learn from human feedback in real-world environments.

There are several challenges in realizing this goal: 1) a huge exploration space, which arises from the large number of candidate actions and agents (users) and is typical of practical recommender systems; 2) missing information, where useful information about the actions, users, and environment may be unavailable to the intelligent system; and 3) privacy and security concerns, which require a trade-off between the intelligent system's efficiency and its privacy and security guarantees. The key insight for overcoming these challenges is that information advantage, i.e., leveraging additional information about the structure of the problem such as social connectivity and context structure, offers a unique opportunity to develop advanced intelligent systems.

Based on this insight, this thesis develops efficient and trustworthy interactive online learning systems from three perspectives: 1) sample-efficient online learning with explicit structural information; 2) efficient exploration in implicitly structured environments; and 3) privacy and security in online learning. Our study provides a thorough understanding of the benefit of leveraging structural information as an advantage and extends the application of bandit learning algorithms to practical scenarios. Rigorous theoretical analysis and extensive empirical evaluation validate the approaches' applicability in various contexts and applications. By harnessing the power of information in exploration, the proposed research has been applied to high-impact real-world problems such as interactive recommendation, search result ranking, and social influence maximization.

PHD (Doctor of Philosophy)
Online Learning, Multi-armed Bandits, Recommender Systems
Issued Date: