Neural Model Interpretability for Natural Language Processing

Chen, Hanjie, Computer Science - School of Engineering and Applied Science, University of Virginia
Ji, Yangfeng, Computer Science - School of Engineering and Applied Science, University of Virginia

Natural language processing (NLP) has transformed how computers and humans understand, communicate, and interact with one another. NLP has a wide range of practical applications across fields, such as processing and extracting information from unstructured data, translating between languages to support communication, and building dialog systems and virtual assistants for social good. The development of NLP is driven by neural network models. Over the past decades, NLP models have grown ever larger and more sophisticated, demonstrating impressive learning abilities across a variety of tasks. At the same time, the interpretability of NLP models has diminished due to the increasing complexity of neural networks and limited access to their inner workings and training data. This lack of interpretability has raised concerns about the trustworthiness and reliability of NLP models in real-world applications. Moreover, the black-box nature of neural network models hinders humans from understanding them, finding their weaknesses, and avoiding unexpected failures.

In this dissertation, I cultivate neural model interpretability for trustworthy NLP. Specifically, I integrate interpretation techniques into model development, covering three main phases of a model's life cycle: training, testing, and debugging. During training, I build interpretable models by designing learning strategies that make model prediction behavior transparent and reasonable. In the testing phase, with limited access to a black-box model, I develop explanation methods that explain the model's decision-making on each test example. I evaluate explanations (e.g., their informativeness) as a complement to traditional evaluation of predictions (e.g., accuracy), in order to understand model prediction behavior. Finally, I diagnose and debug model weaknesses (e.g., lack of robustness) through the lens of explanations and develop solutions to address them. This research has the potential to benefit NLP and AI developers by providing them with a better understanding of neural network models and helping them build trustworthy and reliable intelligent systems.

PHD (Doctor of Philosophy)
Natural Language Processing, Neural Networks, Interpretability
Issued Date: