A Human-in-the-Loop Methodology For Applying Topic Models to Identify Systems Thinking and to Perform Systems Analysis
Boyer, Ryan, Systems Engineering - School of Engineering and Applied Science, University of Virginia
Scherer, William, Department of Systems and Information Engineering, University of Virginia
Systems thinking characterizes the paradigm needed to effectively design, analyze, maintain, and utilize systems. Prior work has shown that there is a language of systems thinking and that its presence can be quantified within text using supervised learning methods. Building on this foundation, this work presents an unsupervised, human-in-the-loop methodology that utilizes topic models to facilitate the identification of systems thinking within a corpus of unread documents. This methodology is then expanded for use in analyzing systems themselves with methods of visualization, summarization, and provision of research direction. The novel aspect of the methodology is the seeding of the corpus; the user encourages the topic model to reveal desired information in documents by seeding the corpus with several documents that demonstrate, discuss, or exhibit the information desired. For identifying systems thinking, documents exhibiting systems thinking are added and a systems thinking topic is encouraged to form; each document’s topic proportion within this topic is used as a proxy measure for the potential of systems thinking. This method for identifying systems thinking is fundamentally an explorative methodology (not predictive or prescriptive) and requires no manual grading of documents, which makes it significantly faster than previous methods. A Tukey test on a graded corpus reveals that papers in the top echelon of strong systems thinking have significantly higher mean topic proportions in the systems thinking topic than lower-graded papers. Additionally, a case study on a corpus of Army documents related to the development, character, and management of soldiers demonstrates the methodology’s effectiveness in providing an overview of a system, in providing research direction, and in identifying systems thinking within a specific domain.
Finally, the methodology is used to present an analysis of the past nineteen years of transportation research, demonstrating that transportation research is growing more holistic by focusing on environmental, physical, and societal health. Additionally, this analysis shows that the growth of traditional infrastructure research (construction, bridges, pavement, and roads) is significantly outpaced by research in these more holistic areas. Overall, topic models and the human-in-the-loop methodology demonstrate value by pairing a computer’s data processing and structuring power with human intuition’s ability to make associations and process abstract concepts.
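The corpus-seeding idea described above can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a small collapsed Gibbs sampler for Latent Dirichlet Allocation written in plain Python, a toy pre-tokenized corpus, and hypothetical names (`fit_lda`, `seeded_topic_scores`, `seed_docs`, `unread_docs`). It also assumes the seeded topic can be picked automatically as the topic with the largest total proportion across the seed documents, a step the methodology leaves to the human in the loop.

```python
import random
from collections import Counter


def fit_lda(docs, num_topics=2, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling; return per-document topic proportions."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]         # topic counts per document
    topic_word = [Counter() for _ in range(num_topics)]  # word counts per topic
    topic_total = [0] * num_topics                       # token counts per topic
    assign = []
    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = [rng.randrange(num_topics) for _ in doc]
        for w, z in zip(doc, zs):
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assign.append(zs)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assign[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Resample its topic from the collapsed conditional distribution.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assign[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    # Smoothed topic proportions; each row sums to 1.
    return [
        [(c + alpha) / (len(doc) + num_topics * alpha) for c in row]
        for row, doc in zip(doc_topic, docs)
    ]


def seeded_topic_scores(seed_docs, unread_docs, **lda_kwargs):
    """Seed the corpus, fit LDA, and score unread documents on the seeded topic."""
    theta = fit_lda(seed_docs + unread_docs, **lda_kwargs)
    n_seed = len(seed_docs)
    # Assumption: the seeded topic is the one the seed documents load on most.
    seed_topic = max(
        range(len(theta[0])),
        key=lambda k: sum(theta[d][k] for d in range(n_seed)),
    )
    return [row[seed_topic] for row in theta[n_seed:]]


# Toy data (hypothetical): seed documents that exhibit the desired language.
seed_docs = [
    "system feedback holistic interaction stakeholder system".split(),
    "holistic system boundary feedback emergent interaction".split(),
]
unread_docs = [
    "feedback loops shape the holistic system interaction".split(),
    "pavement bridge construction road pavement bridge".split(),
]
scores = seeded_topic_scores(seed_docs, unread_docs, num_topics=2)
```

Each entry of `scores` is the proportion of an unread document assigned to the seeded topic, used only as a proxy for where a reader might look first; in keeping with the exploratory nature of the methodology, the scores rank documents for human review rather than classify them.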
MS (Master of Science)
Systems Thinking, Systems Approach, Systems Engineering, Topic Models, Latent Dirichlet Allocation, LDA, Machine Learning, Unsupervised Learning, Bayesian Methods, Natural Language Processing, Text Analytics
U.S. Army Training and Doctrine Command Analysis Center, Monterey, California, USA