Toward Practical Relational Keyword Search Systems
Coffman, Joel, Computer Science - School of Engineering and Applied Science, University of Virginia
Weaver, Alfred, Department of Computer Science, University of Virginia
The amount of information in the world is increasing exponentially. Keyword search has proven to be an effective method to discover and retrieve information online, as evidenced by the success of Internet search engines. Unfortunately, many common information management systems do not support the familiar keyword search interface that people now expect. Web sites, corporations, and governments all use relational databases to manage information, but keyword search in relational databases is difficult due to data transformations that eliminate redundancy and ensure consistency. Relational keyword search enables users to retrieve information and to explore the relationships among that information, all via a familiar interface.
Although a decade has passed since keyword search in databases became a hot topic for academic researchers, little progress has been made in the interim. In particular, no systems have appeared outside the academic community despite a long-standing promise to revolutionize the way people interact with information. This dissertation addresses the challenges inherent in transitioning relational keyword search techniques from the computer science community to practical systems that can be deployed against existing data repositories. A key contribution of this research is an extensive benchmark specifically designed to evaluate relational keyword search techniques. Extensive empirical experiments identify both why existing search techniques cannot handle existing data repositories and which areas remain open for future research in this field. Improvements to relational keyword search come in the form of two novel ranking schemes that significantly improve search effectiveness. The first explicitly enforces users' preferences regarding the order of search results. The second uses machine learning to weight the various scoring factors proposed to date in the literature; an analysis of their importance indicates that a number of factors can be excluded without sacrificing search effectiveness. This dissertation also examines key issues in the evaluation of proposed search techniques that prevent many existing evaluations from accurately reflecting real-world retrieval tasks. This work bridges the gap between academic research and keyword search techniques that are ready to be deployed in real-world environments.
PHD (Doctor of Philosophy)
keyword search, relational databases, benchmark, empirical evaluation, ranking, search log analysis
All rights reserved (no additional license for public reuse)