Maintaining Retrieval Effectiveness in Distributed, Dynamic Information Retrieval Systems
Viles, Charles L., Department of Computer Science, University of Virginia
French, James C., Department of Computer Science, University of Virginia
PHD (Doctor of Philosophy)
Traditional information retrieval (IR) techniques were developed under the tacit assumptions of static, centralized archives of documents. Advanced techniques invariably use information derived from the entire collection in an effort to produce high-quality responses to user queries. In dynamic, distributed information environments these assumptions are clearly not met. Heretofore easily obtainable collection-wide information (CWI) may be unavailable to some or all member sites in a distributed document archive, so some degree of incompleteness or inconsistency must be tolerated. In this dissertation, we present a rigorous empirical study of how allowing the view of CWI to drift from its precisely defined values influences retrieval effectiveness. We give a generic model for searching a document collection that allows for the use of CWI derived from a subset of the collection. Within this model, we identify two realistic scenarios where the use of subset-derived collection statistics is likely. The first scenario involves distributed document databases and the second involves ad hoc search in dynamic document databases.
All rights reserved (no additional license for public reuse)