Vector Search: Preparing a Production Codebase for a New Search Backend; Internal Site Search: Linking Underwhelming Performance to Competing Interests

Author:
Kouzel, Maxfield, School of Engineering and Applied Science, University of Virginia
Advisors:
Neeley, Kathryn, EN-Engineering and Society, University of Virginia
Seabrook, Bryn, EN-Engineering and Society, University of Virginia
Vrugtman, Rosanne, EN-Comp Science Dept, University of Virginia
Abstract:

The work in this portfolio focuses on how online search is successfully implemented over comparatively small collections of data and what obstacles must be overcome to achieve this success. The vector search paradigm described in the technical report is a novel search methodology enabled by neural networks that overcomes many technical challenges in improving search performance. The problem frame and network analysis presented in the STS research paper highlights competing interests among stakeholders in internal site search that restrain search success. Both lines of inquiry highlight ways that search can be improved, but also the tradeoffs that come with each change.

In the technical report, the search team of a publicly listed software company needed to adapt its system to use a vector search backend in addition to a traditional lexical one to better respond to client queries. To add this functionality to the existing system, the company had to update two key parts of its codebase: the pipeline that processes content updates and the gateway through which it uses its machine learning models. The new infrastructure used Apache Kafka for update buffers, AWS and GCP to host new Java-based indexing servers, and a RESTful API for the new gateway. After the new feature was turned on, over 100,000 requests were successfully served in under an hour. With the new infrastructure set up, the company can serve vector search to customers and refine the technology to best use it in its operations.

The STS research paper describes internal site search, the platform that searches all the information contained in a single site, organization, or system and an essential part of intuitive, accessible website design. However, it has garnered a reputation for poor performance among users, particularly in comparison to web search. As new search paradigms enabled by neural networks are developed, major technological change is on the horizon for internal site search as an industry. While technical advancement may alleviate some of the problems causing internal site search performance to suffer, it should not be taken for granted that all the issues within internal site search are of technical origin. A more detailed study of how users, organizations, and other stakeholders interact with internal site search is merited to isolate the root causes of these performance problems. This analysis uses actor-network theory to identify the most prominent actors within the internal site search space, investigate their respective motives, and understand how tensions between actors lead to consequences for the internal site search system as a whole. It finds that the unique dynamics among actors in internal site search force the main architects of internal site search platforms, i.e., host organizations that own site data and search providers that build the platforms, to make tradeoffs that are not present in platforms for other types of search, such as web search. The resulting picture of internal site search is one of great potential held back by competing interests among stakeholders.

There is a synergy between these two projects in the perspectives through which they view the search system. While working on the technical report, I focused, unsurprisingly, entirely on the technical aspects and challenges of search. It required me to deeply understand the algorithms, communication protocols, and hardware structure of the search architecture, a complicated but impersonal task. The work I did on the STS research paper filled in this human gap, as I was able to investigate the question of who just as deeply as the question of what. This helped bring life into the search system, complementing the technical work I had done. In the end, I had explored the search system more thoroughly than a single perspective would have allowed and gained a deeper understanding of search because of it.

Degree:
BS (Bachelor of Science)
Keywords:
internal site search, search engines, vector search, actor-network theory
Notes:

School of Engineering and Applied Science
Bachelor of Science in Computer Science
Technical Advisor: Rosanne Vrugtman
STS Advisors: Kathryn Neeley, Bryn Seabrook

Language:
English
Rights:
All rights reserved (no additional license for public reuse)
Issued Date:
2024/08/15