Abstract
Scalability is rarely a high priority for organizations that are just starting out; having a working product at all matters more than having the best one. However, design patterns chosen for their ease of implementation are rarely replaced, which leads to significant problems as the organization grows. My technical report addresses non-standardized data and the difficulty of accessing specific information, problems that are of little concern when an organization creates its early documents. My team and I addressed this by designing a chatbot that can work with the large amount of unstructured data needed for accurate responses. A lack of scalability can also be seen in the architecture of many firms, which rely on a growing number of external services without considering potential failures. My STS paper examines our unstable digital architecture and how the ease of forgoing redundancy can combine with high market concentration to create large-scale failures stemming from a single service going down.
My technical work tackled the problem of data accessibility within an organization, which was limited by the quantity and format of its data. Large amounts of information were stored in documents that were difficult to locate and query. Other data was stored in a database that required technical knowledge to access, further hindering accessibility. By researching methods of combining various data models with large language models (LLMs), my team and I arrived at a solution: an AI chatbot for organizational use. To keep the chatbot from returning incorrect information, its answers were drawn from a knowledge graph generated from highly relevant parts of the existing database and documents. User input was converted into a database query, and the query results were then used to generate the output. The report found that this chatbot reduced the difficulty of finding important data, although its latency was quite high due to its multi-step nature.
The STS paper investigated large-scale outages caused by the use of external services as dependencies. Many firms have at least one dependency that is critical to their operations. Furthermore, many service markets are dominated by a single provider. Together, these conditions allow an outage at one provider to cascade to its customers, creating society-wide damage from a single point of failure. I examined multiple perspectives on high-impact failures, along with market analyses, and found that the dominance of specific providers is driven mainly by a feedback loop arising from network effects. As providers grow, so does their value to consumers, owing to greater access to data, funding for more resilient infrastructure, and increased scale. Addressing this concentration requires reducing barriers to entry for firms, although other preventative measures are also necessary. I found that implementing redundancy for critical services can work in tandem with reductions in market concentration to minimize large-scale outages.
The technical work produced a solid proof of concept for using knowledge graphs and text-to-query approaches to improve the accessibility of data, but the system could not be fully realized due to time constraints. Further work is needed to validate and optimize the model and the knowledge graph generation. The STS paper identified potential leading causes of the high market concentration and low failure tolerance that make large-scale outages prevalent, and it offers a framework for future reform to follow. However, it was unable to provide concrete suggestions for specific regulations because of the large scale required for meaningful reform. To achieve reasonable reform, future research must identify potential regulations and feasible methods of implementing them.
I would like to sincerely thank my professors, Caitlin Wylie and Rosanne Vrugtman, for their assistance in completing this portfolio. Their guidance and experience were valuable and greatly appreciated, from the opening stages of writing to the final revisions. I would also like to thank my capstone team, whose skills and leadership proved invaluable in reaching our results. Lastly, I wish to thank my classmates for their feedback on these papers.