Abstract
My two thesis projects address the same problem: how people and organizations should use AI
systems when the evidence is shaky, feedback arrives slowly, and the outputs can steer real
decisions off course. The problem is sociotechnical because it is not only about whether a model
performs well on paper. It is also about who decides what counts as good evidence, who gets to
call an output correct, how much uncertainty people are willing to tolerate, and who bears the
consequences when the system is wrong. My technical project examined this problem within a
sales prospecting system that had to be built before reliable outcome data existed. My STS project
examined it through generative AI hallucination, where truth and reliability are shaped by
benchmarks, governance frameworks, and the rules institutions use to manage risk.
The technical project focused on designing and building an AI-driven sales prospecting
system for Zbooni, a UAE-based B2B software company, in a setting without enough historical
conversion data for ordinary supervised learning. This mattered because the company was
investing substantial time in manual prospecting without knowing which leads were worth
pursuing and which would turn into dead ends. The project drew on stakeholder interviews,
requirements analysis, and a pilot deployment of LeadFlow, a lead discovery, qualification,
scoring, and outreach system we built to address that problem. Rather than pretending the system
already knew the ground truth, we started with stakeholder heuristics, translated them into
measurable features, and built the pipeline around custom scrapers and the OpenAI API, which
handled scoring and drafted personalized outreach. Just as importantly, the enrichment process
did not take every scraped signal at face value. It treated confidence as conditional, giving a lead
more weight when multiple sources agreed and flagging uncertain cases for manual review. In the
pilot, LeadFlow generated a large number of enriched leads, flagged a substantial share of them as
promising, and appeared affordable enough to justify further testing. The key takeaway was not
that the system had solved manual sales prospecting, but that in a zero-baseline environment, one
with no historical outcome data, it is possible to build a usable AI-driven system without treating
weak evidence as stronger than it really is.
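The conditional-confidence idea above, where agreement across sources raises a lead's weight and thin evidence triggers manual review, can be illustrated with a small sketch. This is not LeadFlow's implementation: the `Signal` type, the `score_lead` function, the feature and source names, the two-source corroboration threshold, and the 0.4 to 0.7 review band are all hypothetical assumptions chosen for illustration. Each heuristic feature is discounted in proportion to how few distinct sources support it, and any weakly supported feature, or a score in an ambiguous middle band, routes the lead to a human.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Signal:
    source: str   # where the signal was scraped from (hypothetical source names)
    feature: str  # stakeholder heuristic it supports (hypothetical feature names)
    value: float  # evidence strength in [0, 1]


def score_lead(signals, weights, min_sources=2, review_band=(0.4, 0.7)):
    """Return (score, needs_review) for one lead.

    Assumption: a feature backed by fewer than `min_sources` distinct sources
    is discounted proportionally, and any such feature, or a score falling
    inside `review_band`, sends the lead to manual review.
    """
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")

    values_by_feature = defaultdict(list)
    sources_by_feature = defaultdict(set)
    for s in signals:
        values_by_feature[s.feature].append(s.value)
        sources_by_feature[s.feature].add(s.source)

    score = 0.0
    weakly_supported = False
    for feature, weight in weights.items():
        values = values_by_feature.get(feature)
        if not values:
            continue  # no evidence for this heuristic: it contributes nothing
        mean_value = sum(values) / len(values)
        n_sources = len(sources_by_feature[feature])
        # Full credit only when enough independent sources agree.
        corroboration = min(1.0, n_sources / min_sources)
        if n_sources < min_sources:
            weakly_supported = True
        score += weight * mean_value * corroboration
    score /= total_weight

    low, high = review_band
    needs_review = weakly_supported or low <= score < high
    return round(score, 3), needs_review


if __name__ == "__main__":
    weights = {"sells_online": 0.6, "team_size_fit": 0.4}
    lead = [
        Signal("website", "sells_online", 0.9),
        Signal("directory", "sells_online", 0.8),
        Signal("website", "team_size_fit", 0.5),
    ]
    print(score_lead(lead, weights))  # (0.61, True)
```

In this example the online-sales feature is confirmed by two sources and counts fully, while the team-size feature has a single source and is halved, so the lead is flagged for review rather than passed through on a moderately high score. Treating single-source evidence as a review trigger, instead of silently trusting it, is one way to avoid acting as if weak evidence were stronger than it is.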
The STS project examined hallucination as more than a technical glitch. It asked why
disagreement over hallucination persists even as models improve and benchmark scores rise. The
central argument was that hallucination is a sociotechnical problem about correctness,
responsibility, and truth, not merely a model malfunction that exists in isolation from society. To
study that question, I analyzed benchmark papers, governance frameworks, vendor system cards,
and terms of service. My STS paper argues that benchmarks do not simply measure hallucination;
they also shape which kinds of failure are treated as important. I found that once large language
models (LLMs) enter institutional settings, responsibility for catching errors often shifts toward
users and organizations through review procedures, documentation requirements, and
disclaimers.
Together, these two projects address the larger problem of safe AI use, but only partially.
The technical paper offers one practical way to build an AI-driven system when reliable labels are
scarce, but it does not yet show that its scoring rules fully deserve trust or that its projected
outcomes will hold up over time. The STS paper helps explain why that limitation matters. In
low-evidence settings, trust is never just about model performance. It is also shaped by
benchmarks, by institutional practices, and by who is expected to catch the system when it is
wrong. Both projects suggest that people cannot wait indefinitely for perfect certainty before
acting, but also that they should not mistake tidy procedures or plausible-looking scores for truth.
I would like to thank Professor Caitlin Wylie for her guidance on the STS side of this
thesis and for pushing me to make the argument clearer and more grounded. I am also grateful to
the faculty and teammates who helped shape the technical project, especially the Zbooni team
and the students who worked on LeadFlow with me. I also want to give special thanks to
Maddie A. Priebe for her role in the capstone process and for making the collaborative side of
undergraduate research especially memorable from beginning to end.