Abstract
Large language models (LLMs) have become powerful tools for academic work. They can help with research tasks such as coding, paper summarization, and literature review by generating and organizing language efficiently. However, LLMs frequently hallucinate, producing claims that sound convincing but are not supported by evidence. In academic settings this problem is especially serious, because academic credibility depends on evidence: a generated answer may read as correct while containing claims that no existing source supports.
To investigate this issue, this thesis examines LLM hallucination as both a technical problem and a sociotechnical problem. The central concern is epistemic reliability: how can LLMs be used in academic tasks without losing fidelity to evidence? The technical report addresses hallucination detection in graph-grounded question answering through TopoGuard, a training-free framework for checking whether generated claims are supported by graph evidence. The STS research paper studies how hallucinations reshape academic research life, especially how researchers decide when to trust, verify, or reject LLM-generated content. Together, the two projects show that hallucination is not only an error in model output. It also changes how researchers work, how verification labor is distributed, and how academic authority is maintained.
The technical report, TopoGuard: Training-Free Hallucination Detection in Graph-Grounded Question Answering, focuses on knowledge graph question answering and GraphRAG systems. In these settings, an LLM is expected to answer questions using a graph as the underlying evidence source. Graphs are useful because many factual claims can be represented as subject–relation–object triples, and graph paths can make evidence more traceable than ordinary text retrieval. However, graph grounding does not automatically prevent hallucination. A model may mention entities that exist in the graph while asserting the wrong relation between them. It may also rely on a graph path that is structurally plausible but semantically unsupported.
TopoGuard detects hallucination by comparing generated claims against the reference graph. The system first decomposes an answer into atomic claims, normalizes each claim into a graph-style representation, links the entities to graph nodes, and retrieves graph paths between those entities. It then evaluates support at three levels. The first level checks topology-only structural consistency, including entity presence, reachability, shortest-path distance, and local graph disruption. This level is useful for detecting missing entities, disconnected entities, and simple existence errors. The second level adds embedding-based relation similarity. It compares the relation in the generated claim with graph relation labels or path relations, which helps catch some relation-level hallucinations. The third level uses LLM-based semantic reasoning over retrieved graph paths. This level asks whether the graph evidence actually supports the generated claim, taking into account relation meaning, polarity, conjunction, negation, and multi-hop reasoning.
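As a rough illustration of the first, topology-only level, the sketch below computes entity presence, reachability, and shortest-path distance for a single atomic claim. It assumes claims have already been normalized into (subject, relation, object) triples and that entity linking is exact string matching. The toy graph, the entity names, the function names, and the `max_hops` threshold are all hypothetical, and local graph disruption is omitted.

```python
from collections import deque
from typing import Optional

# Hypothetical (subject, relation, object) triple; not TopoGuard's actual data model.
Triple = tuple[str, str, str]


def build_adjacency(triples: list[Triple]) -> dict[str, set[str]]:
    """Undirected adjacency sets that ignore relation labels (topology only)."""
    adj: dict[str, set[str]] = {}
    for s, _, o in triples:
        adj.setdefault(s, set()).add(o)
        adj.setdefault(o, set()).add(s)
    return adj


def shortest_path_length(adj: dict[str, set[str]], src: str, dst: str) -> Optional[int]:
    """Breadth-first search; returns the hop count, or None if dst is unreachable."""
    if src == dst:
        return 0
    seen = {src}
    queue = deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in adj.get(node, ()):
            if nxt == dst:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def topology_check(claim: Triple, adj: dict[str, set[str]], max_hops: int = 2) -> dict:
    """Level-1 style structural signals for one claim (sketch; max_hops is an assumed threshold)."""
    subj, _, obj = claim  # the relation label is deliberately unused at this level
    present = subj in adj and obj in adj
    dist = shortest_path_length(adj, subj, obj) if present else None
    return {
        "entities_present": present,
        "reachable": dist is not None,
        "distance": dist,
        "structurally_consistent": dist is not None and dist <= max_hops,
    }


# Hypothetical toy reference graph.
GRAPH = [
    ("Alice", "born_in", "Lyon"),
    ("Lyon", "located_in", "France"),
    ("Alice", "works_at", "AcmeLab"),
]
adj = build_adjacency(GRAPH)
print(topology_check(("Alice", "citizen_of", "France"), adj))
print(topology_check(("Alice", "born_in", "Paris"), adj))
```

The first claim passes with a distance of 2 even though the toy graph never states citizenship, because this level never reads the relation label. The second fails on entity presence, since "Paris" is not a node.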
The experiments show that graph topology provides a useful structural signal, but topology alone is incomplete. A graph may connect two entities through some path while failing to support the specific relation asserted in the generated answer. This is one of the main failure modes in graph-grounded hallucination: a claim can preserve plausible graph structure while flipping the relation meaning. Embedding-based detection improves some cases, but it can confuse semantic relatedness with factual equivalence. For example, “born in,” “lives in,” and “citizen of” may be close in embedding space even though they express different facts. LLM-based semantic reasoning performs best overall, since it can judge whether the retrieved graph path actually entails the claim. At the same time, the GraphRAG stress test using NovelQA and LightRAG shows that verification quality depends on graph construction quality. When the graph is automatically built from long text, important event-level relations may be missing, which limits what any graph-based verifier can check.
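The relation-flip failure mode can be illustrated with a deliberately small sketch. It assumes a hypothetical toy graph and uses exact relation-label matching as a crude stand-in for the relation-aware levels, which in TopoGuard rely on embedding similarity and LLM reasoning instead. The function names and the three verdict labels are illustrative only and are not the framework's own.

```python
# Hypothetical toy graph: the only recorded fact linking Alice and Lyon is lives_in.
GRAPH = [
    ("Alice", "lives_in", "Lyon"),
    ("Lyon", "located_in", "France"),
]


def directly_connected(graph, subj, obj):
    """Topology-only signal: some edge joins subj and obj, in either direction."""
    return any((s, o) in {(subj, obj), (obj, subj)} for s, _, o in graph)


def relation_supported(graph, subj, rel, obj):
    """Relation-level signal: the exact (subj, rel, obj) triple is recorded.
    Exact label matching is a simplification and misses paraphrased relations."""
    return (subj, rel, obj) in graph


def classify(graph, claim):
    """Hypothetical three-way verdict separating structural from relational support."""
    subj, rel, obj = claim
    if relation_supported(graph, subj, rel, obj):
        return "supported"
    if directly_connected(graph, subj, obj):
        # Plausible structure, but the asserted relation is not in the graph.
        return "structure_only"
    return "unsupported"


for claim in [("Alice", "lives_in", "Lyon"),
              ("Alice", "born_in", "Lyon"),
              ("Alice", "born_in", "Paris")]:
    print(claim, classify(GRAPH, claim))
```

The middle claim is the interesting case: a purely structural check would accept it because Alice and Lyon are adjacent, yet the asserted relation differs from the recorded one. Exact matching catches that flip, but it is too strict in general, which is why semantic comparison of relations is needed.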
The STS research paper, LLM Hallucinations in Academic Research, examines the same problem from the perspective of academic practice. It asks how LLM hallucinations affect and reshape researchers’ day-to-day research activities, what consequences they create, and what strategies researchers use to detect and mitigate them. The paper uses qualitative document analysis and preliminary interview analysis. The document analysis examines technical reports, system cards, academic literature, and mitigation strategies. The interview provides a case study of a university-affiliated researcher who uses LLMs in teaching and research-related work.
The STS analysis shows that hallucination risk is task-dependent. Researchers may feel more comfortable using LLMs for structural or transformational tasks, such as reorganizing information, making glossaries, converting text into tables, or helping with code. These tasks can often be checked against the material already provided by the user. The risk becomes higher when the model is asked to retrieve facts, generate citations, summarize literature, or represent external reality. In those cases, the generated text may look academically legitimate while lacking a stable source behind it. Citation hallucination is especially important because citations are part of the infrastructure that connects academic claims to authors, journals, databases, DOIs, and readers who can verify the source.
The STS paper uses concepts such as stochastic parrots, epistemic cultures, epistemic authority, and Actor-Network Theory to explain why hallucination matters in academic research. LLMs can produce plausible academic language without grounded understanding, which creates a gap between linguistic fluency and scholarly justification. Different academic fields also have different standards for what counts as evidence, so hallucination does not have identical consequences across all disciplines. Actor-Network Theory helps show how hallucinations disrupt the network of people, sources, databases, citations, institutions, and verification practices that normally make academic knowledge credible. When an LLM fabricates a citation or a biography, it imitates the surface form of academic authority while weakening the real relation between claim and evidence.
Taken together, the technical and STS components argue that trustworthy LLM use requires both better detection systems and stronger academic practices of verification. TopoGuard shows one technical pathway: generated claims can be checked against graph structure and relation semantics in a training-free and interpretable way. The STS research shows why this technical work matters: hallucination affects how researchers trust information, how they divide verification labor between humans and tools, and how academic credibility is preserved. This thesis therefore contributes to the broader goal of making LLMs more reliable in research settings by combining algorithmic grounding with sociotechnical awareness. It emphasizes that LLMs can remain useful academic assistants, but their outputs must stay traceable to evidence and be treated with caution when that evidence is weak.