Abstract
Large language models (LLMs) frequently generate outputs that appear confident and articulate yet contain factual inaccuracies, flawed reasoning, and hallucinations, which undermines their reliability in critical applications. Burtsev et al. (2023) demonstrate that the probabilistic next-token prediction underlying LLMs, combined with their limited capacity for genuine understanding and logical chaining, often produces persuasive but erroneous responses that users may accept at face value, creating significant risks of misapplication and of overestimating the technology’s capabilities.
Research aimed at addressing these issues has found that increasing the compute used to train an LLM improves its performance (Kaplan et al., 2020). Building on this, Welleck et al. (2024) concluded that scaling compute at inference time through more sophisticated algorithms further improves LLM outputs. Current approaches to the reliability problem build on these findings and include improvements to chain-of-thought (CoT) prompting and the newer chain-of-continuous-thought (Coconut) method. These approaches are increasingly effective; however, they still require additional training to achieve superior coverage-accuracy tradeoffs. NoisyCoconut was developed to achieve these improved tradeoffs without increasing the amount of training required. It leverages existing chain-of-continuous-thought models and perturbs their hidden states to create several reasoning pathways, each exploring a different region of the solution space. Agreement among these pathways provides stronger evidence that their answer is correct, which improves the coverage-accuracy tradeoff (Jerge & Evans, 2026).
NoisyCoconut currently generates several independent branches of text responses. After the shared latent reasoning passes, the model splits into a number of branches, and each branch samples a complete answer using temperature and top-p sampling. A final answer is extracted from each branch's text output via pattern matching, and the most common answer across all branches is selected by majority vote. My contribution to NoisyCoconut is an alternative aggregation method that replaces majority voting with a probability-mass approach, weighting each branch by the model’s cumulative token-level certainty. Instead of treating each reasoning path as an equal vote, my method extracts the log-probabilities of every token generated within a branch and calculates the branch's total exponential mass. This mass is then normalized against the total potential mass of all tokens across all branches, and the valid response with the highest accumulated mass is selected as the final answer.
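The aggregation described above can be illustrated with a minimal sketch. It is not the NoisyCoconut implementation. It assumes that a branch's "total exponential mass" means the exponential of the sum of its token log-probabilities (the branch's sequence probability), that normalization runs over every branch including those with no extractable answer, and that masses are accumulated per distinct answer. The function names `aggregate_by_probability_mass` and `aggregate_by_majority_vote` and the input layout are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Optional, Sequence, Tuple

# Hypothetical input layout: one (extracted_answer, token_logprobs) pair per branch.
# extracted_answer is None when pattern matching found no valid answer.
Branch = Tuple[Optional[str], Sequence[float]]


def aggregate_by_majority_vote(branches: Sequence[Branch]) -> Optional[str]:
    """Baseline: every valid branch counts as one equal vote."""
    votes = Counter(answer for answer, _ in branches if answer is not None)
    return votes.most_common(1)[0][0] if votes else None


def aggregate_by_probability_mass(branches: Sequence[Branch]) -> Optional[str]:
    """Weight each branch by exp(sum of its token log-probabilities).

    Assumption: this is one reading of "total exponential mass"; the
    original method may define the mass differently.
    """
    if not branches:
        return None
    # Sequence log-probability of each branch.
    seq_logprobs = [sum(logprobs) for _, logprobs in branches]
    # Subtract the maximum before exponentiating to avoid underflow on
    # long sequences; the shift cancels out in the normalization below.
    shift = max(seq_logprobs)
    weights = [math.exp(lp - shift) for lp in seq_logprobs]
    total = sum(weights)  # mass of all branches, valid or not

    # Accumulate normalized mass per distinct valid answer.
    mass = defaultdict(float)
    for (answer, _), weight in zip(branches, weights):
        if answer is not None:
            mass[answer] += weight / total
    if not mass:
        return None
    return max(mass, key=mass.get)


# Illustrative, made-up log-probabilities: two low-certainty branches agree
# on "17", while one high-certainty branch answers "42".
example = [
    ("42", [-0.1, -0.2]),
    ("17", [-1.0, -1.5]),
    ("17", [-1.2, -1.1]),
    (None, [-0.05]),
]
print(aggregate_by_majority_vote(example))     # 17
print(aggregate_by_probability_mass(example))  # 42
```

In this made-up example the two methods disagree: majority voting follows the two branches that agree, while the mass-weighted method follows the single branch the model generated with much higher token-level certainty. Because the sum of log-probabilities shrinks with length, this reading favors shorter branches; a length-normalized variant would be a different design choice.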
Although unrelated to large language models, another important issue in computer science is the ongoing disagreement over what patent law should be allowed to protect in the video game industry. Many large corporations use patent law to secure minor monopolies and shield themselves from competition; however, these practices often rest on tenuous legal grounds and hinder the growth of the industry.
I used actor-network theory (ANT) and historical case comparison to conduct my research into patent law in the video game industry. Actor-network theory was used to map the interactions among the corporate, independent, and legal actants in the industry, and historical case comparison was used to compare earlier legal cases on video game patents with one another and with more recent cases. Through my research I concluded that U.S. patent doctrines contribute to innovation stagnation, market consolidation, and consumer exploitation within the video game industry. The translation of everyday player behaviors into legally excludable “inventions,” the black-boxing of once-common game mechanics and genres, and the construction of defensive patent portfolios that enable cross-licensing among large firms all raise overwhelming barriers for independent developers and must be addressed in order to stop their negative influence on the industry.