Kernel Testing: Benchmarking Current Techniques; Why We Need Software Testing Standards: Public and Private Sector Comparison

Author:
Jayaraman, Ravi, School of Engineering and Applied Science, University of Virginia
Advisors:
Wylie, Caitlin, EN-Engineering and Society, University of Virginia
Morrison, Briana, EN-Comp Science Dept, University of Virginia
Abstract:

Software testing is one of the most important yet most overlooked parts of the software development lifecycle, and neglecting it can lead to disasters such as the 2024 CrowdStrike outage. My thesis portfolio investigates the sociotechnical problem of inadequate software testing across different institutions and how differences in testing practices can lead to major failures as well as a loss of trust. My technical topic is a proposal for benchmarking the current landscape of kernel-level software testing, along with a few novel techniques that could improve it. My STS topic focuses on how testing standards differ between the public and private sectors and how the lack of software testing regulation could lead to disasters. Both problems relate to the same underlying issue: software testing is difficult and often neglected due to cost, which creates a need for new techniques or regulation.
The technical problem this project proposes to investigate is how to effectively detect bugs and security vulnerabilities in kernel-level drivers, which operate at the lowest levels of the operating system with high privileges. These drivers are critical to system stability, yet their complexity, real-time constraints, and close hardware interactions make them notoriously difficult to test. This proposal outlines a comparative evaluation of three major testing approaches: static analysis, symbolic execution, and fuzzing. Tools such as Clang Static Analyzer, Infer, KLEE, and AFL will be applied to open-source Linux kernel drivers to assess each technique’s strengths and limitations. The analysis will focus on the types of vulnerabilities each method can uncover, such as logic errors, race conditions, and memory corruption, and how effectively they do so in terms of coverage, accuracy, and resource efficiency. The goal is to benchmark these tools and identify where each performs best, with the anticipated finding that a hybrid testing framework combining these methods will provide the most comprehensive coverage. The results of this evaluation will inform future approaches to kernel driver testing and contribute to improving the reliability and security of critical system software.
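To make the targeted bug classes concrete, the sketch below shows a minimal AFL-style fuzz harness (in plain C, since kernel drivers are written in C) around a hypothetical, driver-like input handler containing a deliberately planted unchecked-length memcpy. All of the names here (fake_ioctl_req, handle_request, CMD_SET_NAME) are illustrative assumptions and do not come from the thesis or from any specific Linux driver.

/*
 * Minimal sketch, assuming a driver-like ioctl handler has been extracted
 * into user space so that AFL, KLEE, or Clang Static Analyzer can exercise it.
 * The unchecked memcpy is an intentional example of the memory-corruption
 * class the benchmark targets, not a real driver bug.
 */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define CMD_SET_NAME 0x01

/* Hypothetical stand-in for a driver's ioctl argument structure. */
struct fake_ioctl_req {
    uint8_t cmd;
    uint8_t len;          /* attacker-controlled length field */
    char    payload[64];
};

/* Driver-like handler: copies 'len' bytes into a fixed-size buffer.
 * Because 'len' is never checked against sizeof(name), a fuzzer or a
 * static analyzer can flag the out-of-bounds write when len > 32. */
static void handle_request(const struct fake_ioctl_req *req)
{
    char name[32];
    if (req->cmd == CMD_SET_NAME) {
        memcpy(name, req->payload, req->len);   /* potential overflow */
        name[sizeof(name) - 1] = '\0';
        printf("name set to %.31s\n", name);
    }
}

int main(void)
{
    struct fake_ioctl_req req;
    /* AFL feeds mutated input on standard input; map it onto the struct. */
    memset(&req, 0, sizeof(req));
    if (fread(&req, 1, sizeof(req), stdin) == 0)
        return 0;
    handle_request(&req);
    return 0;
}

Compiled with afl-gcc (or any C compiler), the harness reads mutated input from standard input, which is how AFL drives its targets; the same handler could be analyzed with Clang Static Analyzer or symbolically executed with KLEE to compare which tools flag the defect and at what cost.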
In my STS research project, I investigated the differences between software testing practices in the public and private sectors. Specifically, I asked how institutional structures affect the reliability of large-scale software systems and what we can learn from comparing projects across these contexts. I analyzed case studies including legacy government software, open-source testing frameworks, and the July 2024 CrowdStrike Falcon outage. I also examined relevant government standards, academic literature on testing methodology, and industry reports. Drawing on the concept of technological momentum (Hughes, 1994), I explored how historical decisions and institutional inertia constrain how testing practices evolve over time. My key finding was that while private-sector companies tend to invest more in continuous integration and automated testing, their practices are often driven by market incentives and may lack transparency or resilience in the face of rare but catastrophic failures. By contrast, public-sector projects often struggle with technical debt and slow update cycles but are more likely to be held accountable through public oversight and regulation. I concluded that neither sector has a perfect model of software testing and that improving reliability requires understanding the tradeoffs imposed by different institutional settings. Testing is not only a technical process; it is also shaped by funding models, organizational culture, and political accountability.
My work made a modest contribution to understanding and addressing the broader problem of undertested software systems. On the technical side, I outlined and evaluated a testing framework that balances multiple approaches (static analysis, symbolic execution, and fuzzing) to better detect vulnerabilities in kernel-level drivers. This benchmark not only highlights current limitations but also points toward the potential of hybrid testing strategies in high-risk environments. On the STS side, my analysis clarified how institutional factors shape software reliability, showing that technical solutions must be understood within their social and organizational contexts. Researchers could build on this work by implementing the proposed testing strategies or by investigating which regulatory incentives would be most effective in different software disciplines. There is still much to be done to make software testing more consistent, transparent, and resilient, and that work must be both technical and sociotechnical in nature.
For my STS project, I would like to thank my STS professor, Caitlin Wylie, for helping me refine my arguments. I would also like to thank Rosanne Vrugtman for her help in shaping the technical writing in my project proposal.

Degree:
BS (Bachelor of Science)
Keywords:
Technological Momentum, Fuzzing, Software Testing
Notes:

School of Engineering and Applied Science

Bachelor of Science in Computer Science

Technical Advisor: Briana Morrison

STS Advisor: Caitlin Wylie

Technical Team Members: N/A

Language:
English
Rights:
All rights reserved (no additional license for public reuse)
Issued Date:
2025/05/08