Online Archive of University of Virginia Scholarship
Towards Small Language Models for Security Query Generation in SOC Workflows
Author
Reddy, Rahul, Computer Science - School of Engineering and Applied Science, University of Virginia
Advisors
Hassan, Wajih Ul, EN-Comp Science Dept, University of Virginia
Abstract
Security Operations Centers (SOCs) must triage massive telemetry streams, yet translating natural-language questions (NLQs) into correct Kusto Query Language (KQL) remains a bottleneck. This thesis investigates whether Small Language Models (SLMs) can enable accurate, low-cost NLQ-to-KQL translation at scale. We introduce a three-knob framework targeting prompting, fine-tuning, and architecture design. First, we adapt NL2KQL for SLMs with lightweight retrieval and introduce error-aware prompting that addresses common parser failures without increasing token count. Second, we apply Low-Rank Adaptation (LoRA) fine-tuning with rationale distillation, augmenting each NLQ-KQL pair with a brief chain-of-thought explanation to transfer reasoning from a teacher model while keeping the SLM compact. Third, we propose a two-stage architecture that uses an SLM for candidate generation and a low-cost LLM judge for schema-aware refinement and selection. Our evaluation spans six models (three SLMs and three LLMs) on both standard and unseen datasets, using syntax, semantic, table, and filter metrics, along with latency and token cost. The two-stage approach achieves 0.971 syntax accuracy and 0.769 semantic accuracy on unseen schemas and is up to 15x cheaper in token cost than GPT-4o, demonstrating that SLMs offer a practical and scalable path for NLQ-to-KQL translation in enterprise security.
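The two-stage architecture described in the abstract can be pictured with a minimal sketch. This is not the thesis implementation: the schema, the candidate queries, and the names `stub_generate`, `schema_judge`, `schema_score`, and `two_stage` are all hypothetical. The SLM is replaced by a stub that returns fixed candidates, and the LLM judge is replaced by a deterministic schema check that only selects among candidates; the refinement step is omitted.

```python
import re
from typing import Callable, Dict, List, Optional, Set

# Hypothetical schema (table -> columns); an assumption for illustration only.
SCHEMA: Dict[str, Set[str]] = {
    "SigninLogs": {"TimeGenerated", "UserPrincipalName", "ResultType"},
    "DeviceProcessEvents": {"Timestamp", "DeviceName", "FileName"},
}

# Hypothetical candidates that stand in for SLM output.
CANDIDATES: List[str] = [
    'SignInLog | where ResultType != "0"',  # misspelled table name
    'SigninLogs | where ResultType != "0" | project UserPrincipalName, TimeGenerated',
    "SigninLogs | where Status == 'fail'",  # column not in the schema
]


def stub_generate(nlq: str, n: int) -> List[str]:
    """Stage 1 stand-in: a real system would sample n queries from an SLM."""
    return CANDIDATES[:n]


def schema_score(kql: str, schema: Dict[str, Set[str]]) -> int:
    """Toy score: 0 for an unknown table, else 1 + count of known columns used."""
    table = kql.split("|", 1)[0].strip()
    if table not in schema:
        return 0
    identifiers = set(re.findall(r"[A-Za-z_]\w*", kql))
    return 1 + len(identifiers & schema[table])


def schema_judge(nlq: str, candidates: List[str]) -> Optional[str]:
    """Stage 2 stand-in: a real system would prompt an LLM judge with the schema."""
    best = max(candidates, key=lambda c: schema_score(c, SCHEMA), default=None)
    if best is None or schema_score(best, SCHEMA) == 0:
        return None  # no candidate references a known table
    return best


def two_stage(
    nlq: str,
    generate: Callable[[str, int], List[str]],
    judge: Callable[[str, List[str]], Optional[str]],
    n: int = 3,
) -> Optional[str]:
    """Generate n candidates, then let the judge pick one (or reject all)."""
    return judge(nlq, generate(nlq, n))


if __name__ == "__main__":
    print(two_stage("Show failed sign-ins by user", stub_generate, schema_judge))
```

Passing the generator and judge as callables keeps the two stages independent, so either stand-in could be swapped for a model call without changing `two_stage`.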
Reddy, Rahul. Towards Small Language Models for Security Query Generation in SOC Workflows. University of Virginia, Computer Science - School of Engineering and Applied Science, MS (Master of Science), 2025-12-08, https://doi.org/10.18130/67zc-ht65.