Towards an End-to-End System for Threat Detection on the UVA Network

Abraham, Brendan, Systems Engineering - School of Engineering and Applied Science, University of Virginia
Brown, Donald, EN-Eng Sys and Environment, University of Virginia
Veeraraghavan, Malathi, EN-Elec/Computer Engr Dept, University of Virginia

Cybercrime has become one of the most pressing issues of the digital age. Cyber-attacks cost businesses an average of $11 million annually, and they show no signs of slowing down: the number of attacks has doubled in the last five years [1]. Moreover, attackers are employing increasingly sophisticated attack vectors, targeting government, business, and academic institutions alike. The traditional defense against cybercrime is an Intrusion Detection System (IDS), which examines network traffic for malicious or anomalous behavior. In most cases, these systems use signature-based detection, relying on pre-defined rules and attack signatures to detect intrusions. This strategy is woefully inadequate in an age when cyber-attacks are constantly evolving. A more promising approach is anomaly detection, which models host or endpoint behavior on a network in order to detect malicious traffic. In these systems, predictive models are trained to learn behavioral patterns from the data rather than being told exactly what to look for.

In this thesis, we lay the groundwork for an end-to-end, anomaly-based Intrusion Detection System for UVA network traffic. Our work consists of three components. First, we demonstrate through a pilot study that machine learning techniques can be extremely effective at isolating botnet traffic and potentially detecting zero-day attacks. We present a novel cross-validation technique called Leave-One-Bot-Out CV (LOBO-CV), which measures a model's ability to generalize to traffic from a new, unseen botnet. Second, we present a high-speed traffic capturing pipeline and apply it to our own network data. Third, we present an autonomous traffic labeling pipeline that leverages blacklist, whitelist, and honeypot feeds to label daily UVA network traffic for supervised learning. Experimental results suggest that the labels produced by this pipeline are valid and that malicious traffic can be separated from whitelisted traffic if the right features and model are used.
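
To make the idea behind LOBO-CV concrete, the following is a minimal sketch of a leave-one-group-out split in which each group is a botnet family. It is an illustration under stated assumptions, not the thesis implementation: the function name `lobo_cv_splits`, the `"benign"` label, and the round-robin handling of benign flows are all hypothetical choices made here so that every test fold contains both classes.

```python
from collections import defaultdict

BENIGN = "benign"  # hypothetical label for flows that belong to no botnet


def lobo_cv_splits(botnet_labels):
    """Yield (held_out_bot, train_idx, test_idx), one fold per botnet family.

    All flows from the held-out botnet go to the test fold, so the model is
    scored on a botnet it never saw during training.

    Assumption (not from the thesis): benign flows are dealt round-robin
    across folds so that each test fold also contains benign traffic.
    """
    by_bot = defaultdict(list)
    benign = []
    for i, name in enumerate(botnet_labels):
        if name == BENIGN:
            benign.append(i)
        else:
            by_bot[name].append(i)

    bots = sorted(by_bot)
    for k, bot in enumerate(bots):
        # Every len(bots)-th benign flow, starting at offset k, joins this fold.
        test_idx = sorted(by_bot[bot] + benign[k::len(bots)])
        test_set = set(test_idx)
        train_idx = [i for i in range(len(botnet_labels)) if i not in test_set]
        yield bot, train_idx, test_idx


# Usage with made-up labels: one entry per flow.
labels = ["benign", "bot_a", "bot_a", "benign", "bot_b", "benign", "bot_c", "benign"]
for bot, train_idx, test_idx in lobo_cv_splits(labels):
    print(bot, train_idx, test_idx)
```

The key property is that no flow from the held-out botnet ever appears in the training indices, which is what distinguishes this scheme from ordinary k-fold cross-validation, where flows from the same botnet can land on both sides of the split.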


MS (Master of Science)
Cyber Security, Machine Learning, Anomaly-based Intrusion Detection Systems, High Speed Packet Capture, Network Traffic Labeling, Supervised Learning