Abstract
Effective remediation of software vulnerabilities remains a critical challenge in modern software development. Common Vulnerabilities and Exposures (CVE) records provide detailed metadata on vulnerability type, severity, and references, but less is known about how developers adopt and implement patches in practice. This study analyzes CVE records from 2019–2024 that include GitHub commit links, and curates a dataset of repositories that either have multiple CVEs or have CVEs linked to multiple diffs. Building on this dataset, we make five main contributions. First, we quantify the prevalence of GitHub-linked CVEs, identifying how many are linked to multiple commits and how often repositories address multiple vulnerabilities. Second, we characterize the structural properties of patches, including patch size, remediation time, and variability, across severity levels and vulnerability categories. Third, we evaluate the utility of traditional similarity measures (string matching, abstract syntax trees, and embeddings) for determining patch relatedness, highlighting the limitations of these methods in capturing semantic and contextual relationships. Fourth, we assess the effectiveness of large language models (LLMs) for evaluating both patch relatedness and sufficiency, showing that they outperform traditional techniques. Fifth, we present an analysis of vulnerability categories and base scores across the dataset, with a focus on insufficient patches.