Decision Dynamics in College Football Recruitment: An Analytical Approach to Predicting Commitment Patterns

Lansing, Maryanna, Systems Engineering - School of Engineering and Applied Science, University of Virginia
Burkett, Matthew, School of Engineering and Applied Science, University of Virginia
Riggs, Robert, School of Engineering and Applied Science, University of Virginia
Bolton, Matthew, School of Engineering and Applied Science, University of Virginia
This research investigates the multifaceted decision-making processes behind high school football recruits’ college commitment choices by integrating statistical hypothesis testing with advanced machine learning methods. Grounded in the growing importance of sports analytics in higher education and professional athletics, this study specifically addresses the gap in recruitment analytics for college football, a sport characterized by unique positional requirements, variable conference strengths, and diverse facility attributes. The work examines how intrinsic factors, such as a recruit’s playing position and player tier, combine with extrinsic factors, such as geographical distance from home, stadium capacity, and playing surface quality, to influence a recruit’s commitment decision.
The study begins by establishing the context of sports analytics as a transformative field within the sports industry, where data-driven insights have revolutionized team management, performance analysis, and strategic planning. Although previous research has largely focused on in-game performance and financial aspects of professional sports, less attention has been given to the collegiate recruitment arena. This thesis aims to fill that void by developing a comprehensive model that predicts commitment outcomes across various demographic and contextual subgroups. Specifically, the research questions center on determining whether and how the distance from a recruit’s home to a college, the capacity and quality of the college’s stadium, the player’s position, and their assigned tier interact to shape the final decision of where to commit.
To address these questions, a robust dataset was compiled comprising 132,522 data points from 10,734 unique recruits over a seven-year period (2017–2023). Data were aggregated from diverse sources, including recruiting databases, US Census statistics, collegiate performance ratings, and facility records, resulting in 71 distinct features. The methodological approach used the two-sample t-test to identify significant differences in key variables, such as distance from home and stadium capacity, between committed and non-committed groups.
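As an illustration of the two-sample comparison described above, the following is a minimal sketch and not the thesis's actual code. It assumes the Welch (unequal-variance) form of the test, which the abstract does not specify, and the distance values and variable names are synthetic placeholders rather than data from the study.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    n_a, n_b = len(sample_a), len(sample_b)
    mean_a, mean_b = statistics.fmean(sample_a), statistics.fmean(sample_b)
    # Sample variances (n - 1 denominator), each divided by its group size.
    se_a = statistics.variance(sample_a) / n_a
    se_b = statistics.variance(sample_b) / n_b
    t_stat = (mean_a - mean_b) / math.sqrt(se_a + se_b)
    dof = (se_a + se_b) ** 2 / (se_a ** 2 / (n_a - 1) + se_b ** 2 / (n_b - 1))
    return t_stat, dof


# Synthetic distances from home (miles); NOT values from the thesis dataset.
committed_miles = [120.0, 85.0, 240.0, 60.0, 150.0, 95.0]
not_committed_miles = [410.0, 380.0, 150.0, 620.0, 290.0, 505.0]

t_stat, dof = welch_t(committed_miles, not_committed_miles)
print(f"t = {t_stat:.3f}, degrees of freedom = {dof:.2f}")
```

A negative t statistic here means the committed group has the smaller mean distance. Converting the statistic to a p-value requires the t distribution's CDF, which is available in statistical libraries but not in the Python standard library.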
Further analysis was conducted using Analysis of Variance (ANOVA) to examine group differences across categorical variables including playing position, conference affiliation, and player tier. The results of these tests revealed that recruits who ultimately commit to a college live, on average, closer to the institution than those who do not commit, with significant variations noted when data are stratified by position and player tier. Similarly, the quality of the college stadium, evaluated by its seating capacity and playing surface, emerged as a critical factor, particularly among higher-tier recruits. Interaction models were subsequently developed to explore the combined influence of these factors, revealing that the interplay between conference prestige, player tier, and positional demands contributes to nuanced patterns in commitment behavior.
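The group comparison described above can be sketched as a one-way ANOVA F statistic. This is an illustrative sketch under stated assumptions: the abstract does not say whether one-way or factorial ANOVA was used, the three groups stand in for hypothetical player tiers, and the numbers are synthetic.

```python
import statistics


def one_way_anova_f(groups):
    """Return (F statistic, between-group df, within-group df)."""
    all_values = [value for group in groups for value in group]
    grand_mean = statistics.fmean(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(group) * (statistics.fmean(group) - grand_mean) ** 2
        for group in groups
    )
    # Variation of the observations around their own group mean.
    ss_within = sum(
        (value - statistics.fmean(group)) ** 2
        for group in groups
        for value in group
    )
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Synthetic distances (miles) for three hypothetical player tiers.
tier_groups = [
    [300.0, 420.0, 510.0, 380.0],  # hypothetical top tier
    [210.0, 260.0, 190.0, 240.0],  # hypothetical middle tier
    [90.0, 130.0, 75.0, 110.0],    # hypothetical lower tier
]

f_stat, df_between, df_within = one_way_anova_f(tier_groups)
print(f"F({df_between}, {df_within}) = {f_stat:.2f}")
```

A large F value indicates that the group means differ by more than the within-group spread would suggest. Interaction effects such as those mentioned above would need a factorial design rather than this one-way form.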
On the modeling front, the thesis implements a series of machine learning approaches designed to enhance the predictive accuracy of recruitment outcomes. Multiple models, including Logistic Regression, Random Forests (both standard and tuned versions), Gradient Boosting, and a Two-Stage Decision Tree coupled with Random Forest, were evaluated. These models were trained on a pre-processed and standardized dataset with encoded categorical variables and imputed missing values to ensure data integrity. Cross-validation techniques, specifically using Stratified K-Fold, were employed to assess model robustness. The primary performance metric used was accuracy, supplemented by log loss for models that output probability estimates. Among the various techniques, Logistic Regression consistently demonstrated high accuracy across most player positions, although specialized sub-models for certain positions, such as a Two-Stage model for positions with sparse data, were necessary to address imbalances.
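The evaluation scheme described above (stratified folds scored by accuracy and log loss) can be sketched as follows. This is a simplified standard-library illustration, not the thesis's pipeline: the splitter is a hand-written stand-in for a library Stratified K-Fold, the "model" is a hypothetical base-rate predictor rather than any of the models named above, and the labels are synthetic.

```python
import math
import random
from collections import defaultdict


def stratified_k_fold(labels, n_splits, seed=0):
    """Yield (train_idx, test_idx) pairs whose folds preserve class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(n_splits)]
    # Deal each class's shuffled indices round-robin across the folds.
    for indices in by_class.values():
        rng.shuffle(indices)
        for pos, idx in enumerate(indices):
            folds[pos % n_splits].append(idx)
    for i in range(n_splits):
        test_idx = sorted(folds[i])
        train_idx = sorted(
            j for f, fold in enumerate(folds) if f != i for j in fold
        )
        yield train_idx, test_idx


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def log_loss(y_true, prob_positive, eps=1e-15):
    """Mean binary cross-entropy; probabilities are clipped away from 0 and 1."""
    total = 0.0
    for t, p in zip(y_true, prob_positive):
        p = min(max(p, eps), 1 - eps)
        total -= t * math.log(p) + (1 - t) * math.log(1 - p)
    return total / len(y_true)


# Synthetic imbalanced labels: 1 = committed, 0 = did not commit.
labels = [1] * 4 + [0] * 16

for fold_no, (train_idx, test_idx) in enumerate(
    stratified_k_fold(labels, n_splits=4), start=1
):
    # Hypothetical baseline: predict the training-fold commitment rate for everyone.
    base_rate = sum(labels[i] for i in train_idx) / len(train_idx)
    y_true = [labels[i] for i in test_idx]
    probs = [base_rate] * len(y_true)
    preds = [1 if p >= 0.5 else 0 for p in probs]
    print(
        f"fold {fold_no}: accuracy={accuracy(y_true, preds):.2f}, "
        f"log loss={log_loss(y_true, probs):.3f}"
    )
```

The baseline shows why log loss is a useful supplement to accuracy on imbalanced data: a predictor that never predicts a commitment can still score well on accuracy, while log loss reflects how well its probabilities are calibrated.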
The findings of this study underscore that both geographical and facility-related factors are significant predictors of recruitment decisions. Recruits from greater distances exhibit lower probabilities of commitment, a trend that is further moderated by their positional roles and perceived player tiers. In addition, the quality of a college’s stadium, particularly its capacity and the type of playing surface, significantly influences a recruit’s choice, with higher-tier athletes showing a pronounced preference for institutions with state-of-the-art facilities. These insights suggest that recruitment strategies can be optimized by tailoring outreach efforts and facility investments to the specific needs of different recruit segments. Moreover, the interaction effects observed between conference strength and player tier indicate that larger athletic programs may benefit from emphasizing both infrastructural advantages and strategic recruiting practices to attract top talent.
In conclusion, this thesis provides a novel contribution to the field of sports analytics and systems engineering by developing a predictive framework that elucidates the complex interdependencies influencing high school recruits’ college commitment decisions. The integration of rigorous statistical testing with advanced machine learning techniques not only validates the significance of individual factors such as distance and stadium quality but also highlights the importance of their interactions with positional and tier-based distinctions. These findings have direct implications for collegiate athletic departments seeking to refine their recruitment strategies and for researchers aiming to further explore data-driven approaches in sports decision-making. Future research will extend this model by incorporating additional dimensions such as academic performance and long-term career outcomes, thereby offering a more holistic view of the factors that drive recruitment in college football.
MS (Master of Science)
Predictive Analysis, Machine Learning, College Football, Recruiting
English
2025/04/23