Development of Freeway Crash Prediction Models Using Disaggregate Data: Effects of Flow State Information from Different Sources and Data Correlation

Dutta, Nancy, Civil Engineering - School of Engineering and Applied Science, University of Virginia
Fontaine, Mike, EN-Center for Transportation Studies, University of Virginia

Transportation safety has always been an intensively researched topic, with the goal of better understanding why crashes occur and how different variables affect their occurrence. Traffic flow conditions, which frequently change with time, can have a significant impact on crash occurrence. Traditional traffic safety analyses of crash frequency or crash rate usually focus on highly aggregated cross-sectional data. Crash analysis methods customarily use annual average daily traffic (AADT) as an exposure measure, which may be too aggregated to capture the effects of variations in traffic flow and operations that occur throughout the day. Flow characteristics such as variation in speed and level of congestion play a significant role in crash occurrence and are not currently accounted for in the AASHTO Highway Safety Manual (HSM). As a practical matter, relationships between traffic crashes and traffic flow parameters are inherently difficult to establish due to limitations in available traffic data sources. This difficulty is exacerbated by the random nature of crash occurrence and the quality of available crash and traffic data. The restrictions of the current safety prediction methodology limited the evaluation of operational and safety effects of the Active Traffic Management (ATM) system on Interstate 66 in Northern Virginia. The ATM system included advisory variable speed limits (AVSLs), lane use control signals (LUCS), and dynamic hard shoulder running (HSR). The results of that evaluation showed that much of the benefit from the system was tied to the implementation of dynamic HSR as opposed to the AVSL or LUCS. Locations with HSR had a statistically significant reduction of nearly 25% in total crashes. Although crash modification factors could be generated, they may be biased since the system is not active throughout the entire day. As a result, Virginia’s AADT-based safety performance functions failed to capture the true dynamic nature of the system.
This research developed a methodology for creating crash prediction models using traffic, geometric, and control information that is provided at sub-daily aggregation intervals. One major focus of the research was evaluating how the use of disaggregate geometry and traffic flow data affects crash modeling compared to the current practice of using only aggregated volume data. Hourly data from 110 rural 4-lane segments and 80 urban 6-lane segments were used. The volume data used in this study comes from detectors whose coverage ranges from continuous counts throughout the year to only a couple of weeks every other year (short counts). Speed data was collected from both point sensors and probe data provided by INRIX. When developing disaggregated models, differences in data availability and quality among these sources can be a potential source of error. Hence, evaluating how the performance of prediction models changes with volume data availability and speed data source was another objective of this research. The spatial and temporal correlation present in disaggregated data, and its influence on crash prediction, was also investigated.
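As a rough illustration of the hourly aggregation described above, the sketch below averages detector volumes by segment and hour of day. It is not the dissertation's procedure: the record layout `(segment_id, hour_of_day, volume)`, the function name `average_hourly_volume`, and the sample numbers are all hypothetical, and the assumption that "average hourly volume" means the mean of all observed counts for a given segment and hour is one plausible reading only.

```python
from collections import defaultdict


def average_hourly_volume(records):
    """Average observed volume for each (segment, hour-of-day) pair.

    `records` is an iterable of (segment_id, hour_of_day, volume) tuples.
    This layout is a hypothetical one chosen for illustration.
    """
    # Running [sum, count] for every (segment, hour) key.
    totals = defaultdict(lambda: [0.0, 0])
    for segment_id, hour_of_day, volume in records:
        entry = totals[(segment_id, hour_of_day)]
        entry[0] += volume
        entry[1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


# Made-up counts: segment "A" has two observed days for the 7:00 hour.
sample_records = [
    ("A", 7, 1000),
    ("A", 7, 1200),
    ("A", 8, 1500),
    ("B", 7, 400),
]
hourly_means = average_hourly_volume(sample_records)
```

Under this reading, a segment covered only by a short count contributes fewer observations per hour than one with a continuous count station, but it still yields one averaged value per hour of day.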
The results showed that the best models include a combination of average hourly volume, selected geometric variables, and speed-related parameters. Average hourly aggregation of data was found to be the appropriate level of disaggregation to address the variation in volume and speed throughout the day without compromising model quality. Urban segments experienced a 20% improvement in mean absolute deviation (MAD) for total crashes and a 9% improvement for injury crashes when models using average hourly volume, geometry, and flow variables were compared to the AADT-based model. Corresponding improvements for rural segments were 11% and 9%. Average hourly speed, standard deviation of hourly speed, and differences between speed limit and average speed had statistically significant relationships with crash frequency. For all models, prediction accuracy improved across all validation measures of effectiveness (MOEs) when the speed components were added, relative to performance without speed measures. For example, for urban segments, MAD improved by 11% for total crashes and 5% for injury crashes when speed was added in different forms. Rural segments experienced similar improvements. The positive effect of flow variables held irrespective of the data source for speed. Further investigation revealed that the improvement in model prediction achieved by using a larger, more inclusive dataset exceeded the effect of accounting for spatial/temporal data correlation. Models using only continuous count station data were contrasted with models using both short count and continuous count stations. For rural hourly models, MAD improved by 52% when short counts were added in comparison to the continuous count station only models. The corresponding value for urban segments was 58%. This means that using short count stations as a data source does not diminish the quality of the developed models.
A combination of different volume data sources with good-quality speed data can lessen the dependency on volume data quality without compromising performance. When comparing the models accounting for correlation to models that used the same dataset but did not account for correlation, MAD improved by 14% for rural segments and 21% for urban segments. While accounting for correlation improved model performance, it provided smaller benefits than inclusion of the short count data in the models.
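The MAD comparisons reported above can be illustrated with a small sketch. It assumes MAD is computed in the usual way for crash model validation, as the mean of the absolute differences between observed and predicted crash counts, and that "improvement" means the relative reduction in MAD against a baseline model. The function names and all numbers below are invented for illustration and are not results from this research.

```python
def mean_absolute_deviation(observed, predicted):
    """Mean of |observed - predicted| over paired observations."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def percent_improvement(baseline_mad, candidate_mad):
    """Relative reduction in MAD versus the baseline, in percent."""
    return 100.0 * (baseline_mad - candidate_mad) / baseline_mad


# Made-up validation data: observed crash counts and two sets of predictions.
observed_crashes = [0, 1, 2, 0]
baseline_predictions = [1.0, 1.0, 1.0, 1.0]   # hypothetical AADT-based model
candidate_predictions = [0.5, 1.0, 1.5, 0.5]  # hypothetical hourly model

baseline_mad = mean_absolute_deviation(observed_crashes, baseline_predictions)
candidate_mad = mean_absolute_deviation(observed_crashes, candidate_predictions)
improvement = percent_improvement(baseline_mad, candidate_mad)
```

A lower MAD indicates predictions that sit closer to the observed counts on average, so a positive percentage means the candidate model outperforms the baseline on this measure.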
This research shows that it is possible to develop a broadly transferable crash prediction methodology using hourly-level volume and flow data that are currently widely available to transportation agencies. These models have a broad spectrum of potential applications that involve assessing the safety effects of events and countermeasures that create recurring and non-recurring short-term fluctuations in traffic characteristics. The models developed in this dissertation will help to close the gap in existing practice and will also support the best use of available resources in future research and applications that examine the relationships between operations and safety.

PHD (Doctor of Philosophy)
crash analysis, statistical modeling