Errors in Variables in Random Forests: Theory and Application to Eyewitness Identification Data

Author: orcid.org/0000-0002-1800-8524
Liu, Alice, Statistics - Graduate School of Arts and Sciences, University of Virginia
Advisor:
Kafadar, Karen, AS-Statistics, University of Virginia
Abstract:

Eyewitness identifications play a critical role in the investigation of crimes and the subsequent legal proceedings. However, law enforcement agencies do not have the time and resources to conduct the much-needed research for the development and validation of more reliable practices, so research on the effectiveness of eyewitness identification procedures remains incomplete. It is well known that eyewitnesses make errors, which often result in grievous consequences. Currently, there are a few options for eyewitness identification analysis, including receiver operating characteristic (ROC) curve analysis, Bayesian prior-posterior plots, and decision utility. All of these methods lack a principled way to incorporate variability and the complex, interacting relationships among the variables that affect eyewitness identification accuracy.

We will also discuss new methods for eyewitness identification (EWID) data, which are borrowed from fields such as diagnostic medicine. The tools and procedures these fields use to analyze data in meaningful and useful ways can support thoughtful and valid conclusions. Such methodologies must be easy to use and interpret, flexible, and efficient to implement. This compilation of chapters shows the thought process involved in considering which methods and ways of thinking could lead to better EWID procedures, with the intention of reducing errors, both false convictions and false acquittals.

This research began with an interdisciplinary problem: understanding EWID data and the existing statistical methodologies for analyzing such data, as well as the consequences of an incomplete comprehension of the data. It became clear that there are latent variables to be estimated that are essential to understanding parts of the data, which led to the development of the proposed framework. This framework allows researchers to estimate an individual’s probability of accuracy, which depends on that individual’s probability of choosing a face from a lineup and the global probability of target presence in the lineup (i.e., the base rate). The true value of the proposed method is how easily it is applied and interpreted, which could be helpful for law enforcement agents, lawyers, and jurors.
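To make the role of the base rate concrete, the following is a minimal sketch of Bayes' rule for a suspect identification. It is not the framework proposed in the thesis, and the function name and the inputs `hit_rate` and `false_id_rate` are hypothetical quantities introduced here purely for illustration.

```python
def posterior_accuracy(base_rate, hit_rate, false_id_rate):
    """Return P(suspect is the culprit | witness identified the suspect).

    All three inputs are illustrative assumptions, not quantities defined
    in the abstract:
      base_rate      P(culprit is present in the lineup)
      hit_rate       P(witness picks the suspect | culprit present)
      false_id_rate  P(witness picks the suspect | culprit absent)
    """
    if not all(0.0 <= p <= 1.0 for p in (base_rate, hit_rate, false_id_rate)):
        raise ValueError("all inputs must be probabilities in [0, 1]")
    numerator = base_rate * hit_rate
    denominator = numerator + (1.0 - base_rate) * false_id_rate
    if denominator == 0.0:
        raise ValueError("a suspect identification has zero probability")
    return numerator / denominator


if __name__ == "__main__":
    # Same witness behavior, two different base rates.
    for base_rate in (0.5, 0.2):
        print(base_rate, posterior_accuracy(base_rate, 0.6, 0.1))
```

Holding the witness's behavior fixed, lowering the base rate lowers the probability that an identification is correct, which is why the base rate enters the estimate alongside the individual's tendency to choose.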

A component of the estimation relies on the random forest algorithm. Since EWID data is susceptible to measurement error due to the human component, we discovered that the impact of measurement error on random forest models needs further study. This thesis addresses that problem. The literature provides a framework for the asymptotic behavior of random forests, which gives the groundwork to derive an estimator for the mean difference of two random forest models. In our case, the random forest models are developed with and without measurement error to simulate the behavior of these differences. The simulations showed a clear effect of measurement error. Since measurement error is usually assumed to be nonexistent or negligible, this is a valuable finding. The next step should be to develop a methodology, similar to those already in place for classical statistical models, to account for these errors.
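As a point of comparison with the classical setting mentioned above, the sketch below simulates errors-in-variables in simple linear regression rather than in random forests. It fits ordinary least squares once on the true predictor and once on a noise-contaminated copy. All names, sample sizes, and variances are assumptions chosen for illustration and do not come from the thesis.

```python
import random


def ols_slope(x, y):
    """Ordinary least squares slope of y regressed on x (with intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def simulate_slopes(n=20000, beta=2.0, sd_x=1.0, sd_u=1.0, sd_e=0.5, seed=0):
    """Return (slope fitted on true x, slope fitted on error-prone x).

    Hypothetical setup: y = beta * x + e, and the observed predictor is
    w = x + u, where u is measurement error independent of x and e.
    """
    rng = random.Random(seed)
    x_true = [rng.gauss(0.0, sd_x) for _ in range(n)]
    y = [beta * x + rng.gauss(0.0, sd_e) for x in x_true]
    x_observed = [x + rng.gauss(0.0, sd_u) for x in x_true]
    return ols_slope(x_true, y), ols_slope(x_observed, y)


if __name__ == "__main__":
    clean, noisy = simulate_slopes()
    # Classical theory: the noisy slope shrinks toward
    # beta * sd_x**2 / (sd_x**2 + sd_u**2), which is 1.0 for these defaults.
    print("slope without measurement error:", clean)
    print("slope with measurement error:   ", noisy)
```

In this linear case the bias has a known closed form, which is what makes classical corrections possible; the abstract's point is that a comparable treatment for random forests still needs to be developed.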

Degree:
PHD (Doctor of Philosophy)
Keywords:
statistics, eyewitness identification, random forests, confidence and accuracy, forensic evidence, ROC analysis, sensitivity, specificity, predictive value, choosing, classification, class probability estimation, measurement error, errors-in-variables
Sponsoring Agency:
Arnold Ventures
Language:
English
Rights:
All rights reserved (no additional license for public reuse)
Issued Date:
2020/04/28