PyGDebias: A Python Library for Debiasing in Graph Learning; Development of Trust in Machine Learning in High-Stakes Scenarios

Bangarbale, Pranav, School of Engineering and Applied Science, University of Virginia
Wayland, Kent, University of Virginia
Li, Jundong, EN-Elec & Comp Engr Dept, University of Virginia

With increases in computing power, the development of artificial intelligence and machine learning technology has accelerated rapidly, leading to breakthroughs in large language models such as OpenAI’s ChatGPT. As we move at a breakneck pace to find cutting-edge solutions to problems in machine learning, the consequences of what we build often receive too little consideration. A majority of these models are deployed in contexts where their outputs directly affect people, which introduces several problems. First, most sophisticated models are black boxes, meaning that their internal decision-making processes are not transparent. This matters in healthcare, among other fields: a medical practitioner needs to understand why a model predicted that a patient could be at risk for cancer. A related issue is error tolerance, since it is well documented that models are poor at estimating their own “confidence” in being correct; calculated statistical confidence is often a poor estimate of a model’s true competence in the real world. Further, a model can only be as effective as its training data: if the data contain biases or representational shortcomings, they will inevitably be reflected in the model’s output. An example is lending, where many loan and credit default predictors struggle to reconcile accuracy with encoded biases against different ages, genders, and ethnicities. In summary, the general problem addressed by the technical and STS reports is that flaws in current AI techniques are often swept under the rug in the larger picture of technological advancement. How can the power of machine learning be responsibly harnessed with attention to human concerns?
A significant amount of work has been done in the responsible/explainable AI field to develop algorithms that decrease or remove bias in models (de-biasing). However, most of this work is scattered across multiple publications and lives in disparate codebases that are difficult to integrate. How can we standardize the work being done in the field so that benchmarks are readily available and de-biasing techniques can be deployed easily and efficiently on new models? My technical project, carried out under Dr. Jundong Li, addresses this problem by creating a library, PyGDebias, written in a common framework, PyTorch. PyGDebias aggregates many common graph-based (dependency/network) de-biasing methods into an easy-to-use library. It also offers metrics for comparing multiple methods, with benchmarks on common datasets so that performance can be reported in a standardized way. Ultimately, PyGDebias helps researchers in the field create common benchmarks for de-biasing techniques, just as there are standardized benchmarks for most image processing and large language models. It also allows previously written de-biasing methods to be integrated and used seamlessly, reducing the groundwork and time needed to apply them to real-life models. PyGDebias has been released for public use and, if adopted widely, may become a staple for those researching de-biasing techniques in graph-based models.
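To illustrate the kind of comparison metric a de-biasing benchmark might report, the following is a minimal sketch of two widely used group fairness measures, statistical parity difference and equal opportunity difference, written in plain Python. It does not reproduce PyGDebias code or its API; the function names, the binary encoding of the sensitive attribute, and the toy labels and predictions are assumptions made only for this example.

```python
from typing import List, Sequence


def _positive_rate(values: List[int]) -> float:
    """Fraction of entries equal to 1 in a non-empty list of 0/1 values."""
    if not values:
        raise ValueError("each group must contain at least one sample")
    return sum(values) / len(values)


def statistical_parity_difference(y_pred: Sequence[int],
                                  sensitive: Sequence[int]) -> float:
    """|P(y_hat = 1 | s = 0) - P(y_hat = 1 | s = 1)| for binary predictions."""
    group0 = [p for p, s in zip(y_pred, sensitive) if s == 0]
    group1 = [p for p, s in zip(y_pred, sensitive) if s == 1]
    return abs(_positive_rate(group0) - _positive_rate(group1))


def equal_opportunity_difference(y_true: Sequence[int],
                                 y_pred: Sequence[int],
                                 sensitive: Sequence[int]) -> float:
    """Absolute gap in true positive rate between the two groups."""
    rows = list(zip(y_true, y_pred, sensitive))
    # Keep only samples whose true label is positive, split by group.
    pos0 = [p for t, p, s in rows if t == 1 and s == 0]
    pos1 = [p for t, p, s in rows if t == 1 and s == 1]
    return abs(_positive_rate(pos0) - _positive_rate(pos1))


# Hypothetical toy data: eight samples, sensitive attribute encoded as 0 or 1.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1, 0, 0]
sensitive = [0, 0, 0, 0, 1, 1, 1, 1]

spd = statistical_parity_difference(y_pred, sensitive)
eod = equal_opportunity_difference(y_true, y_pred, sensitive)
print(f"statistical parity difference: {spd:.2f}")
print(f"equal opportunity difference:  {eod:.2f}")
```

Values closer to zero indicate that the two groups are treated more similarly on that measure. A benchmark would typically report such values alongside accuracy for each de-biasing method, so that the trade-off between the two is visible.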
The STS topic explores a related problem: understanding the optics of machine learning when it is deployed in situations where lives are at stake. Oftentimes, ML may be more accurate than humans at a given task, but a human might be more “trusted” with the lives of other humans. This leads to the question: how much trust (and, indirectly, tolerance for failure) is necessary to deploy ML algorithms in life-critical situations, and how can this trust be developed among the actual users of the algorithms (e.g., first responders)? This question was explored through a two-fold approach: case studies of technologies that were successfully or unsuccessfully implemented in critical scenarios in the past were analyzed alongside current machine learning technology. By painting an overall picture of what aided the development of trust, I proposed a framework for successfully integrating machine learning into these critical scenarios. This framework involves three facets: legal/regulatory support, a requirement that ML not be used as the final decision maker in high-stakes environments, and a requirement that the deployed ML be explainable/transparent. The framework can be refined in the future as more case studies are done in this area.
In all, I am satisfied with my work in both the technical and STS research areas. With regard to the technical problem, the work is ongoing: as more de-biasing methods are published, they will need to be added to the PyGDebias library. Work must also be done to optimize some of the models in the library and to expand the user base. With regard to the STS research, I had hoped to write a much more in-depth analysis of modern machine learning deployments in high-stakes scenarios, but there is not yet a large enough body of work to do so. I recommend that future researchers modify the approach I propose for building trust as new information comes to light.

BS (Bachelor of Science)
Machine learning, Uncertainty estimation, Fairness algorithms

School of Engineering and Applied Science

Bachelor of Science in Computer Science

Technical Advisor: Jundong Li

STS Advisor: Kent Wayland

Technical Team Members: N/A

All rights reserved (no additional license for public reuse)
Issued Date: