Contrastive Activation Steering for Demographic Bias Mitigation in Code Language Models; The Responsibility Gap in AI: Fairness Metrics and Privacy Tools as Sociotechnical Infrastructures

Venkatapuram, Anirudh


Author

Venkatapuram, Anirudh, School of Engineering and Applied Science, University of Virginia

Advisors

Fioretto, Ferdinando, EN-Comp Science Dept, University of Virginia
Wylie, Caitlin, EN-Engineering and Society, University of Virginia

Abstract

The rapid integration of artificial intelligence into critical systems has created a pervasive responsibility gap, in which institutions attempt to manage complex social harms through rigid mathematical constraints and performative bureaucratic routines. I argue that this approach frames algorithmic mitigation as a purely technical optimization problem, one that shields institutions from liability while failing to protect marginalized users from systemic bias. This problem matters because it allows harmful systems to operate indefinitely under the guise of managed risk, creating a fundamental tension between engineering metrics and lived social reality. Both of my research projects investigate this disconnect. My technical research intervenes directly in the mathematical representation of bias within language models, while my STS research analyzes why the governance structures surrounding these models consistently prioritize procedural compliance over actual accountability.

My technical project investigated demographic bias (race, age, and gender) embedded in open-source large language models used for code autocompletion. When these tools generate stereotyped or exclusionary text, they seamlessly integrate discrimination into the daily workflows of software developers. To address this without the substantial expense of retraining the models, I used a method called contrastive activation steering. I compared the model's internal activations on biased versus neutral prompts to isolate a direction in activation space associated with bias, and then steered the model away from this direction during generation. I evaluated this intervention across several public models using standardized bias metrics and functional testing to ensure the generated code remained operable. The most significant finding was not merely a reduction in measurable bias, but a clear demonstration of an infrastructural tradeoff: reducing bias consistently came at a cost to the model's overall performance and coherence, showing that technical mitigation is never a neutral fix but a deliberate prioritization of values.
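The steering step described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the author's implementation: the activation vectors are hypothetical three-dimensional stand-ins for one transformer layer's hidden states, the bias direction is taken as the difference of mean activations on biased and neutral prompts (one common way to build a contrastive steering vector), and the coefficient `alpha` and all function names are hypothetical. Layer choice, prompt sets, and coefficient values are not specified here.

```python
from typing import List, Sequence

Vector = List[float]


def mean_vector(vectors: Sequence[Sequence[float]]) -> Vector:
    """Element-wise mean of equal-length activation vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def steering_vector(biased: Sequence[Sequence[float]],
                    neutral: Sequence[Sequence[float]]) -> Vector:
    """Contrastive direction: mean activation on biased prompts minus mean on neutral prompts."""
    return [b - n for b, n in zip(mean_vector(biased), mean_vector(neutral))]


def steer(hidden: Sequence[float], direction: Sequence[float], alpha: float) -> Vector:
    """Move a hidden state away from the bias direction: h - alpha * v."""
    return [h - alpha * d for h, d in zip(hidden, direction)]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Projection score, used here as a crude measure of alignment with the bias direction."""
    return sum(x * y for x, y in zip(a, b))


if __name__ == "__main__":
    # Hypothetical activations captured at one layer (illustrative numbers, not real data).
    biased_acts = [[1.0, 2.0, 0.0], [3.0, 2.0, 0.0]]
    neutral_acts = [[0.0, 2.0, 0.0], [0.0, 2.0, 0.0]]
    v = steering_vector(biased_acts, neutral_acts)  # [2.0, 0.0, 0.0]

    h = [1.0, 1.0, 1.0]                   # a hidden state during generation
    h_steered = steer(h, v, alpha=0.5)    # [0.0, 1.0, 1.0]
    print(dot(h, v), dot(h_steered, v))   # 2.0 0.0
```

In a real model, a function like `steer` would typically run inside a forward hook (for example, PyTorch's `torch.nn.Module.register_forward_hook`) on the chosen layer's output at each decoding step. A larger `alpha` removes more of the bias direction but also perturbs the hidden state more, which is one plausible mechanism for the performance and coherence cost of the intervention.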

For my STS research, I investigated how contemporary artificial intelligence governance tools function as symbolic infrastructures rather than as effective accountability mechanisms. Using the ethnography of infrastructure as a theoretical framework (Star, 1999), I analyzed academic literature and policy evaluations concerning fairness metrics, differential privacy, and bureaucratic audits. I treated these mechanisms not as objective solutions but as sociotechnical infrastructures that often prioritize the appearance of compliance. My findings indicate that mathematical metrics frequently confine complex social issues within narrow engineering parameters, while privacy protections can inadvertently degrade system utility for the very minority groups they are intended to protect. Furthermore, I found that mandated audits often result in null compliance, producing unreadable transparency notices that limit corporate liability rather than support citizen agency.
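The claim that privacy protections can degrade utility for minority groups can be made concrete with the Laplace mechanism, one standard way to release differentially private counts. This is an illustrative sketch, not a reproduction of any evaluation from the analyzed literature: the group sizes, `epsilon`, and function names are hypothetical. Laplace noise with scale b = sensitivity / epsilon has expected absolute value b regardless of group size, so the expected relative error of a released count is b / n, and smaller groups absorb proportionally more distortion.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two i.i.d. exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(true_count: int, epsilon: float, rng: random.Random,
                sensitivity: float = 1.0) -> float:
    """Laplace mechanism for a counting query (sensitivity 1 by default)."""
    return true_count + laplace_noise(sensitivity / epsilon, rng)


def expected_relative_error(true_count: int, epsilon: float,
                            sensitivity: float = 1.0) -> float:
    """E|noise| / n, which equals (sensitivity / epsilon) / n for Laplace noise."""
    return (sensitivity / epsilon) / true_count


if __name__ == "__main__":
    epsilon = 0.1
    groups = {"majority": 10_000, "minority": 50}  # hypothetical group sizes
    rng = random.Random(0)
    for name, n in groups.items():
        # Expected relative error: 0.001 for the majority group, 0.2 for the minority group.
        print(name, round(expected_relative_error(n, epsilon), 3),
              round(noisy_count(n, epsilon, rng), 1))
```

The same privacy budget therefore yields roughly 0.1% expected error for the large group and 20% for the small one, which is one simple mechanism behind the disparity described above.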

Together, these projects successfully demonstrate that bridging the responsibility gap requires confronting the strict limitations of mathematical and bureaucratic abstraction. The technical research proves that even precise internal interventions face functional tradeoffs, while the STS research explains why those quantitative tradeoffs cannot substitute for community-centered accountability. However, a primary limitation of this synthesis is that evaluating an autocomplete tool in a controlled laboratory setting does not fully capture the sociotechnical friction of a live development environment. Future researchers must bridge this gap by studying how steered models impact the actual tasks and behaviors of human developers in practice. Additionally, governance researchers must shift focus from procedural compliance toward designing systems of outcome-based accountability that mandate qualitative, community-led feedback and genuine contestability.

Degree

BS (Bachelor of Science)

Keywords

AI; Bias

Notes

School of Engineering and Applied Science

Bachelor of Science in Computer Science

Technical Advisor: Ferdinando Fioretto

STS Advisor: Caitlin Wylie

Language

English

Issued Date

2026-05-08

Persistent Link

https://doi.org/10.18130/phym-ne29

Suggested Citation

Venkatapuram, Anirudh. Contrastive Activation Steering for Demographic Bias Mitigation in Code Language Models; The Responsibility Gap in AI: Fairness Metrics and Privacy Tools as Sociotechnical Infrastructures. University of Virginia, School of Engineering and Applied Science, BS (Bachelor of Science), 2026-05-08, https://doi.org/10.18130/phym-ne29.

Files

Venkatapuram_Anirudh_Prospectus.pdf

Venkatapuram_Anirudh_STSResearchPaper.pdf

Venkatapuram_Anirudh_SociotechnicalSynthesis.pdf

Venkatapuram_Anirudh_TechnicalReport.pdf