Abstract
AI systems are no longer just responding to us. They are beginning to act for us. These AI agents mimic human work and have been highly influential in software engineering, but ensuring that they behave safely and efficiently remains a major challenge. My technical capstone and STS research address this challenge from two directions: the capstone investigates how we steer AI behavior, and the STS research investigates how we manage accountability.
For my technical capstone, I explored how image captions often fail to serve blind people: they rely on visual descriptors such as color or texture while underspecifying or ignoring spatial layout. To address this, I adapted an activation-steering pipeline to shift the outputs of large language models from descriptive to spatial language. This work was done primarily on the Qwen-1_8B-chat model, which could be steered strongly toward generating spatial captions. However, the coherence of the generated text degraded as the steering strength increased, so the model can be steered only up to a certain strength before its captions become gibberish.
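The core idea behind activation steering can be illustrated with a toy example. The sketch below is a minimal illustration, not the capstone pipeline itself: it assumes the steering vector is the difference between mean activations on spatial and descriptive examples (a common choice, but an assumption here), and the function names and numbers are hypothetical.

```python
from statistics import fmean


def mean_vector(rows):
    """Average a list of equal-length activation vectors, dimension by dimension."""
    return [fmean(col) for col in zip(*rows)]


def steering_vector(target_acts, baseline_acts):
    """Difference of means (assumed method): points from baseline style toward target style."""
    target_mean = mean_vector(target_acts)
    baseline_mean = mean_vector(baseline_acts)
    return [t - b for t, b in zip(target_mean, baseline_mean)]


def steer(hidden, vector, strength):
    """Add the scaled steering vector to a single hidden-state vector."""
    return [h + strength * v for h, v in zip(hidden, vector)]


# Toy 3-dimensional activations (made-up numbers, not results from the capstone).
spatial = [[1.0, 0.0, 2.0], [3.0, 0.0, 2.0]]
descriptive = [[0.0, 1.0, 2.0], [0.0, 3.0, 2.0]]

v = steering_vector(spatial, descriptive)
steered = steer([0.5, 0.5, 0.5], v, strength=2.0)
```

In a real model the same addition would be applied to a layer's hidden states during generation. The strength multiplier is the quantity that trades steering effect against coherence: a larger value pushes the hidden state further from what the model would normally produce.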
Technical interventions on AI models, such as activation steering, help align AI with human needs. As AI systems gain power and autonomy, however, technical guardrails alone are not enough: governance and accountability are also needed to maintain human safety and alignment.
My STS research explored this problem, focusing primarily on accountability in agentic AI systems. These systems can autonomously plan and execute complex tasks without human intervention, a significant step beyond AI chatbots. Using Actor-Network Theory (ANT), I analyzed how responsibility is distributed across a complex network of actors in AI development, including developers, corporate leadership, safety teams, and even the AI itself. One pattern that emerged was that as AI companies accelerate product releases, internal safety teams lose power or are dissolved altogether. Additionally, when AI fails and causes real damage, corporations typically blame the user rather than the system that allowed the user to make a mistake.
Together, this research illustrates that there are multiple ways to align AI with human interests: technically, by influencing the model's behavior in real time, and institutionally, by creating accountability within major AI corporations through preserving the power of safety teams.