We are in a moment of hope, hype, and hubris. AI is solving some of the toughest problems, from Math Olympiads to self-driving cars, while simultaneously failing at the simplest tasks. A few weeks ago, I was preparing a presentation for a talk and asked a GenAI model to create an image for the slides representing people chatting about similar interests. It started on the right path (Fig 1, left), but when I asked for a simple change to make the icons in the image about AI and make it gender neutral, I got unrelated, random images of young men going out for a run (Fig 1, right).
While that example elicits a chuckle, recent AI failures have come with serious consequences: In a study on misalignment, an agent given access to a simulated company’s email resorted to blackmail to keep itself from being shut down. Despite being explicitly instructed not to, a software engineering AI agent deleted a production database. Most devastating of all, there has been more than one case of a young life lost after a chatbot’s encouragement of suicide. Who do we blame when the agents fail us?

Fig.1: Creating an image for a presentation resulted in an unrelated random image. Note this is one interaction, split in two to fit across the page.
The Moral Crumple Zone: Who Takes the Blame?
I was recently discussing this question with Prof. Fernanda Viegas when she brought up the interesting concept of ‘Moral Crumple Zones’, introduced by Dr. Madeleine Clare Elish in 2019. In cars, the crumple zone is the part of the vehicle that collapses to absorb the impact of a collision, thereby protecting the passengers. In AI systems, the term describes placing the blame for a complex system failure on a human operator so that someone is held accountable. However, whether that person is in fact the right place to lay the blame is highly questionable.
How do we assign accountability and maintain the safety of these systems? I would love to say there is one thing to install, one single protocol to follow. However, we are in the early days of a technology that is evolving rapidly and whose reach is vast. Advances to address AI safety are happening on four fronts:
- Global scientific research in Safe and Responsible AI, from academia through efforts such as the Stanford Institute for Human-Centered AI (HAI), MIT’s Algorithmic Alignment Group, UC Berkeley’s Center for Human-Compatible AI (CHAI), and more around the world, and from industry research labs such as IBM Research’s Trustworthy AI initiative, Google DeepMind’s Frontier Safety Framework, OpenAI’s recent study on working with mental health experts to improve ChatGPT’s responses, and safety findings from the teams at Anthropic Research.
- Legislation and policy, such as Europe’s AI Act and government-led initiatives such as the US AI Safety Institute and the Japanese AI Safety Institute. The AI Policy tracker follows recent AI legislation around the world.
- Business Operational Model and Process advances as new roles and organisational structures emerge to support the AI transformation, with McKinsey noting that the “hybrid workforce needs a new talent system.”
- AI software and hardware platform advances, which are themselves evolving fast to incorporate findings from the scientific research in (1) above and to cover the areas where traditional approaches fail due to AI’s nondeterminism and massive computational needs.
This includes AI Eval frameworks that are designed for the continuous improvement and evaluation of AI systems, AI Gateways that apply guardrails to interactions with generative AI models, and advances in Identity and Access Management solutions whereby Agent Identity is a first-class concept, allowing companies to manage the access that AI agents have, audit the actions they are taking, and potentially assign a human ‘manager.’ All of these technical advances are combining into the emerging arena of agent management and lifecycle platforms.
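To make the agent-identity idea more concrete, here is a minimal sketch of what treating an agent as a first-class identity could look like: scoped permissions, an accountable human manager, and an audit trail checked at a gateway before an action is allowed. The class, field names, and scope strings below are illustrative assumptions, not any particular vendor’s API.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Set

# Hypothetical sketch: an AI agent's identity as a first-class record with
# scoped permissions, an accountable human manager, and an audit trail.
# Names and structure are illustrative, not a real product API.

@dataclass
class AgentIdentity:
    agent_id: str
    human_manager: str            # the person accountable for this agent
    allowed_actions: Set[str]     # e.g. {"read:database", "write:drafts"}
    audit_log: List[str] = field(default_factory=list)

    def request_action(self, action: str, resource: str) -> bool:
        """Check the action against the agent's scope and record the attempt."""
        permitted = action in self.allowed_actions
        self.audit_log.append(
            f"{datetime.now(timezone.utc).isoformat()} "
            f"agent={self.agent_id} action={action} resource={resource} "
            f"permitted={permitted} manager={self.human_manager}"
        )
        return permitted

# Usage: a coding agent asks to drop a production table. The check denies it
# because the scope only covers read access, and the attempt is logged with
# the responsible human manager attached.
agent = AgentIdentity(
    agent_id="swe-agent-042",
    human_manager="alice@example.com",
    allowed_actions={"read:database", "write:drafts"},
)

if not agent.request_action("drop:database", "prod.customers"):
    print("Denied; escalating to", agent.human_manager)
print(agent.audit_log[-1])
```

The point of the sketch is the shape of the solution rather than the code itself: every action an agent takes is tied to an identity, a scope, and a named human, which is exactly the attribution that keeps a single operator from becoming the moral crumple zone.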
The Promise of the Hybrid Workforce
While we need to carefully adopt and adapt AI, we should focus on where AI agents are delivering real value: making us more efficient, breaking down barriers to interactive knowledge once available only to a privileged few, and closing critical gaps in healthcare staffing, for example.
Sometimes, failure is relative: self-driving cars have caused accidents, but their driving record is better than that of humans. As agents gain more autonomy and are assigned larger and more consequential tasks, as human workers become more AI-assisted, and as “physical AI” continues to advance, embedding AI in physical systems, the ‘digital workers’ we have been talking about for some time are becoming a reality.
We are seeing more than ever a push towards a hybrid workforce of humans and AI working in symbiosis. This year, Moderna took a bold – and controversial – leap in this direction when it made its digital group, which includes AI initiatives, report into its HR leader, thereby expanding Tracey Franklin’s role to that of “Chief People and Digital Officer.”
The emergence of a hybrid workforce, blending humans and machines, is as old as the Industrial Revolution. Robots and digital applications have been working alongside us for decades. What is new is the potential for agents to outperform humans on a range of tasks that have until now been uniquely human, requiring a level of education and what we think of as intelligence. This is blurring the lines on how we evaluate performance and how we assign value. This new dynamic is precisely what workers are grappling with. A recent Stanford study on the Future of Work with AI Agents confirms that most workers don’t want full automation, but rather a collaborative partnership with AI.
We can join the heated and exciting arguments about whether these machines are in fact exhibiting intelligence and how far we are from Artificial General Intelligence. I see AI as a natural part of the digital fabric, enabling a continuum of human and digital interaction. We see the seeds of this in natural programming, in extensions of Donald Knuth’s concept of literate programming, and in ‘vibe coding’ and ‘spec coding’, where human language and code are merging to form the core of modern digital applications.
If we are to avoid “Moral Crumple Zones” through better attribution, and if we aim to move towards hope and away from hubris, our focus as technologists now is on building, openly and with the global community, robust platforms that provide the technical foundations for the emerging business operational layers and processes suited to this latest iteration of human-machine collaboration. Let’s make AI boring again.
