FairByDesign · Public Brief
The Accountability Series · Paper I
Why AI governance is becoming a control problem.
Most organizations are still asking whether their AI systems are accurate. That question matters, but it is no longer the hard one. The harder question is whether anyone remains meaningfully accountable when AI systems shape decisions, rank options, recommend actions, or act through connected tools.
The accountability problem begins before the final decision. It begins the moment judgment is delegated, influence becomes invisible, and the human who is formally responsible no longer has the visibility, the authority, or the time to intervene. By the time an outcome lands on a real person, the question “who is answerable for this?” can be genuinely hard to answer — not because anyone acted in bad faith, but because the chain connecting human responsibility to system-shaped outcome has quietly stretched and frayed.
The central challenge of AI governance is not intelligence, accuracy, or explainability alone. It is whether human accountability can survive when decision influence is delegated to systems that are only partially understood.
This piece makes the case for why that is the right problem to focus on, and why the path through it looks less like perfecting our machines and more like the way we already govern other systems we do not fully understand.
From tools to decision systems
To see why the problem is new, recall what changed. Across the long arc of technology, three broad transformations stand out.
Tools amplified labor. The lever, the plow, the engine extended what human muscle and coordination could accomplish, but the human remained unmistakably the agent. A hammer does not decide where to strike.
Traditional software amplified calculation. It took over arithmetic, bookkeeping, and rule-following at scales no human could match, through rules that could, at least in principle, be inspected, tested, and traced. Not all conventional software is simple or transparent — plenty of it is sprawling and opaque — but its outputs follow from identifiable code, inputs, and configuration. When something goes wrong, the error is, in principle, locatable.
Artificial intelligence amplifies judgment. This is the discontinuity. Modern AI systems increasingly perform tasks once regarded as irreducibly human: weighing competing considerations, recommending courses of action, prioritizing what deserves attention, predicting what is likely to happen, and — in agentic systems — taking actions in the world on our behalf. The U.S. National Institute of Standards and Technology defines an AI system precisely by this capacity: an engineered system that, for given objectives, generates outputs such as predictions, recommendations, or decisions that influence real or virtual environments. The decision influence may be learned, probabilistic, context-sensitive, and embedded in a larger workflow — which is exactly what weakens the old comfort of reconstruction.
We can name the resulting phenomenon delegated decision influence. When a system recommends which applicants to interview, which transactions to flag, which patients to escalate, or which content to surface, it is not merely executing a human decision. It is shaping the space of decisions humans then make — often invisibly, often at a scale and speed that forecloses real review of any individual case. The person who “makes the final call” is frequently choosing among options already filtered, ranked, and framed by the system. The influence was delegated long before the decision was formally taken.
Delegated decision influence is what distinguishes a decision system from a tool. A tool waits to be used. A decision system participates. And once a system participates in decisions, the chain of accountability we have always taken for granted begins to stretch.
The accountability gap
Consider the simplest model of accountability:
Human → Decision → Outcome
A person decides; an outcome follows; that person is answerable. The chain is short, and responsibility has nowhere to hide. Now introduce a decision system:
Human → AI System → Decision → Outcome
And in the agentic case, where the system not only advises but acts:
Human → AI Agent → Tool → Action → Outcome
Each new link is a place where accountability can leak away. This is the accountability gap: the widening distance between the human who is nominally responsible and the outcome they are supposed to answer for. It is the operational descendant of what the philosopher Andreas Matthias called the responsibility gap — the space that opens when no one can fully predict or fully control a system’s behavior, yet someone must still answer for it.
It is worth being precise about what kind of thing accountability is. It is not purely a technical property; it is a social and institutional relationship of answerability. We can build systems that are technically traceable — every event logged, every input recorded — and still find that no one is accountable, because recording what happened is not the same as establishing who must answer for it. The accountability gap is therefore not merely a gap in our logs. It is a gap in our ability to connect outcomes back to responsible human agents in a way that institutions, courts, regulators, and the public will accept.
The Five-Question Accountability Test
A governance system is only as real as its ability to answer five questions after an AI-influenced outcome. If the answers are crisp, accountability is intact. If they dissolve into “the system decided,” the gap has opened.
- Who approved this? When a recommendation is generated, accepted by default, acted upon automatically, and only later examined, the notion of “approval” becomes ambiguous. Approval may have been a single configuration choice made months ago by someone who never imagined this particular case.
- Why did this happen? The reasons may be distributed across model weights, training data, prompt context, tool outputs, and runtime conditions in ways that resist any clean narrative. “The model determined” is not an explanation; it is the absence of one.
- What information influenced the outcome? With systems that draw on vast context and call external tools, the inputs that actually shaped a result may be unlogged, transient, or simply unknown.
- Could someone have intervened? Nominal authority to intervene is not the same as practical ability. Researchers have long documented the out-of-the-loop performance problem: operators who supervise rather than actively control an automated system lose situation awareness and become slower and less able to take over when it fails — precisely when intervention matters most. A human nominally overseeing a system that acts in milliseconds, across thousands of cases, is structurally positioned to be a bystander.
- Who is accountable? When the questions above have no crisp answers, this one does not either. Responsibility diffuses across the engineer who built the system, the vendor who supplied it, the executive who deployed it, and the operator who was watching the screen — and a responsibility shared by everyone is often borne by no one.
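One way to make the five questions operational is to treat them as fields that must be filled in on a record attached to each AI-influenced outcome. The sketch below is illustrative only: the `DecisionRecord` class, its field names, and the `open_questions` helper are hypothetical and are not drawn from any standard or from FairByDesign tooling. It assumes an organization can at least attempt to capture an approver, a rationale, the logged inputs, whether intervention was practically possible, and a named owner.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DecisionRecord:
    """Hypothetical record attached to one AI-influenced outcome."""
    approved_by: Optional[str] = None              # Who approved this?
    rationale: Optional[str] = None                # Why did this happen?
    inputs_logged: List[str] = field(default_factory=list)  # What information influenced it?
    intervention_possible: Optional[bool] = None   # Could someone have intervened?
    accountable_owner: Optional[str] = None        # Who is accountable?


def open_questions(record: DecisionRecord) -> List[str]:
    """Return the questions this record cannot answer crisply."""
    gaps = []
    if not record.approved_by:
        gaps.append("Who approved this?")
    if not record.rationale:
        gaps.append("Why did this happen?")
    if not record.inputs_logged:
        gaps.append("What information influenced the outcome?")
    # None means nobody recorded an answer; True or False both count as answered.
    if record.intervention_possible is None:
        gaps.append("Could someone have intervened?")
    if not record.accountable_owner:
        gaps.append("Who is accountable?")
    return gaps


if __name__ == "__main__":
    partial = DecisionRecord(approved_by="ops-lead", rationale="score above threshold")
    for question in open_questions(partial):
        print("Unanswered:", question)
```

A record that passes this check is only traceable, not yet accountable: the fields show that answers were written down, not that the named people had real authority or time to act.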
There is a trap hidden in the fourth question, identified long before modern AI. In her classic analysis of the ironies of automation, Lisanne Bainbridge observed that automating most of a task does not reduce the demands on the human operator so much as transform and intensify them: the operator must stay ready for rare, high-stakes interventions in a system they no longer routinely exercise — exactly the situation in which skill and awareness decay. Regulators have begun to encode this risk directly. The EU’s Artificial Intelligence Act requires that human overseers of high-risk systems remain aware of the tendency toward automation bias — the inclination to over-rely on a system’s output — and be enabled to interpret that output and to decide not to act on it.
Why explainability is not enough
A natural response to the accountability gap is to demand explainability: if the system can explain why it did what it did, surely accountability follows. Explainability is genuinely valuable, and nothing here argues against pursuing it. But four related ideas are often run together, and it helps to separate them.
Explainability offers a reason for an output. Interpretability concerns whether the system’s operation can be directly understood. Traceability records what happened. Accountability connects the outcome back to a responsible human or institution. A system can be traceable without being accountable, explainable without being governable, and accurate without preserving human responsibility.
With those distinctions in hand, several properties of advanced AI make complete understanding an uncertain foundation for governance.
Emergent and debated capabilities. Researchers have reported abilities that appear only above a threshold of model scale and cannot be predicted by extrapolating from smaller models. Later work has questioned whether some of these effects reflect genuine emergence or are artifacts of the metrics chosen to measure them. Either way, the governance point holds: system behavior cannot always be anticipated from smaller-scale tests or from prior deployments.
High-dimensional reasoning. The internal representations of these systems live in spaces of enormous dimensionality. An “explanation” a human can read is necessarily a compression — a low-dimensional shadow of a high-dimensional process. The computer scientist Cynthia Rudin draws the crucial line: explaining a black box after the fact is not the same as the model being interpretable, and a post-hoc explanation can be a plausible story that does not faithfully match the computation that actually occurred. Telling a faithful explanation from a convincing rationalization is itself an open problem.
Shifting behavior and shifting context. Behavior well understood in one setting may not transfer to another. And systems are moving targets: many deployed models do not learn continuously, but they are updated, fine-tuned, re-prompted, connected to retrieval sources, or embedded in changing workflows. Each of those changes can move the system out from under yesterday’s explanation.
Interaction effects. In real deployments, AI systems do not act alone. They call tools, consume other systems’ outputs, and operate within human workflows. The behavior that matters for accountability is the behavior of the whole sociotechnical system, which can exhibit dynamics none of its parts display in isolation.
None of this makes explainability futile. It makes it partial, provisional, and incomplete — and it means a governance strategy that depends entirely on achieving complete understanding may be waiting for a condition that never arrives. The prudent stance is not “we cannot understand these systems, so understanding is pointless.” It is: pursue understanding vigorously, favor inherently interpretable designs where the stakes are high, and build institutions so that accountability does not collapse when understanding falls short.
Bounded control instead of complete understanding
Here is the reassuring fact at the center of this argument: we have done this before. Modern society routinely governs systems no individual fully understands — not by achieving complete comprehension, but by building structures of control around irreducible uncertainty.
Financial markets. No one fully understands the global financial system; its behavior emerges from billions of interacting decisions and regularly surprises experts. Yet we govern it. Market-wide circuit breakers halt trading across U.S. exchanges when the S&P 500 falls 7, 13, or 20 percent in a day, pausing the system precisely when it is least understood. Alongside them sit position limits, mandatory disclosure, independent audit, and regulators with authority to investigate and intervene.
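The circuit-breaker thresholds show how simple a boundary can be compared with the system it constrains. The sketch below maps a decline from the prior close to the three levels named above. It is a deliberate simplification: the function name `breaker_level` and the `THRESHOLDS` table are hypothetical, and the real rules also specify halt durations and time-of-day conditions that are omitted here.

```python
from typing import Optional

# Decline thresholds from the text, as fractions of the prior day's close,
# checked from most to least severe.
THRESHOLDS = [(0.20, 3), (0.13, 2), (0.07, 1)]


def breaker_level(prior_close: float, current: float) -> Optional[int]:
    """Return the highest threshold level breached, or None if none is."""
    if prior_close <= 0:
        raise ValueError("prior_close must be positive")
    decline = (prior_close - current) / prior_close
    for threshold, level in THRESHOLDS:
        if decline >= threshold:
            return level
    return None


if __name__ == "__main__":
    print(breaker_level(5000.0, 4600.0))  # an 8 percent decline
    print(breaker_level(5000.0, 4900.0))  # a 2 percent decline
```

The rule needs no model of why the market is falling. It acts on an observable quantity, which is what lets it work when understanding is at its lowest.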
Aviation. Modern aviation is a vast sociotechnical system too complex for anyone to hold entirely in their head, yet its safety record has been built through relentless observation, reporting, investigation, redundancy, and correction. An independent body, the National Transportation Safety Board, investigates major accidents within its jurisdiction, determines probable cause, and issues safety recommendations that feed back into the system, without itself being the regulator it reports on.
Nuclear operations. We run systems of catastrophic potential through layered defenses: containment, independent safety review, hard operating limits, and clear lines of human authority. Charles Perrow’s Normal Accidents, which grew out of his analysis of the Three Mile Island accident, offers a caution worth keeping: in systems that are both highly complex and tightly coupled, failures interact in ways no designer anticipated, and adding safeguards can itself add complexity that breeds new failure modes. The lesson is not that control is hopeless, but that it must be designed with humility about what it can foresee.
Human organizations. Most fundamentally, we have always delegated decisions to agents whose inner workings we cannot inspect: other people. We do not understand what happens inside another mind, yet we hold people accountable every day — through defined roles and authorities, oversight and review, audit and investigation, and consequences attached to outcomes.
Different as they are, these share a common architecture of control: monitoring (continuous observation), boundaries (defined limits on permitted behavior), independent review (assessment by parties who did not build the system), incident investigation (disciplined reconstruction of failures), human accountability (identifiable people who answer), and intervention authority (the recognized power, and practical ability, to stop or alter behavior). That shared architecture yields the reframe at the heart of this series:
Bounded control instead of complete understanding.
We have never governed our most important complex systems by understanding them perfectly. We have governed them by surrounding them with control — observing, constraining, reviewing, investigating, and retaining the authority to intervene. Encouragingly, the first generation of AI-specific governance instruments increasingly reflects this logic. The NIST AI Risk Management Framework organizes its guidance around four functions — Govern, Map, Measure, and Manage — that are recognizably the language of bounded control rather than complete comprehension.
But the precedent carries a warning. Bainbridge’s irony applies at the institutional scale just as at the operator’s console: the more we automate, the more capable and better-supported the humans who retain ultimate authority must be — not less. Bounded control is not a license to disengage. It is a discipline that demands sustained human capacity.
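Several elements of the control architecture described in this section can be expressed in a few lines around an agent’s tool calls. The sketch below is a minimal illustration under stated assumptions, not the architecture Paper II describes: the `BoundedAgentGate` class, its fields, and the tool names are all hypothetical. It covers four of the six elements (monitoring, boundaries, human accountability, and intervention authority). Independent review and incident investigation are institutional practices that would draw on the log rather than live in code.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class BoundedAgentGate:
    """Hypothetical gate that every agent tool call must pass through."""
    owner: str                    # human accountability: a named person
    allowed_tools: Set[str]       # boundaries: what the agent may touch
    halted: bool = False          # intervention authority: a stop switch
    log: List[Dict[str, str]] = field(default_factory=list)  # monitoring

    def halt(self, by: str) -> None:
        """Stop all further tool calls and record who stopped them."""
        self.halted = True
        self.log.append({"event": "halt", "by": by})

    def call(self, tool: str, action: Callable[[], str]) -> str:
        """Run the action only if the gate is open and the tool is in bounds."""
        if self.halted:
            self.log.append({"event": "blocked", "tool": tool, "reason": "halted"})
            raise PermissionError("gate halted by a human overseer")
        if tool not in self.allowed_tools:
            self.log.append({"event": "blocked", "tool": tool, "reason": "out_of_bounds"})
            raise PermissionError(f"tool {tool!r} is outside the permitted boundary")
        result = action()
        self.log.append({"event": "executed", "tool": tool, "owner": self.owner})
        return result


if __name__ == "__main__":
    gate = BoundedAgentGate(owner="j.doe", allowed_tools={"search"})
    print(gate.call("search", lambda: "ok"))
    gate.halt(by="j.doe")
    print(gate.log)
```

The gate never inspects how the agent reasons. It constrains and records what the agent does, and it ties every executed action to a named person.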
What this means in practice
If bounded control is the strategy, no single discipline can implement it alone. This is why the NIST framework places a Govern function across all the others: governance is not a department but a property of how the disciplines work together. Increasingly it is also a management-system problem in its own right, with dedicated standards such as ISO/IEC 42001 treating AI governance as something an organization establishes, maintains, and continually improves — not a one-time compliance event.
The common failure is to assign AI accountability to one team and call the problem solved. It does not work, because each function holds only one lever:
- Engineering can build controls but cannot define legal responsibility.
- Security can constrain what an autonomous agent is permitted to do, but cannot decide how much exposure the organization should accept.
- Risk can model exposure but cannot instrument runtime behavior.
- Legal can interpret obligations but cannot enforce technical boundaries.
- Audit can verify evidence but cannot create authority where leadership has not assigned it.
- Executives hold the authority in which accountability must ultimately rest — and here the law is moving. For high-risk systems, the EU AI Act is shifting governance toward named, competent, empowered human oversight rather than vague organizational responsibility: Article 14 requires systems be designed for effective human oversight, and Article 26 requires deployers to assign that oversight to natural persons with the necessary competence, training, authority, and support.
AI accountability exists only when these functions operate as one control system. An organization that treats it as the responsibility of a single team has not closed the accountability gap; it has merely hidden it.
The open question — and what comes next
The cross-functional argument scales up. Just as no single discipline can govern AI within an organization, no single organization can govern AI within society. The capacity to observe, constrain, audit, and intervene is distributed across governments, regulators, industry bodies, researchers, companies, and the citizens about whom decisions are made. The public-interest dimension matters because accountability is ultimately owed not only to boards and regulators, but to the people whose rights, opportunities, and life chances are shaped by AI-influenced decisions. The first instruments already span the spectrum from voluntary to binding: the NIST framework offers a flexible baseline, newer profiles extend it to generative systems, and the EU AI Act imposes enforceable obligations. Together they mark the beginning of an institutional response, not its completion.
Which raises the question this series exists to sit with:
Can institutions evolve quickly enough to preserve accountability as systems become more capable?
AI capability and adoption are advancing on compressed timelines while the institutions meant to measure, govern, and absorb these systems struggle to keep pace. One measure of that pace: the length of software tasks frontier AI can complete autonomously has been doubling roughly every seven months over recent years, though the researchers behind that finding are careful about how far it generalizes. Institutions, by contrast, have historically evolved over years or decades, and the governance regimes for finance, aviation, and nuclear power matured substantially in the aftermath of failures. The accountability gap is, in part, a race between those two clocks. This piece does not predict the outcome. It insists the race is real, and that treating the challenge as merely technical is a way of losing it slowly.
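The doubling figure is easier to reason about as arithmetic. If a quantity doubles every d months, then after t months it has grown by a factor of 2^(t/d). The sketch below applies that formula with d = 7, the figure cited above. The function name `growth_factor` is hypothetical, and the calculation is a naive extrapolation that assumes the trend simply continues, which the researchers themselves do not claim.

```python
def growth_factor(months: float, doubling_months: float = 7.0) -> float:
    """Multiplicative growth after `months`, given a fixed doubling time."""
    if doubling_months <= 0:
        raise ValueError("doubling_months must be positive")
    return 2.0 ** (months / doubling_months)


if __name__ == "__main__":
    for horizon in (12, 24, 60):
        print(f"{horizon} months: x{growth_factor(horizon):.1f}")
```

On these assumptions, one year corresponds to roughly a 3.3-fold increase in autonomous task length and two years to nearly 11-fold, which is the scale of change a governance regime maturing over years or decades would have to absorb.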
The most important question, then, may not be the one we instinctively ask — “Can we make AI perfectly understandable?” — because that makes accountability hostage to a condition that may never arrive and that, for our other complex systems, we have never required. The better question is the one bounded control is built to answer: how do we preserve accountability when understanding remains incomplete?
This is Paper I of the Accountability Series. It establishes the accountability gap. Paper II introduces the operational response: the Accountability Control Plane — a concrete architecture for surrounding autonomous systems with observation, boundaries, review, intervention authority, and accountable human control. Paper I earns the question. Paper II answers it.
A note on sources
The central reframe in this piece — bounded control instead of complete understanding — is original to FairByDesign. The supporting sources include Andreas Matthias on the responsibility gap, Mark Bovens on accountability as institutional answerability, Lisanne Bainbridge and Endsley and Kiris on automation oversight, Cynthia Rudin on interpretability, the NIST AI RMF and ISO/IEC 42001 on AI governance systems, the EU AI Act on human oversight, and METR / Kwa et al. on the pace of AI task capability.
Mercury Research members receive the full Field Paper PDF, complete Evidence Notes, APA references, the Five-Question Accountability Worksheet, and the control translation table for applying the test inside organizations.