Agents Don't Need to Be Evil, Just Chatty

§1 The boring breach

The agent-security conversation is obsessed with the clever attack. Prompt injection, jailbreaks, tool-output poisoning, the adversarial input that makes a model do something it should not. All real, all worth defending against. But look at where agents actually caused damage this fortnight and the pattern is duller than any of that. They just talked.

Two incidents, two industries, one shape. A car dealership honoured an ultra-low price its own chatbot offered on a BMW X3, a number nobody at the firm meant to stand behind. And courts have started sanctioning lawyers whose filings cited cases that do not exist, invented by a model and passed straight to a judge. No attacker. No exploit. The agent was simply allowed to speak with the full voice of the business, and it said something the business could not take back.

§2 Why this is the failure that scales

An adversarial attack needs an adversary. Someone has to find your system, probe it, and craft an input. That is a real threat, but a bounded one, and it is the threat every security team is already staffed against. The chatty failure needs none of that. It fires on ordinary traffic, on the happy path, on a Tuesday. A customer asks a normal question. A lawyer asks for normal research. The agent answers confidently, and the answer is wrong in a way that creates an obligation.

That is why it scales and the clever attack does not. Every message an agent sends on the happy path is another chance for this failure. You do not get breached once by a determined hacker. You get breached continuously, quietly, by your own deployment behaving exactly as built. An agent with send access is not a chatbot. It is a signing authority that never sleeps and was never told what it is allowed to commit you to.

§3 The missing part has a name

In both incidents the structural gap is identical, and it is small. Between the moment the agent composed its output and the moment that output reached the world, nothing checked it. There was no verifier in front of send(). The price went out without anything comparing it to a real price list. The citation went out without anything confirming the case existed in a database. The producer of the text was also, by default, its only guardian, and a producer marking its own homework is not a control.

This is the same separation we keep returning to. We argued the research-harness version of it yesterday: the process that produces a claim cannot be the process that validates it. Here is the production version, with two real corpses. The fix is not a smarter model. A smarter model still has no obligation to check itself before it speaks. The fix is structural: a guardian step, independent of the producer, that holds a specific checkable promise and can refuse to let the output leave.
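To make the separation concrete, here is a minimal Python sketch of a guardian in front of send(). It assumes a message is plain text and that each promise can be checked by an ordinary function. The names `guarded_send`, `citations_exist` and `KNOWN_CASES` are hypothetical. The one-entry case set stands in for the real database a citation check would need, and the regex only recognises the simple "Name v Name" form. It illustrates the shape of the control, not the code in any particular substrate.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional

# A verifier checks one promise about an outbound message. It returns None
# when the promise holds, or a reason string when the send must be refused.
Verifier = Callable[[str], Optional[str]]


@dataclass
class SendResult:
    sent: bool
    reasons: list[str]


def guarded_send(message: str, verifiers: list[Verifier],
                 transport: Callable[[str], None]) -> SendResult:
    """Run every verifier; hand the message to transport only if none refuse.

    The producer never calls transport directly, so it cannot mark its own
    homework: the verifiers are separate functions with their own data.
    """
    reasons = []
    for verify in verifiers:
        reason = verify(message)
        if reason is not None:
            reasons.append(reason)
    if reasons:
        return SendResult(sent=False, reasons=reasons)
    transport(message)
    return SendResult(sent=True, reasons=[])


# Hypothetical source of truth; a real check would query a legal database.
KNOWN_CASES = {"Donoghue v Stevenson"}


def citations_exist(message: str) -> Optional[str]:
    """Refuse if any 'Name v Name' citation does not resolve in KNOWN_CASES."""
    cited = re.findall(r"\b([A-Z][a-z]+ v [A-Z][a-z]+)\b", message)
    missing = [c for c in cited if c not in KNOWN_CASES]
    return f"unresolved citations: {missing}" if missing else None


outbox: list[str] = []
ok = guarded_send("See Donoghue v Stevenson.", [citations_exist], outbox.append)
bad = guarded_send("See Smith v Jones.", [citations_exist], outbox.append)
```

The producer's only way out is `outbox.append` behind the guard, so the unresolvable citation never reaches the outbox. Adding a second promise means appending another verifier to the list, not rewriting the producer.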

§4 What we actually do about it

This is not a thought experiment for us, because our own system is a pile of agents with send access. So the guardian sits in the substrate, not in a prompt that asks the model nicely to be careful.

Concretely: outbound copy passes a house-style and British-English gate before it can be queued, so an agent cannot ship a draft in the wrong voice. Anything irreversible or outward-facing goes through a human approval gate that blocks until a person says yes. And an enforcement layer sits in front of artefact writes, refusing, for instance, to let an article publish without its hero image. None of these trust the producing agent to police itself. Each is a separate step whose only job is to check one promise and stop the send if the promise fails. The agent can be as chatty as it likes. It cannot get the words out unchecked.
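The three gates above can be sketched as independent checks that each hold one promise. This is again a Python sketch under stated assumptions: `Artefact`, the gate functions and the three-word `US_SPELLINGS` list are hypothetical stand-ins. The human approval gate is modelled as a refusal until `approved_by` is set, whereas a production gate would block or queue until a person answers.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical house-style list: US spellings the British-English gate rejects.
US_SPELLINGS = {"color", "honor", "artifact"}


@dataclass
class Artefact:
    kind: str                          # e.g. "article" or "email"
    body: str
    outward_facing: bool = True
    approved_by: Optional[str] = None  # set only by a human approver
    hero_image: Optional[str] = None


def style_gate(a: Artefact) -> Optional[str]:
    """Refuse drafts that use spellings outside the house style."""
    words = {w.strip(".,;:!?").lower() for w in a.body.split()}
    hits = sorted(words & US_SPELLINGS)
    return f"off-style spellings: {hits}" if hits else None


def approval_gate(a: Artefact) -> Optional[str]:
    """Refuse outward-facing artefacts until a named person has approved them."""
    if a.outward_facing and a.approved_by is None:
        return "awaiting human approval"
    return None


def hero_image_gate(a: Artefact) -> Optional[str]:
    """Refuse to publish an article that has no hero image."""
    if a.kind == "article" and not a.hero_image:
        return "article has no hero image"
    return None


GATES = [style_gate, approval_gate, hero_image_gate]


def try_publish(a: Artefact) -> list[str]:
    """Collect refusals from every gate; an empty list means it may leave."""
    reasons = []
    for gate in GATES:
        reason = gate(a)
        if reason is not None:
            reasons.append(reason)
    return reasons


draft = Artefact(kind="article", body="A splash of color.")
final = Artefact(kind="article", body="A splash of colour.",
                 approved_by="editor", hero_image="hero.png")
draft_reasons = try_publish(draft)   # refused by all three gates
final_reasons = try_publish(final)   # []
```

Each gate reports its own reason, so a refused draft tells the producing agent exactly which promise it broke, and no gate asks the producer whether the draft is fine.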

The honest part: a guardian is only as good as the promise it checks. A price verifier needs a current price list. A citation verifier needs a real database to resolve against. Build the gate without the source of truth behind it and you have theatre, not a control. The dealership did not need a cleverer chatbot. It needed a five-line check against a number it already had.
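The dealership's missing check might look something like this minimal Python sketch. It assumes the outgoing message and the vehicle model are both known at send time, and that quoted prices carry a £ or $ sign. `price_gate`, `PRICE_LIST` and the £45,000 figure are illustrative, not real data. The design choice that matters is the first branch: with no price list behind it, the gate refuses rather than passing everything, which is the difference between a control and theatre.

```python
import re
from decimal import Decimal
from typing import Optional

# Matches a currency amount such as "£1", "£45,000" or "$1.50".
PRICE_RE = re.compile(r"[£$]\s?(\d[\d,]*(?:\.\d+)?)")


def price_gate(message: str, model: str,
               price_list: Optional[dict[str, Decimal]]) -> Optional[str]:
    """Refuse a message that quotes less than the list price for `model`."""
    if not price_list or model not in price_list:
        # No source of truth behind the gate: fail closed, not open.
        return f"no list price for {model!r}; refusing to send"
    floor = price_list[model]
    quoted = [Decimal(m.replace(",", "")) for m in PRICE_RE.findall(message)]
    too_low = [str(q) for q in quoted if q < floor]
    if too_low:
        return f"quoted {', '.join(too_low)} below list price {floor}"
    return None


# Illustrative figure only, not a real list price.
PRICE_LIST = {"BMW X3": Decimal("45000")}

r1 = price_gate("Deal! The X3 is yours for £1.", "BMW X3", PRICE_LIST)
r2 = price_gate("The X3 starts at £45,000.", "BMW X3", PRICE_LIST)
r3 = price_gate("The X3 starts at £45,000.", "BMW X3", None)
```

A quote at or above list passes, and so does a message that names no price at all. A stricter deployment might refuse any price it cannot tie to a specific model.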

§5 The takeaway for anyone shipping an agent

Stop modelling your agent's risk as "what could an attacker make it do" and start modelling it as "what can it already say, unprompted, that I cannot take back". The second question is bigger, it fires more often, and almost nobody has a gate for it. If your agent can send, the question is not whether it is smart enough. It is whether anything stands between its output and the world that is allowed to say no. If the answer is nothing, you do not have an agent. You have a signing authority with no signatory, and the bill arrives on an ordinary day.

Methodology note. This Note synthesises two real incidents reported this fortnight, covered individually in Workloft Labs News No. 25 (the dealer chatbot) and No. 26 (AI legal briefs). It is the production counterpart to Note No. 39 on claim drift: both turn on separating the process that produces an output from the process that validates it before the output is allowed to leave. Incident specifics are kept to what the source reporting supports.
