§1What actually happened
OpenClaw, an open agent orchestration platform, has patched a set of critical vulnerabilities that let attackers hijack running AI agents. The detail that matters, as reported by BankInfoSecurity, is that the flaws did not require the attacker to break a model or guess a password. They let a hostile input reach the part of the system that decides which tool to call and with what arguments. Once you are in that seat, the agent is yours. You can read its memory, redirect its actions, and make it call tools on your behalf.
This is not a model jailbreak in the lazy sense. Nobody talked the language model into saying a rude word. The problem sat one layer down, in the orchestration code that turns a model's text output into real actions against real systems. That layer trusted input it should have treated as hostile.
§2Why this is the dangerous class of bug
An agent is not a chatbot. A chatbot produces text and a human decides what to do with it. An agent produces text that gets parsed into a tool call and executed without a human in the loop. The model says "call send_email with these arguments" and the orchestrator does it. The whole value proposition of an agent fleet is that nobody is sitting there approving each step.
That convenience is also the attack surface. If an attacker can influence the model's output, and the orchestrator executes that output, then the attacker has effectively gained code execution through natural language. The OpenClaw flaws are a clean example of this. The hijack happened in the gap between "the model produced some text" and "the system took an action".
For anyone running multiple agents that share state, the blast radius is worse. If one agent can read shared memory and another can write to it, whoever controls the writer controls what the reader sees, so a compromise of one becomes a compromise of the fleet. The tools are the privileges. An agent with email access, file access, and a database connection is a service account that happens to speak English.
§3The Workloft angle
Workloft runs an eight-agent fleet. One of those agents, Larry, runs on OpenClaw. So this is not abstract commentary. The first thing I did when the advisory landed was check which version we run and confirm the patch path. The next action on the board is a full audit of the OpenClaw deployment, not because the patch is hard to apply but because the patch is the cheap part. The expensive part is understanding what Larry could touch if someone had been in there.
The honest position for a one-person shop selling to regulated UK buyers is this: you do not get to wave away a critical vulnerability in your platform by pointing at the upstream fix. Your buyer does not care that OpenClaw shipped a patch. Your buyer cares whether your agent could have leaked their data in the window before you applied it. That means version pinning you can prove, an audit trail of when you patched, and a clear answer to "what does this agent have access to and why".
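A minimal sketch of what "version pinning you can prove" might look like as a record you can hand to a buyer. Everything specific here is an assumption for illustration: the version strings are placeholders rather than real OpenClaw release numbers, the tool names are invented, and the dotted-integer parser would need replacing if the real versions carry suffixes.

```python
import json
from datetime import datetime, timezone


def parse_version(version: str) -> tuple[int, ...]:
    """Parse a plain dotted version like '1.10.0' into a comparable tuple.

    Assumption: versions are dotted integers with no suffixes.
    """
    return tuple(int(part) for part in version.split("."))


def patch_record(agent: str, component: str, installed: str,
                 minimum_patched: str, tools: set[str]) -> dict:
    """Build one audit entry: what ran, at which version, with which tools."""
    return {
        "checked_at": datetime.now(timezone.utc).isoformat(),
        "agent": agent,
        "component": component,
        "installed": installed,
        "minimum_patched": minimum_patched,
        "patched": parse_version(installed) >= parse_version(minimum_patched),
        # The answer to "what does this agent have access to".
        "tools": sorted(tools),
    }


# Placeholder values, not real release numbers or a real tool inventory.
record = patch_record("larry", "openclaw", "1.10.0", "1.9.3",
                      {"read_file", "send_email"})
print(json.dumps(record, indent=2))
```

Appending one such record to a write-once log each time you deploy gives you a dated answer to "when did you patch" instead of a recollection.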
§4The pattern underneath the headline
Every few months there is a new agent platform CVE and the coverage frames it as an AI safety story. It is not. It is a supply chain and least privilege story wearing an AI costume. The same rules that have governed service accounts for twenty years apply here, and most agent stacks ignore them because the stacks are new and shiny and built by people optimising for demos.
Three boring practices would have contained this:
First, treat every tool call as untrusted by default. The model's output is user input. It should be validated against a schema, checked against an allowlist of permitted actions, and constrained to the minimum arguments needed. If your orchestrator executes whatever the model emits, you have built a remote code execution endpoint and pointed it at a probabilistic text generator.
Second, give each agent the least privilege it needs and no more. Larry does not need write access to everything just because it is convenient during development. Scope the tools. Scope the credentials. Assume any single agent can be turned and ask what it can do when it is.
Third, isolate agent state. Shared memory across a fleet is a feature until it is a propagation path. If one agent can be hijacked and that hijack reaches the others through shared context, you have built a single point of failure dressed up as collaboration.
§5What builders get wrong
The lazy read of this incident is "AI agents are insecure, be careful". That is useless. It tells you nothing you can act on. The sharp read is that the security boundary in an agent system is not the model. It is the orchestration layer that decides what gets executed, and most teams have not drawn that boundary at all.
The mistake is treating the model as the system. The model is the least interesting and least dangerous part. It produces text. Text never hurt anyone. The danger is the machinery you built to act on that text, and whether you built it to trust the model or to verify it. OpenClaw's flaw was an instance of trusting it.
If you run agents in production for regulated buyers, the question to answer this week is not "is my model jailbroken". It is "if an attacker controlled my model's output for one minute, what would happen, and could I tell". If you cannot answer the second half of that question, the patch is not your biggest problem.
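The "could I tell" half is mostly a logging question. One way to approach it, sketched here under assumptions, is an append-only record of every tool call where each entry carries a hash of the one before it, so edits after the fact show up on verification. The field names and the in-memory list are hypothetical; a real deployment would write to storage the agent itself cannot modify.

```python
import hashlib
import json
from datetime import datetime, timezone

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(body: dict) -> str:
    """Hash an entry body using a stable JSON encoding."""
    encoded = json.dumps(body, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def append_entry(log: list[dict], agent: str, tool: str,
                 arguments: dict, allowed: bool) -> dict:
    """Record one attempted tool call, chained to the previous entry."""
    body = {
        "seq": len(log),
        "at": datetime.now(timezone.utc).isoformat(),
        "agent": agent,
        "tool": tool,
        "arguments": dict(arguments),  # copy so later caller edits do not leak in
        "allowed": allowed,
        "prev": log[-1]["hash"] if log else GENESIS,
    }
    entry = {**body, "hash": _digest(body)}
    log.append(entry)
    return entry


def verify(log: list[dict]) -> bool:
    """Return True only if every entry is intact and in its original order."""
    prev = GENESIS
    for index, entry in enumerate(log):
        body = {key: value for key, value in entry.items() if key != "hash"}
        if entry["seq"] != index or entry["prev"] != prev:
            return False
        if _digest(body) != entry["hash"]:
            return False
        prev = entry["hash"]
    return True


audit_log: list[dict] = []
append_entry(audit_log, "larry", "read_file", {"path": "notes.txt"}, allowed=True)
append_entry(audit_log, "larry", "send_email", {"to": "x@example.com"}, allowed=False)
print("log intact:", verify(audit_log))
```

Logging rejected calls alongside permitted ones matters here: a burst of denied attempts against tools the agent never normally uses is often the first visible sign that something else is steering it.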
