Claude Code's GitHub Actions Bug Is a Missing Verifier, Not a Clever Hack

§1What actually happened

Anthropic's Claude Code shipped a GitHub Actions integration that takes natural language instructions in a CI context and turns them into shell commands. CyberSecurityNews reported that the agent was vulnerable to prompt injection: a hostile string planted somewhere the agent reads, an issue comment, a file, a pull request body, could persuade it to run arbitrary commands. Because those commands executed in the context of the repository's workflow, an attacker could in principle compromise any repo the action touched. Not a sandbox escape in the classic sense. The agent did exactly what it was built to do. It read instructions and executed them.

That is the part worth slowing down on. The headline reads like a zero-day, some exotic memory bug. It is not. It is a design that translates language into privileged actions with no gate in between.

§2The lazy read

The lazy read is "the AI got tricked". That framing is comfortable because it makes the model the culprit and the fix a better model. Train it harder, add more refusals, tell it to ignore suspicious instructions. None of that closes the hole, because the hole is not in the model's judgement. The hole is that the model's judgement is the only thing standing between an attacker and a shell.

A language model is a probabilistic text engine. It cannot reliably distinguish a legitimate instruction from a planted one, because to the model they are the same kind of token stream. Asking it to self-police injection is asking the thing that can be fooled to also be the thing that catches the fooling. You do not put the lock on the inside of the door.

§3The missing control

The control that was missing is a pre-send verifier. A small, deterministic gate that sits in front of send(), the moment where intent becomes effect. The agent can think whatever it likes. It can plan, draft, reason, hallucinate. But the instant it tries to execute a command, a separate component checks that command against the declared intent of the action. Is this command in the allowed set for what this workflow is supposed to do? No? Then it does not run, regardless of how persuasive the prompt was.

This is not novel. It is the principle behind every capability system that has survived contact with adversaries. The executing party does not get to decide what it is allowed to do. A separate, narrower layer decides, and that layer does not read free text. It reads a fixed policy.

The reason this matters more for agents than for traditional software is that agents collapse the gap between instruction and action that older systems kept wide open. A normal CI pipeline runs a fixed script. You can read it, review it, sign it. An agent generates the script at runtime from whatever it was told. The attack surface is no longer the code you wrote. It is every input the agent reads. The pre-send verifier restores the gap. It says: you may generate anything, but only this narrow set executes.

§4Why regulated buyers should care

If you sell automation into a regulated UK environment, this incident is a gift. It is a clean, public, named example of exactly the failure your buyers are most afraid of and least able to articulate. A reasonable-sounding agent, doing reasonable-sounding work, that can be steered into arbitrary action by an input nobody reviewed. Under the ICO's guidance on AI and the UK GDPR's accountability principle, you are expected to be able to show what your system can and cannot do, and to demonstrate the controls that enforce it. "The model usually behaves" is not a control. "No command outside the declared intent reaches the runtime, here is the policy and here is the log of every rejection" is a control.

The Claude Code case is useful precisely because it happened to a serious team at a serious vendor. This is not a hobby project that forgot security. It is a reminder that the architectural discipline does not come for free with a good model. The verifier is a separate thing you have to build, and skipping it is the default, not the exception.

§5What this means for builders

Three things. First, find your send(). Every agent has a moment where a thought becomes an irreversible effect: a shell command, an API call, an email, a payment, a database write. That moment is the only place security lives. Everything before it is suggestion. Everything after it is too late.

Second, the verifier must be deterministic and dumb. If your gate is another LLM call, you have moved the injection one layer down, not removed it. The gate reads a fixed allow-list derived from the action's declared intent. It does not reason. It matches. Boring is the point.

Third, declared intent has to be narrow and explicit. "Help with the repo" is not an intent a verifier can enforce. "Run the test suite and post results" is. The narrower the declared scope, the smaller the set of commands the gate permits, and the less an attacker can smuggle through. Agents that promise to do anything cannot be safely gated, which is the quiet cost of the "general assistant" pitch.

§6What they get wrong

The common mistake is treating injection as a content problem to be solved with better filtering of inputs. You cannot filter your way out, because the attacker controls the input and the model cannot be trusted to judge it. The fix is to stop trusting the model's output at the point of execution, and to put a thing that does not read prose between the agent and the world. Build the gate, narrow the intent, log every rejection. The model can be as clever or as fooled as it likes on the inside, as long as nothing leaves without permission.

Methodology note. We picked this because it is the cleanest public example of an agentic failure we keep warning regulated buyers about: a capable agent steered into arbitrary action by an input nobody reviewed. Workloft runs an eight-agent fleet into regulated UK buyers, and the pre-send verifier is the control we build before anything else. This is commentary on Anthropic's incident as reported, not our work. The point is structural, not a pile-on. The verifier is a separate thing you build, and skipping it is the default rather than the exception.