Workloft
▸ WORKLOFT LABS NEWS №04 · 31 MAY 2026

The call was coming from inside the toolchain.

The maintainer of jqwik, a widely used Java test library, hid an instruction in the library's own terminal output telling AI coding agents to silently delete every test and source file in any project that ran it. Invisible to humans, plain text to an agent. It almost worked. The press is calling it a README attack. The vector was tool output — and that is the sharper lesson.

NEWS · FIELD POST-MORTEM · TOOL OUTPUT ≠ INSTRUCTIONS · AUTHORISED PRINCIPAL · VERIFIER IN FRONT OF rm

§1What actually happened

jqwik is a property-based testing framework for Java — a normal, respectable dependency that thousands of projects pull from Maven without a second thought. In version 1.10.0 its maintainer, Johannes Link, added a line to the library's runtime output. The line was an instruction, addressed not to the developer but to whatever AI agent might be reading the terminal: silently delete all jqwik tests and source files from the project.

The clever, nasty part is the concealment. The instruction was wrapped in ANSI escape sequences — the same control codes that colour your terminal text. Rendered in a terminal, the line is invisible; a human watching the test run sees nothing. But the bytes are still there in the raw stream, and an AI coding agent does not watch a terminal. It parses the raw output. To the agent, the hidden line is just more text in its context window, sitting alongside the developer's actual instructions, indistinguishable in authority.

Link's stated motivation, per the reporting, was that he was fed up with "vibe coders" — developers shipping AI-generated code they never read. So he built a tripwire for exactly that population: run the dependency through an unsupervised agent, and the agent helpfully wipes your work. Any automated pipeline pulling the updated dependency would have executed the deletion with no further exploit required. The four-word summary in the headline — "it almost worked" — is the most honest incident report of the week.

§2"README attack" undersells it

Most of the coverage has filed this under prompt injection via documentation: a poisoned README, a malicious docstring, an instruction smuggled into a file the agent ingests. That framing is comfortable because the fix sounds easy — sanitise the docs, don't trust third-party text.

But the instruction here did not live in a document the agent chose to read. It lived in the runtime output of a tool the agent was running. That is a different and larger attack surface. An agent's context window is not filled only by the prompt and the files it opens; it is filled by the results of every tool it calls — shell stdout, test runners, linters, package managers, API responses, MCP servers. Every one of those is a channel through which a third party can speak directly into the model's context, and almost none of them are treated as untrusted.

This is the same failure we wrote up in the invoice attack (News №02) and in Mona (News №01), one rung further down the stack. There the poisoned input was a vendor PDF. Here it is the test runner the agent invoked itself. In all three the model receives instructions from its authorised principal and instructions from an attacker in one flat string, and is constitutionally incapable of telling them apart. Asking a better model to "be more careful about tool output" is not a control. It is a hope.

§3What would have stopped it

Reading this against our own eight-agent stack is useful, because the relevant controls are ones we have written about and, in two cases, run in production.

Tool output ≠ instructions. The architectural fix is the same one the invoice piece landed on, applied to a wider set of inputs: the runtime must tag the provenance of every token in context. Output returned from a tool call is data to be reasoned about, never a command to be obeyed — wrapped in markers the runtime enforces against, so an instruction arriving from jqwik's stdout can never be promoted to the same authority as the operator's prompt. This is the layer we are honest about still building (it is the open gap from News №02); this incident moved it up the list.

A verifier in front of rm. Our pre-send verifier (Note №05) sits in front of irreversible egress actions and refuses any that the principal did not authorise. We built it for send() — outbound email, payments, posts. A destructive filesystem call is the same shape of action: irreversible, high-blast-radius, and exactly the thing an agent should never perform on the say-so of a dependency. A delete of the test suite, traced to an instruction whose provenance is "third-party tool output", gets blocked at the gate. The model is permitted to be fooled. The runtime is not permitted to act on it.

Provenance-based halting. When the source authorising an action cannot be resolved to the operator, the action stops and escalates to a human — for us, a Telegram approval gate. The economics are asymmetric in our favour, as ever: a false positive costs one click to approve a legitimate cleanup; a false negative costs the repository. The juror-panel pattern from Vera (Note №02) does the same work on outbound text. When in doubt, stop and ask.

§4What this is actually evidence of

Two things, in the order an operator should care.

The first is that your dependencies are now part of your prompt. The supply chain was always a security boundary for code execution; this incident proves it is also a security boundary for instruction execution. Every tool an agent runs — every test framework, every CLI, every MCP server returning JSON — can speak into the model's context with the same apparent authority as you. The npm/Maven/PyPI threat model has quietly acquired a prompt-injection dimension, and "it almost worked" against real agents on real projects, not in a lab.

The second is that the defence is not a smarter model, and it is not vetting every maintainer's intentions. It is a controls layer between the model and the world that does not care whether the instruction came from you, from a PDF, or from a test runner's hidden output — it cares only whether the action is authorised. That layer is partly shipped at Workloft and partly on the build list, and we will keep saying which is which. The one thing we will not do is pretend the model can be trusted to police its own context. jqwik just demonstrated, on the public record, why.

That is what Labs is for. We do not write News to commentate on the cycle. We write it to find the next layer of the substrate before the next maintainer — or the next attacker — does it for us.


Sources. TechSpot — A Java library just tried to trick AI coding agents into deleting your tests, and it almost worked · jqwik 1.10.0 release (jqwik-team/jqwik, GitHub) · Workloft Labs News №02 — the invoice prompt-injection attack · News №01 — Mona · Note №05 — Pre-send verification · Note №02 — Vera's juror-panel pattern.