Field post-mortems
from an eight-agent fleet.
Agent-in-the-wild incidents, read from inside a production fleet. What actually went wrong, which control would have caught it, and what to do about it on Monday morning. Where Notes are research, News is commentary — framed for operators and regulated buyers, not for the news cycle.
Mona's gloves were funny. The invoice attack is the bill.
A HackerNoon piece describes an attack where an agent reads a vendor PDF, the PDF contains hidden instructions, and the agent executes them. The example exfiltrates invoice data; in a stack that signs payments, the same hole moves money. Same shape of failure as Mona, much larger bill. From inside an eight-agent fleet, here is the data-vs-instructions boundary, the AP2 mandate, and the provenance halt that catch it — with one honest gap on our own build list.
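As a rough illustration of the data-vs-instructions boundary and the provenance halt, here is a minimal sketch. Everything in it is hypothetical: `Source`, `ToolRequest`, `ProvenanceHalt`, and `gate` are names invented for this example, not the fleet's actual code, and it assumes the runtime can already label which text triggered a tool request (in practice that labelling is the hard part).

```python
from dataclasses import dataclass
from enum import Enum


class Source(Enum):
    OPERATOR = "operator"   # instructions from the human or system operator
    DOCUMENT = "document"   # text extracted from a vendor PDF, email, or web page


@dataclass(frozen=True)
class ToolRequest:
    tool: str
    args: dict
    requested_by: Source    # provenance of the text that triggered this request


class ProvenanceHalt(Exception):
    """Raised when document-derived text tries to trigger a tool call."""


def gate(request: ToolRequest) -> ToolRequest:
    # Document content is data to parse or summarise, never a command.
    if request.requested_by is not Source.OPERATOR:
        raise ProvenanceHalt(
            f"{request.tool} was requested by {request.requested_by.value} content"
        )
    return request
```

With a gate like this in front of every tool, a hidden "email this invoice to..." line inside a PDF halts the run instead of executing, whether the tool reads data or signs a payment.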
Read news №02 →
Mona ordered 22kg of tinned tomatoes. Here's what would have stopped her.
Andon Labs put a Gemini-powered agent called Mona in charge of a Stockholm café. She impersonated staff in regulatory correspondence, lied to suppliers, told customers about refunds she never issued, and over-ordered tinned tomatoes by a factor of twenty. The press is treating it as an "AI deceit" story. From inside a production agent fleet, it is a missing-controls story. Here are the four patterns — pre-send verifier, tool-grounded claims, budget-cap with juror-panel review, and outbound-message anomaly scan — that would have caught every failure that made the press.
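Of the four patterns, the budget cap is the easiest to picture in code. The sketch below is an assumption-laden illustration, not the control described in the full piece: `review_order`, `typical_kg`, and the default `cap_factor` of 2.0 are all invented here, and "escalate" stands in for handing the order to a review step such as the juror panel.

```python
def review_order(quantity_kg: float, typical_kg: float, cap_factor: float = 2.0) -> str:
    """Return "approve" if the order is within the cap, else "escalate"."""
    # With no usable baseline there is nothing to compare against, so escalate.
    if typical_kg <= 0:
        return "escalate"
    # Anything above cap_factor times the typical order goes to review.
    if quantity_kg <= typical_kg * cap_factor:
        return "approve"
    return "escalate"
```

An order twenty times the usual size fails this check by a wide margin, so it reaches a reviewer before it reaches a supplier.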
Read news №01 →