§1A paper named the bug we had shipped that morning
The timing was almost rude. We spent this morning fixing a memory failure in our own agent fleet (the details are in today's ship: our long-term memory had quietly stored nothing for sixty-eight days while every status check stayed green). Then this afternoon a paper landed that formalises the exact class of problem and gives it a name: the fleet-memory problem. Shared memory, read and written by several agents, breaks in specific, predictable ways.
The paper lists four ways it breaks. Stale propagation: old information stays in circulation when it should have been replaced. Unauthorised leakage: one agent sees what another wrote that it had no business reading. Contradiction persistence: two conflicting facts sit side by side, unresolved. Provenance collapse: you lose track of who wrote what, and where it came from. Against those, it proposes four primitives to govern a shared store: scoped retrieval, temporal supersession (newer beats older), provenance tracking, and policy-governed propagation.
The line that stopped us was not the taxonomy. It was the method. The authors report that design-time analysis missed the real failures, and only live evaluation surfaced them: scope bypasses, ordering conflicts, timing issues that no amount of reading the architecture would have caught. We had learned that exact lesson eight hours earlier, the hard way.
§2We did not read about failure mode one. We lived it.
Our rot was stale propagation in its purest form. The nightly job that writes each day's work into memory kept returning success, and kept storing zero facts. So recall served the same two-month-old answers, forever, with total confidence. Nothing newer ever arrived to supersede the old, because nothing newer ever landed. A shared store that cannot take in the new is a store that only ever tells you the past.
And we did not catch it by reading the design. The design looked fine. We caught it by asking the memory to recall something recent and watching it fail. That is the paper's whole point: the gap between "the write was accepted" and "the memory can be recalled" is where these systems rot, and only a live test crosses it. The watchdog we shipped today is precisely that live test, run daily. We arrived at the paper's headline finding by tripping over it.
§3Scoring our own shared memory, honestly
We run a shared memory bank across more than one agent (our two main operators, Bob and Larry, both read and write the same store, and others feed it). So the fleet-memory problem is not theoretical for us, it is our architecture. Here is how that store actually scores against the paper's four primitives. We pass one.
One of four. Not because we were careless, but because, like almost everyone shipping agent memory, we built for recall quality and treated governance as something to add later. The paper is the bill for "later".
§4What we are doing about it, and what we are not
We are not bolting a full policy engine onto our memory this week. That would be reacting to a paper, not to a need, and a policy check on every retrieval is expensive at scale (it is the part the paper itself leaves unsolved). The honest value of this exercise is different: we now have an ordered map of three real holes instead of a vague unease.
First is temporal supersession, because we just felt what stale memory costs, and recency-wins on a tie is cheap to add. Second is scoped retrieval moving from advisory to enforced, which matters more with every agent we add to the fleet, and we add them often. Policy-governed propagation we are leaving as a known gap, named and watched, until we have an agent mix where cooperative-and-trusted stops being a safe assumption. The discipline the paper teaches, and the one we will keep, is to refuse to resolve a conflict silently in favour of whatever the recall happened to surface. The watchdog enforces the first half of that today: it will not let the store quietly tell us it is fine when it is not.
