The fleet-memory problem, and the three holes in ours, Workloft Research Note №52

§1 A paper named the bug we had shipped that morning

The timing was almost rude. We spent this morning fixing a memory failure in our own agent fleet (the details are in today's ship: our long-term memory had quietly stored nothing for sixty-eight days while every status check stayed green). Then this afternoon a paper landed that formalises the exact class of problem and gives it a name: the fleet-memory problem. Shared memory, read and written by several agents, breaks in specific, predictable ways.

The paper lists four ways it breaks. Stale propagation: old information stays in circulation when it should have been replaced. Unauthorised leakage: one agent sees what another wrote that it had no business reading. Contradiction persistence: two conflicting facts sit side by side, unresolved. Provenance collapse: you lose track of who wrote what, and where it came from. Against those, it proposes four primitives to govern a shared store: scoped retrieval, temporal supersession (newer beats older), provenance tracking, and policy-governed propagation.

The line that stopped us was not the taxonomy. It was the method. The authors report that design-time analysis missed the real failures, and only live evaluation surfaced them: scope bypasses, ordering conflicts, timing issues that no amount of reading the architecture would have caught. We had learned that exact lesson eight hours earlier, the hard way.

§2 We did not read about failure mode one. We lived it.

Our rot was stale propagation in its purest form. The nightly job that writes each day's work into memory kept returning success, and kept storing zero facts. So recall served the same two-month-old answers, forever, with total confidence. Nothing newer ever arrived to supersede the old, because nothing newer ever landed. A shared store that cannot take in the new is a store that only ever tells you the past.

And we did not catch it by reading the design. The design looked fine. We caught it by asking the memory to recall something recent and watching it fail. That is the paper's whole point: the gap between "the write was accepted" and "the memory can be recalled" is where these systems rot, and only a live test crosses it. The watchdog we shipped today is precisely that live test, run daily. We arrived at the paper's headline finding by tripping over it.
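The gap between "the write was accepted" and "the memory can be recalled" can be illustrated with a write-then-recall canary. What follows is a minimal sketch under stated assumptions, not the production watchdog: `MemoryStore`, `InMemoryStore`, `SilentlyBrokenStore`, and `canary_check` are hypothetical names, and a real store's API will differ.

```python
import uuid
from typing import Protocol


class MemoryStore(Protocol):
    """Hypothetical interface, assumed for illustration only."""

    def write(self, text: str) -> bool: ...
    def recall(self, query: str) -> list[str]: ...


class InMemoryStore:
    """Healthy toy store: writes actually land."""

    def __init__(self) -> None:
        self._facts: list[str] = []

    def write(self, text: str) -> bool:
        self._facts.append(text)
        return True

    def recall(self, query: str) -> list[str]:
        return [fact for fact in self._facts if query in fact]


class SilentlyBrokenStore(InMemoryStore):
    """Reproduces the failure mode: reports success, stores nothing."""

    def write(self, text: str) -> bool:
        return True  # accepted, but never persisted


def canary_check(store: MemoryStore) -> bool:
    """Write a unique marker, then require that recall returns it."""
    marker = f"canary-{uuid.uuid4().hex}"
    accepted = store.write(f"watchdog marker {marker}")
    recalled = any(marker in hit for hit in store.recall(marker))
    # "accepted" alone proves nothing; only the full round trip counts.
    return accepted and recalled
```

Against the healthy store the check passes. Against the broken one it fails even though every `write` call returned success, which is the situation a status check on the write path alone cannot see.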

§3 Scoring our own shared memory, honestly

We run a shared memory bank across more than one agent (our two main operators, Bob and Larry, both read and write the same store, and others feed it). So the fleet-memory problem is not theoretical for us; it is our architecture. Here is how that store actually scores against the paper's four primitives. We pass one.

PASS: Provenance tracking. Every entry carries who wrote it, which session it came from, and its source. We can always trace a memory back to its author. This one we built in from the start, and it is the only primitive we would not lose sleep over.
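As a rough picture of what "carries who wrote it, which session it came from, and its source" can look like as a record, here is a minimal sketch. The class name, field names, and `trace` helper are assumptions made for illustration, not the actual schema.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class MemoryEntry:
    """Hypothetical record shape; the real schema is not shown in this Note."""

    text: str
    author: str        # which agent wrote the entry
    session_id: str    # which session it came from
    source: str        # where the information originated
    written_at: datetime


def trace(entry: MemoryEntry) -> str:
    """Return a one-line provenance trail for an entry."""
    return f"{entry.author} / {entry.session_id} / {entry.source}"


entry = MemoryEntry(
    text="example fact",
    author="Bob",
    session_id="s-1",          # illustrative value
    source="nightly-job",      # illustrative value
    written_at=datetime(2026, 6, 24, tzinfo=timezone.utc),
)
print(trace(entry))
```

Making the record frozen is a design choice in this sketch: provenance that can be edited after the fact is not much of a trail.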

PARTIAL: Scoped retrieval. We tag entries by agent, but the tags are advisory filters, not enforced permissions. Any agent can read the whole store if it asks. It has been fine because our agents are cooperative and trusted, but that is an assumption holding the door shut, not a lock.
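The difference between an advisory filter and an enforced permission is easiest to see side by side. This is a minimal sketch, assuming a toy in-process store; `Entry`, `STORE`, `GRANTS`, `recall_advisory`, and `recall_enforced` are all hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Entry:
    text: str
    scope: str  # tag naming the agent (or group) the entry belongs to


STORE = [
    Entry("Bob's draft plan", scope="bob"),
    Entry("Larry's private note", scope="larry"),
    Entry("Fleet-wide schedule", scope="shared"),
]


def recall_advisory(requested_scopes: Optional[set[str]] = None) -> list[str]:
    """Advisory: the caller chooses the filter, and may choose none at all."""
    return [
        e.text
        for e in STORE
        if requested_scopes is None or e.scope in requested_scopes
    ]


# Hypothetical grant table. In an enforced design the store owns this,
# and the caller cannot override it.
GRANTS = {"bob": {"bob", "shared"}, "larry": {"larry", "shared"}}


def recall_enforced(agent: str) -> list[str]:
    """Enforced: scopes derive from the caller's identity, not its request."""
    allowed = GRANTS.get(agent, set())  # unknown agents see nothing
    return [e.text for e in STORE if e.scope in allowed]
```

In the advisory version, calling `recall_advisory()` with no arguments returns all three entries. In the enforced version there is no argument an agent can pass to widen its own view. The sketch assumes the `agent` identity is authenticated elsewhere, which it does not show.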

PARTIAL: Temporal supersession. We have timestamps, and rewriting the same record replaces it. But two contradictory facts written as separate records just coexist, and recall ranks them by relevance, not by which is newer. On a tie, the stale fact can win.
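A recency tie-break on top of relevance ranking is a small change to the sort key. The sketch below is illustrative only: `Hit` and `rank` are hypothetical names, and the relevance scores and example facts are made up.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Hit:
    text: str
    relevance: float
    written_at: datetime


def rank(hits: list[Hit]) -> list[Hit]:
    """Order by relevance; among equally relevant hits, newest first."""
    return sorted(hits, key=lambda h: (h.relevance, h.written_at), reverse=True)


old = Hit("deploys run on Fridays", 0.90, datetime(2026, 4, 1))
new = Hit("deploys run on Mondays", 0.90, datetime(2026, 6, 20))
strong_old = Hit("deploys run on Tuesdays", 0.95, datetime(2026, 1, 1))

print(rank([old, new])[0].text)         # the newer of the two tied facts
print(rank([new, strong_old])[0].text)  # higher relevance still wins
```

The second call shows the limit of this approach: it only settles exact ties, so an older fact with a higher score still ranks first. Real relevance scores rarely tie exactly, so a working version would likely need a tolerance for near-ties, which this sketch leaves out.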

FAIL: Policy-governed propagation. There is no policy layer. Writes go in, recall pulls out, and nothing governs what flows between one agent's context and another's. This is the hole with no floor under it.

One of four. Not because we were careless, but because, like almost everyone shipping agent memory, we built for recall quality and treated governance as something to add later. The paper is the bill for "later".

§4 What we are doing about it, and what we are not

We are not bolting a full policy engine onto our memory this week. That would be reacting to a paper, not to a need, and a policy check on every retrieval is expensive at scale (it is the part the paper itself leaves unsolved). The honest value of this exercise is different: we now have an ordered map of three real holes instead of a vague unease.

First is temporal supersession, because we just felt what stale memory costs, and recency-wins on a tie is cheap to add. Second is scoped retrieval moving from advisory to enforced, which matters more with every agent we add to the fleet, and we add them often. Policy-governed propagation we are leaving as a known gap, named and watched, until we have an agent mix where cooperative-and-trusted stops being a safe assumption. The discipline the paper teaches, and the one we will keep, is to refuse to resolve a conflict silently in favour of whatever the recall happened to surface. The watchdog enforces the first half of that today: it will not let the store quietly tell us it is fine when it is not.
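Refusing to resolve a conflict silently can be as simple as failing loudly when recall surfaces more than one distinct answer. This is a minimal sketch, assuming facts carry a `subject` key that identifies what they are about. That key, along with `Fact`, `UnresolvedConflict`, and `resolve`, is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    subject: str  # hypothetical key identifying what the fact is about
    value: str
    author: str


class UnresolvedConflict(Exception):
    """Raised instead of silently picking one of several conflicting facts."""


def resolve(subject: str, facts: list[Fact]) -> str:
    """Return the single agreed value for a subject, or refuse to choose."""
    values = {f.value for f in facts if f.subject == subject}
    if not values:
        raise KeyError(subject)
    if len(values) > 1:
        # Surface the disagreement to the caller rather than guessing.
        raise UnresolvedConflict(f"{subject}: {sorted(values)}")
    return values.pop()


facts = [
    Fact("deploy_day", "Friday", author="Bob"),
    Fact("deploy_day", "Monday", author="Larry"),
    Fact("region", "eu", author="Bob"),
]
```

With these example facts, `resolve("region", facts)` returns `"eu"`, while `resolve("deploy_day", facts)` raises `UnresolvedConflict`. How the caller then settles the conflict (recency, provenance, or a human) is a separate decision the sketch deliberately leaves open.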

Methodology note. This Note pairs a same-day arXiv paper (arXiv:2606.24535, Governed Shared Memory for Multi-Agent LLM Systems) with our own live incident from the same morning. Triggers: substrate-relevant (shared memory governance is a runtime layer most agent builders skip); non-duplicative (it extends our earlier Note №50 from problem statement to a scored, lived audit); honest self-assessment (we publish our own scorecard, holes included). The scoring reflects our setup as of 24 June 2026 and is our own read, not the paper's.
