Workloft
▸ WORKLOFT RESEARCH NOTE №08 · 20 MAY 2026

The Boundary Is the Product

Srinivasan names the contract between LLM and system. Regulated buyers should read it as an audit specification.

REG FIT ●●● · STRONG · APPLIES TO FCA SS1/23 §3.5, ICO AI GUIDANCE §11, UK GDPR ART.22, EU AI ACT ART.14

§1 The thing every production agent has, badly

Vasundra Srinivasan's paper does something the agent-infrastructure literature has been avoiding for two years: it names the boundary. The stochastic-deterministic boundary (SDB) is the contract between the LLM proposer and the deterministic system that commits its output as an action. Srinivasan formalises it as a four-part object: proposer, verifier, commit step, reject signal.
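To make the four-part object concrete, here is a minimal sketch in Python. It is an illustration, not code from Srinivasan's paper: the names `run_boundary`, `Commit`, `Reject`, `canned_proposer` and `limit_verifier` are hypothetical, and the "model" is a canned function standing in for a real LLM call.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Union


@dataclass(frozen=True)
class Commit:
    """The deterministic system accepted the proposal and acted on it."""
    action: dict


@dataclass(frozen=True)
class Reject:
    """The reject signal: a structured reason rather than a swallowed exception."""
    reason: str
    proposal: dict


def run_boundary(
    task: str,
    proposer: Callable[[str], dict],            # stochastic: task -> proposed action
    verifier: Callable[[dict], Optional[str]],  # deterministic: None means acceptable
    commit_step: Callable[[dict], None],        # deterministic side effect
) -> Union[Commit, Reject]:
    proposal = proposer(task)
    problem = verifier(proposal)
    if problem is not None:
        # Nothing is committed; the caller receives an inspectable reason.
        return Reject(reason=problem, proposal=proposal)
    commit_step(proposal)
    return Commit(action=proposal)


# Hypothetical stand-ins: a canned "model" and a hard limit on refund size.
def canned_proposer(task: str) -> dict:
    return {"type": "refund", "amount": 250}


def limit_verifier(proposal: dict) -> Optional[str]:
    if proposal["amount"] > 100:
        return "amount exceeds limit of 100"
    return None


ledger: list = []
outcome = run_boundary("refund order 17", canned_proposer, limit_verifier, ledger.append)
```

Here `outcome` is a `Reject` and `ledger` stays empty: the proposal never reaches the commit step, and the reason for that is a value the caller can log rather than an exception that disappears.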

That sounds modest. It is not. Most production LLM agents shipped in the last eighteen months have all four of these things, but implemented as accidents. The verifier is a regex someone added after an incident. The reject signal is a try/except that swallows the failure. The commit step is whatever the LangChain tool wrapper happens to do. Naming the contract is the move that turns those accidents into an auditable surface.

For regulated buyers this is the substrate-level paper of the year so far. Not because it solves the problem, but because it gives compliance and engineering a shared vocabulary for the part of the system the FCA, the ICO and the AI Act actually care about: the point at which a stochastic output becomes a binding action.

§2 Why naming matters more than the catalogue

Srinivasan offers six runtime patterns (hierarchical delegation, scatter-gather plus saga, event-driven sequencing, shared state machine, supervisor plus gate, human in the loop) and a five-step methodology for selecting between them. The catalogue is useful. The methodology is more useful. But the lasting contribution is the SDB itself, for one reason: it makes separation of concerns enforceable.

ICO guidance on AI and data protection, particularly the sections on meaningful human review and on logging of automated decisions, assumes the system has an articulable point where the model's suggestion becomes the organisation's action. Most agent deployments cannot point to that line. They have a graph of LangGraph nodes and a vague sense that somewhere in there a tool call happened. The SDB gives you the line. Producer is not guardian. Verifier is not proposer. Commit is logged, reject is logged, both are inspectable.
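One way to picture "commit is logged, reject is logged" is a single record shape that both outcomes share. The sketch below is an assumption about what such a record could contain, not a format defined by the paper or by any regulator; `boundary_record` and the component identifiers are hypothetical.

```python
import json
from datetime import datetime, timezone
from typing import Optional


def boundary_record(
    outcome: str,
    proposer_id: str,
    verifier_id: str,
    proposal: dict,
    reason: Optional[str] = None,
) -> str:
    """Return one JSON line describing a commit or a reject at the boundary."""
    if outcome not in ("commit", "reject"):
        raise ValueError(f"unknown outcome: {outcome!r}")
    if proposer_id == verifier_id:
        # Producer is not guardian: refuse to record a self-verified decision.
        raise ValueError("verifier must be distinct from proposer")
    if outcome == "reject" and not reason:
        raise ValueError("a reject must carry a reason")
    return json.dumps(
        {
            "at": datetime.now(timezone.utc).isoformat(),
            "outcome": outcome,
            "proposer": proposer_id,
            "verifier": verifier_id,
            "proposal": proposal,
            "reason": reason,
        },
        sort_keys=True,
    )


# Hypothetical identifiers for the two components.
line = boundary_record(
    "reject",
    "drafting-model",
    "limits-rule-engine",
    {"type": "refund", "amount": 250},
    reason="amount exceeds limit",
)
```

The design choice worth noting is that the record cannot be written at all when proposer and verifier are the same component, so the separation is checked at the moment of logging rather than asserted in documentation.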

FCA SS1/23 on model risk management asks for the same thing in different language: a clear locus of control, a defined challenge function, and evidence that the challenger is genuinely independent of the producer. Srinivasan's verifier-distinct-from-proposer requirement is SS1/23 §3.5 written in software-architecture terms. The paper does not cite the regulation. It does not need to. The shape is identical.

§3 Replay divergence is the failure mode nobody is logging for

The most useful new term in the paper is replay divergence: the phenomenon where LLM-based consumers of a deterministic event log produce different downstream outputs when the model version or prompt changes. Your event log is correct. Your replay is wrong. The audit trail lies, not by omission but by reconstruction.

This matters for any regulated buyer who has assumed that event sourcing plus a deterministic log gives them reproducibility. It does not, the moment a stochastic consumer sits downstream. The implications for FOIA and EIR requests against local authority agent deployments are direct: "here is what the system did on 14 March" is not the same artefact as "here is what the system would do if we replayed 14 March today." The first is a record. The second is a fiction generated by the current model weights.

Srinivasan's prescription, snapshotting model version and prompt as first-class entries in the event log, is correct and almost nobody is doing it. The substrate gap here is concrete: agent frameworks log tool calls and messages. They do not log the proposer's identity card. Until that becomes default, replay divergence is silent and uninvestigable.
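A minimal sketch of what logging the proposer's identity could look like follows. The field names, the choice to store a SHA-256 hash of the prompt rather than the prompt text, and the helper `replay_is_faithful` are all assumptions made for illustration; the paper's own schema may differ.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class ProposerIdentity:
    model_version: str
    prompt_sha256: str


def identity_for(model_version: str, prompt: str) -> ProposerIdentity:
    """Snapshot the two things that change a stochastic consumer's behaviour."""
    digest = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
    return ProposerIdentity(model_version=model_version, prompt_sha256=digest)


@dataclass(frozen=True)
class LoggedDecision:
    event_id: str
    proposer: ProposerIdentity  # stored alongside the event, not reconstructed later
    output: str


def replay_is_faithful(entry: LoggedDecision, current: ProposerIdentity) -> bool:
    """A replay only reproduces the record if model and prompt are unchanged."""
    return entry.proposer == current


# Hypothetical model name and prompts.
recorded = LoggedDecision(
    event_id="evt-0001",
    proposer=identity_for("model-a-2026-03", "Summarise the case."),
    output="approve",
)
today = identity_for("model-a-2026-03", "Summarise the case briefly.")
faithful = replay_is_faithful(recorded, today)  # False: the prompt text changed
```

With the identity stored in the entry, a replay under a changed prompt or model can be flagged as a new computation rather than presented as the historical record.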

§4 What this means for anyone building substrate

The reliability decomposition in section 6 of the paper makes a claim that should reorganise roadmaps. Srinivasan separates per-call model variance from architectural momentum and argues that as model variance falls (and it is falling: frontier models in 2026 are markedly more deterministic on structured tasks than 2024 models), pattern choice and SDB strength become the dominant levers on long-run reliability.

Translated: the next two years of reliability gains will not come from better models. They will come from better boundaries. Teams pouring evaluation budget into proposer quality are optimising the variable that is already shrinking. Teams investing in verifier strength, reject-signal richness and commit-step idempotency are working on the variable that will dominate.
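Commit-step idempotency is the most mechanical of those three investments, so it is the easiest to sketch. The example below assumes an in-memory store and a caller-supplied idempotency key; `IdempotentCommitter` and `send_renewal_notice` are hypothetical names, and a production version would need durable storage and concurrency control that this sketch omits.

```python
from typing import Any, Callable, Dict


class IdempotentCommitter:
    """Apply each action at most once per idempotency key."""

    def __init__(self, apply: Callable[[dict], Any]) -> None:
        self._apply = apply
        self._results: Dict[str, Any] = {}  # in-memory only; an assumption of this sketch

    def commit(self, idempotency_key: str, action: dict) -> Any:
        if idempotency_key in self._results:
            # A retry of an already-committed action returns the stored result
            # without repeating the side effect.
            return self._results[idempotency_key]
        result = self._apply(action)
        self._results[idempotency_key] = result
        return result


sent: list = []


def send_renewal_notice(action: dict) -> int:
    """Hypothetical side effect: record the notice and return a sequence number."""
    sent.append(action)
    return len(sent)


committer = IdempotentCommitter(send_renewal_notice)
first = committer.commit("renewal-42", {"contract": 42})
second = committer.commit("renewal-42", {"contract": 42})  # retry after a timeout
```

Both calls return the same result and `sent` holds a single entry, so a proposer that retries after a timeout cannot cause the action to be taken twice.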

This is the substrate thesis Workloft has been arguing from a different angle for a year. The agent runtime is not a wrapper around a model. It is the locus of reliability. The model is a component. Srinivasan gives that thesis a primitive (the SDB), a catalogue (the six patterns) and a diagnostic procedure (the failure-to-pattern mapping in section 5). It is the closest thing the field has to a textbook chapter.

§5 What the paper does not solve

Three gaps, in descending order of importance.

First, the verifier problem is named but not solved. Srinivasan treats the verifier as an architectural slot. Whether the verifier is a rules engine, a second LLM, a typed schema check or a human is left to the implementer. The paper is honest about this. But the regulated-buyer question, what counts as an independent verifier under SS1/23 when the verifier is itself an LLM from the same family as the proposer, gets one paragraph. It deserves a paper.

Second, the patterns are described at the level of a single agent or small composition. Multi-tenant runtime concerns (noisy neighbour effects between SDBs, shared verifier capacity, cross-tenant replay divergence) are out of scope. For platform builders this is the harder problem.

Third, the reference implementation is a 90-day contract-renewal agent. It is runnable, which is more than most papers offer. But it is a happy-path workload. The patterns that matter most under regulatory pressure (supervisor plus gate, human in the loop) are not the ones the reference implementation exercises hardest. A follow-up implementation centred on a rejection-heavy workload would do more for the field than another pattern in the catalogue.

None of this diminishes the paper. Naming is the first act of engineering. Srinivasan has named the thing.


Methodology note. This Note takes Srinivasan's runtime-patterns paper (arXiv:2605.20173) as the most substrate-relevant single-author contribution of the month. Triggers: substrate-relevant (names the LLM/system contract as a first-class object); non-duplicative (no other agent-architecture paper in 2026 has formalised the boundary at this level); regulated-buyer link (the SDB maps directly onto FCA SS1/23 challenge-function requirements and ICO meaningful-human-review guidance). Forthcoming: a Workloft Labs note on verifier independence when proposer and verifier share a model family, and a runnable reference for a rejection-heavy local-authority workload using the supervisor-plus-gate pattern.