§1 The angle: speed bought with opacity
The pitch in this paper is straightforward and, on its own terms, convincing. Instead of having heterogeneous agents pass natural-language messages to one another, you align their key-value (KV) caches and transfer that internal state directly. The claim is better task performance than text-based handoff, at lower computational cost. The intermediate text, the bit where one agent writes out what it thinks and another reads it, gets skipped.
That is a real engineering result and the cost argument is honest. Having one agent generate natural language token by token, transmitting it, and having the next agent re-encode it is wasteful. KV-cache alignment cuts the round trip. If you are building a research assistant or a coding swarm, this is a win you should take.
But the thing the paper treats as overhead, the human-readable message, is the exact artefact that regulated buyers are required to keep. The substrate-level point is not whether KV-cache transfer works. It does. The point is that it deletes the inter-agent transcript, and for FCA-regulated firms, local authorities and healthcare providers, that transcript is not a performance cost. It is the evidence.
§2 What text-based handoff was quietly doing for you
When Agent A passes a natural-language message to Agent B, three things happen for free. There is a record of what was communicated. There is a record a human can read without specialist tooling. And there is a decision boundary you can point an auditor at: this is what the planner told the executor, here is why the executor acted.
None of that survives KV-cache transfer. The communicated state is a tensor of attention keys and values, aligned across two different model architectures. It is not human-readable in any practical sense. You cannot hand it to a complaints team, a DPO, or a section 151 officer and expect them to reconstruct what the system decided. You cannot diff two runs and explain the divergence in plain English.
The ICO's guidance on AI and data protection (§11, explainability) does not ask whether your agents communicated efficiently. It asks whether you can explain, to the person affected, how a decision about them was reached. The PRA's expectations under SS1/23 on model risk management assume you can trace the chain of reasoning through your governed components. Text handoff gave you that chain almost as a side effect. KV-cache transfer takes it back.
This is the recurring pattern in agent-infrastructure research: the optimisation that makes the system faster is frequently the same optimisation that makes it unaccountable. The transcript was doing two jobs. Once you remove it for the cost saving, you discover that one of them was compliance, and nobody costed that.
§3 The heterogeneous part makes it worse, not better
The paper's headline word is heterogeneous: different model architectures, aligned so their caches can talk. For a buyer this should set off an alarm that has nothing to do with the cost curve.
With homogeneous agents, at least the internal representations share a vocabulary. With heterogeneous agents you have introduced a learned alignment layer that maps one model's internal state onto another's. That alignment is itself a model. It can drift. It can be wrong in ways that produce confidently incorrect downstream behaviour with no text trace to catch it. When a text message between agents is garbled, a reviewer or a guardrail can sometimes notice. When an aligned KV-cache transfer is subtly miscalibrated, the failure is silent and lives inside the transferred keys and values.
For a regulated deployment you now have three things to assure rather than one: model A, model B, and the alignment between their caches. The third has no natural-language surface. Your monitoring, your red-teaming, your incident reconstruction all assumed a readable boundary between components. That boundary is gone, replaced by a tensor mapping you cannot subpoena into English.
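One way to give the alignment layer at least a crude external check is a canary set: fixed probe inputs whose aligned state was captured once at sign-off and is compared against fresh output on a schedule. The sketch below is illustrative only. It assumes each aligned cache can be pooled into a flat vector of floats, and the names `drifted_canaries` and `cosine` and the 0.98 threshold are hypothetical choices, not anything taken from the paper. It only catches drift that shows up on the canaries.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def drifted_canaries(reference, current, threshold=0.98):
    """Return ids of canary probes whose aligned state moved away from sign-off.

    reference / current: dicts mapping a canary id to a pooled vector.
    threshold: assumed similarity floor; it would need calibrating per deployment.
    """
    flagged = []
    for canary_id, ref_vec in reference.items():
        cur_vec = current.get(canary_id)
        if cur_vec is None or len(cur_vec) != len(ref_vec):
            flagged.append(canary_id)  # a missing or reshaped output is itself a finding
        elif cosine(ref_vec, cur_vec) < threshold:
            flagged.append(canary_id)
    return sorted(flagged)


reference = {"c1": [1.0, 0.0], "c2": [0.0, 1.0]}
current = {"c1": [1.0, 0.0], "c2": [1.0, 1.0]}
flagged = drifted_canaries(reference, current)
```

A flagged canary says that something changed, not what changed or whether it matters. It is a tripwire for the assurance team rather than an explanation for the person affected.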
§4 What the substrate would have to provide
The point of Workloft Labs is to ask what the runtime layer must do for this to be deployable by someone who answers to a regulator. KV-cache communication is not disqualified. It needs a shadow.
The minimum viable substrate here is a parallel text channel that is not load-bearing for performance but is mandatory for the record. The agents talk via KV-cache for speed; the runtime forces each transfer to also emit a natural-language summary of what state was passed, logged immutably, timestamped, tied to the run. The KV channel does the work. The text channel does the accountability. You pay a cost to regenerate the transcript you just optimised away, but you pay it on your terms, off the critical path, only where regulation bites.
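A minimal sketch of what that shadow record could look like, under stated assumptions: the runtime can get the raw bytes of each transferred cache and a text summary from some separate component, and the class name `ShadowLog` and its fields are hypothetical. The hash chain makes tampering detectable, not impossible. Genuine immutability would still need write-once storage underneath.

```python
import hashlib
import json
import time
from dataclasses import dataclass, field


@dataclass
class ShadowLog:
    """Append-only, hash-chained record of inter-agent transfers (illustrative)."""

    entries: list = field(default_factory=list)

    def record(self, run_id, sender, receiver, kv_bytes, summary, now=None):
        """Log one transfer: a digest of the cache plus its plain-language summary."""
        prev_hash = self.entries[-1]["entry_hash"] if self.entries else "0" * 64
        body = {
            "run_id": run_id,
            "sender": sender,
            "receiver": receiver,
            "timestamp": time.time() if now is None else now,
            "kv_sha256": hashlib.sha256(kv_bytes).hexdigest(),  # ties text to the tensor sent
            "summary": summary,
            "prev_hash": prev_hash,  # chains this entry to the one before it
        }
        payload = json.dumps(body, sort_keys=True).encode("utf-8")
        entry = dict(body, entry_hash=hashlib.sha256(payload).hexdigest())
        self.entries.append(entry)
        return entry

    def verify(self):
        """Return True if no entry has been altered, removed or reordered."""
        prev_hash = "0" * 64
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "entry_hash"}
            payload = json.dumps(body, sort_keys=True).encode("utf-8")
            if entry["prev_hash"] != prev_hash:
                return False
            if hashlib.sha256(payload).hexdigest() != entry["entry_hash"]:
                return False
            prev_hash = entry["entry_hash"]
        return True


log = ShadowLog()
log.record("run-1", "planner", "executor", b"\x00\x01", "Planner passed the applicant's income checks.", now=1.0)
log.record("run-1", "executor", "planner", b"\x02", "Executor returned a referral decision.", now=2.0)
intact = log.verify()
```

The summary is written off the critical path, so the KV channel keeps its latency advantage. What the log cannot tell you is whether the summary is a faithful account of the cache it is paired with.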
That is a producer-versus-guardian separation. The agents producing the answer are not the components responsible for the audit record. If you let the same optimisation pressure govern both, the audit record loses every time, because it is pure cost to the task and pure value only to the auditor who is not in the room when the latency target is set.
Anyone building this for a council or an FCA firm should treat the readable transcript as a first-class output of the system, generated deliberately, not as exhaust from an architecture choice. The paper shows you can remove it. The regulation shows you cannot.
§5 What the paper does not solve
The paper is, on its own scope, doing nothing wrong. It is a performance-and-cost result and it reports a performance-and-cost result. It does not claim to address auditability, and it would be unfair to mark it down for a problem it did not set out to solve.
What it does not do is tell you the price of reconstruction. If you adopt KV-cache communication and then need to produce a human-readable account of a decision, how faithful is a post-hoc text summary to what the caches actually transferred? Nobody has measured that gap. It could be small. It could be large enough that your reconstructed transcript is fiction that happens to be plausible, which is worse than no transcript at all under any honest reading of explainability duties.
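One hedged starting point for measuring that gap is a behavioural replay: re-run the receiving agent with only the logged text summary and check whether it reaches the same action it reached from the KV transfer. The sketch assumes actions can be reduced to comparable labels and that replay is possible at all. The function name `reconstruction_agreement` is hypothetical. Matching actions do not prove the summary is faithful, only that it was sufficient on those runs.

```python
def reconstruction_agreement(kv_actions, summary_actions):
    """Compare actions taken from KV transfer with actions replayed from summaries.

    Both arguments map a run id to the action label the receiving agent chose.
    Returns (agreement_rate, run ids that disagreed) over the runs present in both.
    """
    shared = sorted(set(kv_actions) & set(summary_actions))
    if not shared:
        raise ValueError("no runs in common to compare")
    disagreements = [run for run in shared if kv_actions[run] != summary_actions[run]]
    rate = 1 - len(disagreements) / len(shared)
    return rate, disagreements


kv_actions = {"r1": "approve", "r2": "refer", "r3": "decline", "r4": "approve"}
summary_actions = {"r1": "approve", "r2": "approve", "r3": "decline", "r4": "approve"}
rate, disagreements = reconstruction_agreement(kv_actions, summary_actions)
```

The disagreeing runs are the interesting output: each one is a case where the record you would hand an auditor describes a decision the system did not actually make.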
The other open question is alignment-layer assurance. The paper demonstrates the alignment works on its benchmarks. It does not characterise how it fails, how you detect drift in production, or what a wrong cache transfer looks like from the outside. For a regulated buyer those are the only questions that matter, and they are exactly the ones the cost-and-performance frame is not built to answer.
