Workloft Labs · Pillar guide

Building Reliable AI Agent Infrastructure

The substrate an agent needs before it touches production: a control plane, contracts with its environment, and an audit trail that is not an afterthought.

Most agent demos work because nothing is at stake. Production is different. The moment an agent can act, the hard part stops being the model and becomes everything around it: who governs its access, what it is allowed to touch, and whether you can reconstruct what it did afterwards. That surrounding machinery is the substrate, and it is where real systems live or die. These notes record how we worked it out, usually by watching someone else's system fail first.

The control plane

Tool use is a commodity now. The contest has moved to who owns access, context, and the log. Govern that surface or the agent governs you.

The Control Plane Was Always the Hard Part
WebMCP and agent-ready platforms standardise tool use and execution. The real prize is the control plane: who governs access, context, and the audit log.

The Harness Is the Control Surface Nobody Audits
HarnessX evolves agent runtime interfaces from execution traces. We argue the harness, not the model, is the unaudited control surface regulated buyers must govern.

Agent governance is now a runtime problem
Microsoft’s Agent Governance Toolkit turns agent safety into code: policy checks, zero-trust identity and sandboxing, now usable in practice by regulated AI buyers.

Microsoft Shipped Agent Governance As Code. The Hard Part Is What It Assumes.
Microsoft's agent-governance-toolkit turns the OWASP Agentic Top 10 into runnable code. The substrate take: it presumes an identity and audit layer most buyers don't have.
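To make "govern that surface" concrete, here is a minimal sketch of a control-plane gate in Python. It assumes a hypothetical `ControlPlane` holding a static table of grants per agent identity; it is not the API of WebMCP or of Microsoft's Agent Governance Toolkit. What it illustrates is the shape: authorization and logging happen in one place, before any tool runs, unknown identities are denied by default, and denials are recorded as carefully as approvals.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ToolCall:
    agent_id: str  # identity of the calling agent (hypothetical naming scheme)
    tool: str      # name of the tool the agent wants to invoke
    args: dict     # arguments the agent proposed


@dataclass
class ControlPlane:
    # Hypothetical policy table: agent identity -> set of tools it may call.
    grants: dict[str, set[str]]
    # Decision log: every authorization attempt is appended here.
    log: list[dict] = field(default_factory=list)

    def authorize(self, call: ToolCall) -> bool:
        # Unknown identities get an empty grant set, so they are denied by default.
        allowed = call.tool in self.grants.get(call.agent_id, set())
        # Record the decision before anything executes, denials included.
        self.log.append({
            "ts": datetime.now(timezone.utc).isoformat(),
            "agent": call.agent_id,
            "tool": call.tool,
            "args": dict(call.args),
            "decision": "allow" if allowed else "deny",
        })
        return allowed


if __name__ == "__main__":
    cp = ControlPlane(grants={"billing-agent": {"read_invoice"}})
    print(cp.authorize(ToolCall("billing-agent", "read_invoice", {"id": "INV-1"})))  # True
    print(cp.authorize(ToolCall("billing-agent", "issue_refund", {"id": "INV-1"})))  # False
    print([e["decision"] for e in cp.log])  # ['allow', 'deny']
```

In a real deployment the grants would come from an identity provider rather than a literal dict, which is exactly the identity layer Microsoft's toolkit presumes buyers already have.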

Contracts with the environment

A sandbox limits blast radius. A contract says what the agent is allowed to assume and do, and what it owes back. The second is the one that survives an audit.

Agents Need Environment Contracts, Not More Sandboxes
Li et al.’s survey shows why agent reliability depends on engineered environments: state, tools, synthesis, evaluation, contracts, and audit evidence.

The Action Interface Is the Audit Surface
SpatialClaw uses a stateful Python kernel as the agent action interface, beating structured tool calls by 11.2 points. What that means for agent auditability.

The Intent Debt: The Audit Liability Agentic Stacks Don't Count
Production agent stacks count completed work, not signed intents. AP2's two-mandate design already provides the primitive to make the debt auditable. Most teams use only half of it.

Replanning Is the Audit Gap
AdaPlanBench tests LLM agents replanning under revealed constraints. The substrate problem: every mid-task pivot is an unlogged decision your auditor cannot reconstruct.
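A sandbox answers "what can go wrong?"; a contract answers "what did we agree?". Here is a minimal sketch of writing the second one down, assuming a hypothetical `EnvironmentContract` whose three parts follow this section's framing: named predicates the agent may assume about environment state, the set of actions it may take, and the evidence fields every step must hand back. It is an illustration, not the design from Li et al.'s survey or from AP2.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical: environment state modelled as a plain dict.
State = dict


@dataclass
class EnvironmentContract:
    # What the agent may assume: named predicates over environment state.
    assumptions: dict[str, Callable[[State], bool]]
    # What the agent may do.
    permitted_actions: set[str]
    # What the agent owes back: fields every step report must carry.
    required_evidence: set[str]

    def check(self, state: State, action: str, report: dict) -> list[str]:
        """Return contract violations for one step; an empty list means compliant."""
        violations = [
            f"assumption failed: {name}"
            for name, holds in self.assumptions.items()
            if not holds(state)
        ]
        if action not in self.permitted_actions:
            violations.append(f"action not permitted: {action}")
        # Sorted so the violation list is deterministic and easy to diff in logs.
        for missing in sorted(self.required_evidence - set(report)):
            violations.append(f"missing evidence: {missing}")
        return violations


if __name__ == "__main__":
    contract = EnvironmentContract(
        assumptions={"sandbox_db": lambda s: s.get("db") == "sandbox"},
        permitted_actions={"read_row", "write_row"},
        required_evidence={"action_id", "diff"},
    )
    print(contract.check({"db": "sandbox"}, "write_row", {"action_id": "a1", "diff": "+1 row"}))  # []
    print(contract.check({"db": "prod"}, "drop_table", {"action_id": "a2"}))
```

Because `check` returns violations instead of raising, a replanned step can be run through the same check and its result logged, which is one way to keep mid-task pivots from becoming unlogged decisions.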

The audit trail as a primitive

If the record is bolted on afterwards, it is fiction. Treat it as substrate from the start and it becomes the most valuable thing the system owns.

Your audit log is training data
We applied Agent Context Compilation (arXiv:2605.21850) to our own production audit log. 25 agent trajectories, 102 grounded long-context QA pairs, $0.0132 of compute. Open source.

Cache Continuity Is an Audit Problem, Not a Cost Problem
TokenPilot cuts agent inference costs by up to 87% by keeping prompt prefixes stable. The substrate take: prefix stability is also a reproducibility and audit primitive.

When Agents Stop Talking: KV-Cache Communication and the Audit Hole It Opens
KV-cache communication between heterogeneous agents beats text on cost and performance. But it removes the human-readable transcript regulators rely on. The substrate take.

Memory Is Substrate, Not a Feature: What PersonalAI 2.0 Gets Right About Agent Recall
PersonalAI 2.0 treats agent memory as a graph with adaptive traversal. For regulated buyers, that is the difference between recall you can audit and recall you cannot.
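One concrete way to make the record part of the substrate rather than a bolt-on is a hash-chained, append-only log: each entry commits to the hash of the one before it, so editing, deleting, or reordering an earlier entry breaks verification. The sketch assumes a hypothetical in-memory `AuditLog` using canonical JSON and SHA-256; it illustrates the technique and does not describe any particular production system. A chain like this is tamper-evident, not tamper-proof: truncating the tail or rewriting every entry and recomputing every hash is only caught if the latest hash is also anchored somewhere the writer cannot change.

```python
import copy
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


class AuditLog:
    """Append-only log; each entry commits to the hash of the entry before it."""

    def __init__(self) -> None:
        self.entries: list[dict] = []

    @staticmethod
    def _digest(prev_hash: str, event: dict) -> str:
        # Canonical JSON (sorted keys, no extra whitespace) so equal events hash equally.
        payload = json.dumps(
            {"prev": prev_hash, "event": event}, sort_keys=True, separators=(",", ":")
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def append(self, event: dict) -> str:
        prev = self.entries[-1]["hash"] if self.entries else GENESIS
        digest = self._digest(prev, event)
        # Deep copy so later mutation of the caller's dict cannot alter the logged event.
        self.entries.append({"prev": prev, "event": copy.deepcopy(event), "hash": digest})
        return digest

    def verify(self) -> bool:
        """Recompute the chain; False if an earlier entry was edited, removed, or reordered."""
        prev = GENESIS
        for entry in self.entries:
            if entry["prev"] != prev or entry["hash"] != self._digest(prev, entry["event"]):
                return False
            prev = entry["hash"]
        return True


if __name__ == "__main__":
    log = AuditLog()
    # Hypothetical field: a digest of the stable prompt prefix used for this call.
    prefix = hashlib.sha256(b"system prompt v3").hexdigest()
    log.append({"agent": "billing-agent", "tool": "read_invoice", "prompt_prefix_sha256": prefix})
    log.append({"agent": "billing-agent", "decision": "replan", "reason": "invoice locked"})
    print(log.verify())  # True
    log.entries[0]["event"]["tool"] = "issue_refund"  # simulate an in-place edit
    print(log.verify())  # False
```

Recording a digest of the prompt prefix in each entry (the hypothetical `prompt_prefix_sha256` field above) is one way to treat prefix stability as an audit primitive: calls that claim to share a cached prefix should carry the same digest.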

Workloft is a one-person AI engineering studio. We publish what we learn building agent systems in the open. Read all the notes or get in touch.