Pre-send verification: when an agent speaks for the firm, "the model was careful" is not a control
Outbound agents are now firm-of-record speakers — sending email, posting on social, replying in shared inboxes. The producer model that drafts the message cannot also be the control that approves it. A small, opinionated guardian agent in front of send() is the substrate pattern that survives an audit.
§1 What's actually changed
Until 2025, "agent" mostly meant a model that helped a human draft something. The human owned the send button. In 2026 that's no longer the deployed pattern at Workloft, at our customers, or at most of the case studies cited in the McKinsey, Klarna, Equinix and JPMorgan headlines. The agent now owns the send button. It picks up a queue of cadence steps, drafts the message from a template plus a context bundle, and posts to SMTP, the LinkedIn API, the helpdesk reply endpoint, or Telegram, on its own beat.
This is the load-bearing change. Once the agent owns the send, the firm has acquired a new compliance surface: everything that goes out under the firm's name was authored by a model whose output the firm cannot deterministically predict. The vendor's framing — "use a more careful model" — does not resolve this. Carefulness is a property of an aggregate over many runs. Any one run can still be the regret.
The procurement question that follows is the one a regulator will ask first: what is the control between the model's output and the recipient's inbox? If the answer is "we use a good model and we wrote a good prompt," that is not a control. It is a hope. Hope does not survive a Section 166 review.
§2 Why the producer cannot be the control
The standard objection to a separate verifier is a cost objection: the producer model already had the context, why pay a second model to look at the same draft? Three reasons it has to be a different model — or at least a differently-instructed one — sitting at a different point in the pipeline:
- Separation of concerns is the whole point. A producer optimised to write persuasive copy is the wrong sensor for "is this appropriate?" The producer's incentive — implicit in its training and its prompt — is to ship the message. The guardian's incentive is to refuse. Asking the same model to play both roles, in the same forward pass, is asking the prosecution and the defence to share a brain. Mature pipelines split them.
- The failure mode space is bigger than the producer's prompt covers. A cadence template can specify tone, structure, ask. It cannot anticipate "today is Saturday and we have a no-weekend-sends rule," or "this draft mentions an internal codename that should never appear in cold outbound," or "this draft references a vendor announcement that doesn't actually exist." Those are not template problems. They are policy and reality-check problems. Different sensors.
- The auditor will look for an independent attestation. The deliverable for a regulated buyer is not "we wrote a careful prompt." It is a per-message log: this draft was produced by model X at time T, then evaluated by guardian Y across axes A1–An, with verdict V, with reason R. That log is the artefact. The guardian's existence is what makes the log auditable.
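What one entry in that per-message log might look like, as a sketch. The field names and values here are illustrative, not the actual agent-verifier schema:

```python
import json
from datetime import datetime, timezone

def verdict_record(draft_id, producer, guardian, axes, verdict, reason):
    """Build one per-message log entry: model X, time T, guardian Y,
    axes A1..An, verdict V, reason R. Field names are illustrative."""
    return {
        "draft_id": draft_id,
        "producer_model": producer,                              # model X
        "produced_at": datetime.now(timezone.utc).isoformat(),   # time T
        "guardian": guardian,                                    # guardian Y
        "axes": axes,                                            # per-axis verdicts
        "verdict": verdict,                                      # aggregate V
        "reason": reason,                                        # reason R
    }

record = verdict_record(
    "msg-0042", "producer-model", "guardian-v1",
    {"hard_rules": "PASS", "redlist": "PASS", "semantic": "WARN", "style": "PASS"},
    "WARN", "claims: unverifiable launch date",
)
print(json.dumps(record, indent=2))
```

The point is not the schema; it is that every send leaves exactly one of these behind, append-only.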
The Anthropic-published ARIS pattern (Note №01) describes the same architecture for the inverse direction — a reviewer agent gating an executor's actions. Pre-send verification is ARIS pointed at the wire instead of the disk. The substrate logic is identical.
§3 The four-axis verifier we ship in front of send()
The pattern Workloft now ships in front of every Maggie outbound — and is hardening for civiclaw's customer-facing agents — has four axes. Two are deterministic, two are semantic. Defence in depth, not defence in series.
Axis 1 — hard rules (deterministic). Calendar checks (no marketing sends Saturday or Sunday), rate limits, business-hours respect for the recipient's timezone, sender reputation gates. These are pure code. They never call a model. They are the cheapest checks and they catch the dumbest mistakes — which is, empirically, the category most likely to put a firm in front of an angry recipient.
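A minimal sketch of this axis. The thresholds (a 9-to-18 business-hours window, a per-hour rate limit of 50) are illustrative assumptions, not Workloft's actual rules:

```python
from datetime import datetime
from zoneinfo import ZoneInfo

def hard_rules(send_time_utc: datetime, recipient_tz: str,
               sends_this_hour: int, rate_limit: int = 50) -> str:
    """Axis 1: pure code, no model call. Illustrative thresholds."""
    local = send_time_utc.astimezone(ZoneInfo(recipient_tz))
    if local.weekday() >= 5:           # Saturday=5, Sunday=6: no weekend sends
        return "BLOCK"
    if not (9 <= local.hour < 18):     # respect the recipient's business hours
        return "BLOCK"
    if sends_this_hour >= rate_limit:  # simple per-hour rate gate
        return "BLOCK"
    return "PASS"

# A Saturday morning in New York is blocked; a Tuesday morning passes.
print(hard_rules(datetime(2026, 1, 10, 15, tzinfo=ZoneInfo("UTC")),
                 "America/New_York", 0))
```

Note the check runs in the recipient's timezone, not the sender's; "Saturday" is a property of the inbox, not the queue.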
Axis 2 — confidential-leak redlist (deterministic). A maintained list of strings that must never appear in outbound unless the recipient is the related party: internal codenames, undisclosed deal values, client names from other engagements, internal agent names. Token-level match, case-insensitive, BLOCK if the term appears in subject or body and the recipient address does not also contain it. A 30-line check that has caught real leaks twice in the last fortnight at Workloft's own scale.
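The shape of that check, sketched. The redlist terms here are invented placeholders:

```python
import re

REDLIST = {"project-kestrel", "deal-value-7m"}   # placeholder terms, not real ones

def redlist_check(subject: str, body: str, recipient: str) -> str:
    """Axis 2: token-level, case-insensitive redlist match.
    BLOCK if a term appears in the outbound text and the recipient
    address does not also contain it (i.e. they are not the related party)."""
    tokens = set(re.findall(r"[\w-]+", (subject + " " + body).lower()))
    recipient_l = recipient.lower()
    for term in REDLIST:
        if term in tokens and term not in recipient_l:
            return "BLOCK"
    return "PASS"
```

The recipient-address exception is what lets the agent discuss Project Kestrel with the Kestrel counterparty while blocking the same string in any cold send.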
Axis 3 — semantic claims and clarity (model-graded). A single Gemini Flash call, structured JSON output, scoring three sub-axes: confidential (any specific private detail that doesn't belong in cold outbound — bias to BLOCK), claims (any factual assertion that looks invented or unverifiable — bias to WARN), clarity (is the ask clear, is there exactly one next step). Cheap. Fast. Aggregated to BLOCK / WARN / PASS by worst-axis-wins.
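A sketch of the semantic axis. `call_llm` stands in for any wrapper that returns the structured JSON (the actual Gemini Flash client is not shown); the prompt wording is illustrative:

```python
import json

SEVERITY = {"PASS": 0, "WARN": 1, "BLOCK": 2}

PROMPT = """Score this outbound draft. Reply with JSON only:
{"confidential": "PASS|WARN|BLOCK", "claims": "PASS|WARN|BLOCK", "clarity": "PASS|WARN|BLOCK"}
Bias confidential toward BLOCK and claims toward WARN when unsure.
Draft:
"""

def semantic_axis(draft: str, call_llm) -> str:
    """Axis 3: one model call, three sub-axes, worst-axis-wins."""
    scores = json.loads(call_llm(PROMPT + draft))
    return max(scores.values(), key=lambda v: SEVERITY[v])

# Deterministic stand-in for the model, for illustration only.
fake = lambda prompt: '{"confidential": "PASS", "claims": "WARN", "clarity": "PASS"}'
print(semantic_axis("Quick intro...", fake))   # WARN
```

Injecting the model call as a function argument also makes the axis trivially testable without network access, which matters when the verifier itself needs a test suite a buyer can run.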
Axis 4 — house-style gate. Last-line check that spelling, register and idiom are firm-style. For Workloft that means British English, for an asset-management buyer it might mean "no superlatives" or "no forward-looking statements." This axis usually fires WARN, not BLOCK, because copy-style errors are recoverable and shouldn't gate a send the regulator already approved.
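A toy version of the style axis, assuming a hand-maintained map of American spellings. A real house-style check would be far broader (register, idiom, banned superlatives), but the WARN-not-BLOCK shape is the point:

```python
# Placeholder spelling map; a real gate would carry the firm's full style guide.
AMERICANISMS = {"organize": "organise", "color": "colour", "center": "centre"}

def style_axis(body: str) -> str:
    """Axis 4: house-style gate. Fires WARN, not BLOCK, because
    copy-style errors are recoverable and should not abort a send."""
    words = (w.strip(".,;:!?") for w in body.lower().split())
    return "WARN" if any(w in AMERICANISMS for w in words) else "PASS"
```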
The aggregate verdict is the pessimistic max: any BLOCK aborts the send and pings the operator. Any WARN sends but logs the warning. PASS sends silently. Every verdict, regardless of outcome, is recorded with the draft and the timestamp. The log is the deliverable.
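The pessimistic-max aggregation and the dispatch behaviour can be sketched as follows (function names and the callback shape are illustrative):

```python
SEVERITY = {"PASS": 0, "WARN": 1, "BLOCK": 2}

def aggregate(axis_verdicts: dict) -> str:
    """Pessimistic max across all axes: any BLOCK wins, then WARN, then PASS."""
    return max(axis_verdicts.values(), key=lambda v: SEVERITY[v])

def dispatch(verdict: str, send, notify_operator, log) -> bool:
    """BLOCK aborts and pings the operator; WARN sends but is logged;
    PASS sends silently. Every verdict is logged regardless of outcome."""
    log(verdict)
    if verdict == "BLOCK":
        notify_operator(verdict)
        return False
    send()
    return True

print(aggregate({"hard_rules": "PASS", "redlist": "PASS",
                 "semantic": "WARN", "style": "PASS"}))   # WARN
```

Logging before branching is deliberate: the record exists even if the operator notification or the send itself fails.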
The reference implementation is open source as agent-verifier — Apache 2.0, Python 3.10+, no required dependencies, bring your own LLM. A PyPI release is coming; for now, clone the repository and run pip install -e . from the checkout. The hosted endpoint described below uses the same code.
§3.5 Try it
Paste a draft, hit verify, see the verdict. The semantic axis runs against Gemini 2.5 Flash. Calendar, redlist and style are deterministic. Public, no key needed, rate-limited at 20 calls/hour/IP for the playground (install the open-source package locally for production traffic).
§4 What this looks like for a regulated buyer
For an FCA-regulated firm operating an outbound agent — cadence, support reply, broker comms — the recommendation we'd give in a procurement conversation is the same shape we apply to ourselves:
- Require an independent guardian step. Not a self-check by the producer. A separate model invocation, with its own prompt, returning a structured verdict that cannot be silently overridden.
- Require deterministic checks alongside the semantic one. The semantic guardian is good at "is this on-brand and clear." It is bad at "did we forget it's Saturday" or "did we just leak a counterparty's name." Those are not LLM problems. They are policy problems. Code them.
- Require a verdict log. Per-message, immutable, with the guardian's reasoning. This is the artefact a Section 166 reviewer will want, and it is the artefact an internal post-incident review will want when a send goes wrong despite the gate.
- Treat WARN as a signal, not as noise. A guardian that never fires WARN is a guardian that isn't actually checking. Track WARN rates over time as a leading indicator of cadence drift.
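Tracking the WARN rate as a leading indicator can be as simple as the following sketch (the window size is arbitrary):

```python
def warn_rate(verdicts: list, window: int = 200) -> float:
    """Share of WARN verdicts over the last `window` sends.
    A rising rate suggests cadence drift; a rate of zero suggests
    the guardian is not actually checking anything."""
    recent = verdicts[-window:]
    return sum(v == "WARN" for v in recent) / len(recent) if recent else 0.0

print(warn_rate(["PASS"] * 9 + ["WARN"]))   # 0.1
```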
The procurement question is not "is your agent safe." That question has no defensible answer. The procurement question is: what runs between the model's output and the recipient, and can you show me its log? That question has a defensible answer or it doesn't.
§5 The honest caveat
A four-axis pre-send verifier is not a complete control regime. It does not detect a producer that has been adversarially prompted. It does not detect a recipient that should never have been on the cadence list (that is the upstream segmentation control's problem). It does not detect a campaign whose entire premise is wrong. It is a final-mile gate, not a strategic review.
What it is: a small, cheap, fast, defensible piece of substrate that turns "the model was careful" into "the message was independently evaluated and the evaluation is logged." That delta — from a hope to an artefact — is the difference between a procurement security review you pass and one you don't.
We will report quantitatively in a future Note: false-block rate, false-allow rate, cost per verified send, latency added. The early-week numbers are encouraging but the sample is too small to publish. By Note №07 or №08 we should have a fortnight's data and a real cost-of-quality calculation that a buyer's risk function can stress-test.
The production implementation lives at /maggie/verifier.py and is referenced from the cadence send loop. Forthcoming: a public-API version of the verifier as part of the Labs API.
