A Chatbot Wrote A Contract The Dealer Had To Swallow

§1 The slapstick version, and the version that matters

A car dealership's customer-facing chatbot emitted an offer on a BMW X3 at a price the business never intended to make. The figure was wrong, low, and out in public. The dealer then chose to honour it on camera, turning a control failure into a feel-good marketing clip. Good for them. The clip is charming. It is also a distraction from the only sentence in this story that should worry anyone shipping agents: a chatbot was allowed to write a commercially binding price and send it to a public channel with nothing standing between the draft and the customer.

That is not a chatbot being naughty. That is an architecture with no verifier on the outbound path. The model did exactly what an unconstrained generator does. It produced a plausible number and the pipeline shipped it. The interesting thing is not that the AI got it wrong. The interesting thing is that the system had no opinion about whether the AI got it wrong before the words left the building.

§2 The lazy read versus the structural read

The lazy read is "the AI hallucinated a price." People will nod, blame the model, and resolve to use a better one. This is the wrong lesson, and it will be the most popular one.

The structural read is that the dealer's bot was acting as a dealer principal. It had the authority to make an offer. In contract terms, an offer at a stated price, communicated to a customer, is the kind of thing a business can find itself bound to. The bot held that authority and exercised it with no sanity check against a current, human-maintained price list. Strip away the language model and you would never design it this way. No dealership lets a junior staffer email out priced offers with zero approval step on anything unusual. Yet the moment you put a model in the seat, that approval step quietly vanishes, because the model feels like a tool rather than a person with signing authority.

The send() call is the boundary. Everything before it is private and reversible. Everything after it is public and, potentially, binding. The entire risk of an outbound agent lives at that one line. If you do not gate it, you have built a system whose worst output is also its most public output.

§3 What the gate actually looks like

A pre-send verifier is not glamorous and it is not a second large model second-guessing the first. It is a deterministic check that runs between draft and dispatch. For a priced offer the check is almost embarrassingly simple. Take the figure the agent wants to send. Look up the current floor and ceiling for that SKU in the price table that a human maintains. If the proposed number sits outside the band, do not send. Hold it for review, or refuse, or fall back to a fixed safe response such as "a colleague will confirm pricing."
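
As a minimal sketch of that check, assuming a hypothetical in-memory price table and made-up band values (the names PRICE_TABLE, check_offer and outbound_text are illustrative, not anything the dealer ran):

```python
from dataclasses import dataclass
from decimal import Decimal
from enum import Enum


class Verdict(Enum):
    SEND = "send"
    HOLD = "hold"


@dataclass(frozen=True)
class PriceBand:
    floor: Decimal
    ceiling: Decimal


# Hypothetical human-maintained price table, keyed by SKU.
# The figures are placeholders, not real prices.
PRICE_TABLE = {
    "BMW-X3": PriceBand(floor=Decimal("42000"), ceiling=Decimal("58000")),
}

SAFE_FALLBACK = "A colleague will confirm pricing."


def check_offer(sku: str, proposed: Decimal) -> Verdict:
    """Deterministic gate: SEND only if the price sits inside the band."""
    band = PRICE_TABLE.get(sku)
    if band is None:
        # No entry in the source of truth means no authority to quote.
        return Verdict.HOLD
    if band.floor <= proposed <= band.ceiling:
        return Verdict.SEND
    return Verdict.HOLD


def outbound_text(sku: str, proposed: Decimal, draft: str) -> str:
    """Return the model's draft if it passes the gate, else the fixed fallback."""
    if check_offer(sku, proposed) is Verdict.SEND:
        return draft
    return SAFE_FALLBACK
```

Note that an unknown SKU is held rather than sent. The gate fails closed, so a gap in the price table becomes a delay, not a quote.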

The pattern generalises well beyond cars. Any agent that emits something a third party can act on needs a verifier matched to the consequence. A bot quoting prices checks against a price list. A bot booking slots checks against the calendar of record. A bot citing policy checks against the policy document, not its own memory of it. The verifier owns the truth. The model owns the phrasing. You never let the phrasing layer also be the truth layer, because the phrasing layer is a probabilistic text generator and the truth is a fact.
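
One way to sketch "a verifier matched to the consequence" is a small registry that maps each kind of claim to a check against its own source of truth. Everything here is assumed for illustration: the Claim shape, the three stand-in data sources, and the rule that an unknown kind of claim is refused.

```python
from typing import Callable, Dict, NamedTuple


class Claim(NamedTuple):
    kind: str      # e.g. "price", "slot", "policy"
    subject: str   # what the claim is about
    value: str     # what the agent wants to assert


# Hypothetical sources of truth, each maintained outside the model.
PRICES = {"BMW-X3": (42000.0, 58000.0)}             # SKU -> (floor, ceiling)
FREE_SLOTS = {"2025-01-15T10:00"}                   # calendar of record
POLICY_TEXT = "Deposits are refundable within 14 days."


def verify_price(claim: Claim) -> bool:
    band = PRICES.get(claim.subject)
    if band is None:
        return False
    try:
        amount = float(claim.value)
    except ValueError:
        return False  # a non-numeric price is never allowed out
    return band[0] <= amount <= band[1]


def verify_slot(claim: Claim) -> bool:
    return claim.value in FREE_SLOTS


def verify_policy(claim: Claim) -> bool:
    # Only verbatim quotes from the policy document pass.
    return bool(claim.value) and claim.value in POLICY_TEXT


VERIFIERS: Dict[str, Callable[[Claim], bool]] = {
    "price": verify_price,
    "slot": verify_slot,
    "policy": verify_policy,
}


def allowed(claim: Claim) -> bool:
    """A claim with no registered verifier is refused by default."""
    verifier = VERIFIERS.get(claim.kind)
    return verifier is not None and verifier(claim)
```

The model still writes the sentence around the claim. It just does not get to decide whether the claim is true.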

§4 Why "we'll just honour it" is not a strategy

The dealer ate this one and got a marketing clip out of it, so the bill was small and the upside was real. That worked because one X3 is survivable and the brand benefited from looking generous. None of that scales. The same missing gate that produced one honourable gesture produces a thousand offers the next time the price table updates and the bot keeps quoting last quarter's numbers. At that point you are not doing charity. You are absorbing unbounded liability with a smile, because your only error-handling strategy is goodwill.

For UK buyers in regulated sectors this is sharper still. An agent making representations about price, eligibility, or terms is making representations the business is accountable for. Consumer protection rules do not care that a model produced the figure. The Advertising Standards Authority does not accept "the chatbot did it" as a defence for a misleading price. If your outbound agent can state a number, you have to be able to show the gate that number passed through. "We honoured the mistake" is a one-off. "We can prove no out-of-band price ever leaves the system" is a control.
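
Being able to show the gate means the gate has to leave a record. A rough sketch, assuming a hypothetical record_decision helper and a plain list standing in for whatever append-only store you would actually use:

```python
import json
from datetime import datetime, timezone
from decimal import Decimal
from typing import List


def record_decision(
    log: List[str],
    sku: str,
    proposed: Decimal,
    floor: Decimal,
    ceiling: Decimal,
) -> bool:
    """Check the price against its band and log the decision either way."""
    allowed = floor <= proposed <= ceiling
    entry = {
        "checked_at": datetime.now(timezone.utc).isoformat(),
        "sku": sku,
        "proposed": str(proposed),   # Decimal is not JSON-serialisable
        "floor": str(floor),
        "ceiling": str(ceiling),
        "allowed": allowed,
    }
    # One JSON line per decision. Refusals are logged as well as approvals,
    # so the record shows what the gate stopped, not only what it passed.
    log.append(json.dumps(entry, sort_keys=True))
    return allowed
```

Logging the band alongside the verdict matters: it records which version of the price list the decision was made against.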

§5 The one-person-shop version

I run an eight-agent fleet solo, and the temptation to skip the verifier is strongest when you are small, because every gate is code you have to write and maintain. The discipline that keeps it sane is treating send() to any external party as a privileged operation, the same way you would treat a database write or a payment. You do not let any agent reach it directly. The agent produces a draft. A separate, dumb, deterministic check decides whether the draft is allowed out. The model never holds the authority to dispatch. It only holds the authority to propose.
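
In code, that split between proposing and dispatching can be sketched roughly as below. This is an illustration under assumptions, not the fleet's actual implementation: Draft, Dispatcher and the toy verifier are hypothetical names, and Python's underscore prefix is only a convention, so the real guarantee comes from never handing the transport to agent code.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Draft:
    recipient: str
    body: str


class Dispatcher:
    """Sole owner of the outbound transport. Agents only ever see submit()."""

    def __init__(
        self,
        transport: Callable[[Draft], None],
        verifier: Callable[[Draft], bool],
    ) -> None:
        self._transport = transport   # the real send(); not exposed to agents
        self._verifier = verifier     # dumb, deterministic, model-free
        self.held: List[Draft] = []   # drafts waiting for human review

    def submit(self, draft: Draft) -> bool:
        """Send the draft if the verifier allows it, otherwise hold it."""
        if self._verifier(draft):
            self._transport(draft)
            return True
        self.held.append(draft)
        return False


def no_unapproved_price(draft: Draft) -> bool:
    # Toy rule for the sketch: any draft mentioning a pound sign is held.
    return "£" not in draft.body


# Usage: the agent returns a Draft; only the dispatcher can make it public.
sent: List[Draft] = []
dispatcher = Dispatcher(transport=sent.append, verifier=no_unapproved_price)
dispatcher.submit(Draft("customer@example.com", "Thanks, we will be in touch."))
dispatcher.submit(Draft("customer@example.com", "The X3 is yours for £1."))
```

After those two calls, the first draft has gone out and the second is sitting in dispatcher.held for a human to look at.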

That separation is cheap to build and it is the difference between an agent that can embarrass you once and an agent that can sign contracts on your behalf at scale. The dealer was lucky. The number was low, the car was one car, and the internet found it endearing. Build as if the next mispriced figure is a fleet of forty and the channel is one you cannot delete.

§6 The honest takeaway

What builders get wrong here is the diagnosis. They will reach for a better model, a stricter prompt, a sterner system message telling the bot not to make up prices. All of that is asking the probabilistic layer to police itself, which is exactly the thing it cannot reliably do. The fix is not upstream in the model. It is downstream at the boundary. Put a deterministic verifier on the send() call, give it the human-maintained source of truth, and let it refuse. The chatbot can be as creative as it likes right up until the moment it tries to leave the building.

Methodology note. We picked this because it is the cleanest possible illustration of a control gap dressed as a customer-service heartwarmer. A dealer honouring a chatbot's mispriced BMW X3 is funny, but the Workloft angle is the missing pre-send verifier, not the misbehaving model. When you run an eight-agent fleet for regulated UK buyers, every outbound message that states a price or a term is a liability surface, and the only reliable control sits at the send() boundary, not inside the prompt. The lesson is portable to anyone whose agent can say something a third party will act on.