Prompt-Level Distillation and the Audit Gap Nobody Costed

§1 The angle: cheaper inference, more expensive accountability

Prompt-level distillation, as the paper frames it, extracts reasoning patterns from a large teacher model and folds them into prompts that drive a smaller student. The selling points are familiar: lower latency, lower cost, a student that punches above its parameter count. The interpretability claim is the interesting one, because it is the claim that regulated buyers will hear loudest and scrutinise least.

Here is the argument. Distillation has always relocated risk rather than removing it. Weight-level distillation buries the teacher's behaviour inside the student's parameters, where nobody can read it. Prompt-level distillation does the opposite: it writes the borrowed reasoning into text. That sounds like a win for transparency. For a UK Local Authority or an FCA-regulated firm, it is also a new artefact that has to be versioned, retained, and defended. The reasoning is now legible, which means it is now discoverable.

§2 Where the reasoning actually lives now

The substrate question is not "does the student perform well". It is "when something goes wrong, what do you hand the auditor". With weight-level distillation the honest answer was always uncomfortable: the model does what it does, and we can show you evals but not a chain of cause. Prompt-level distillation changes the answer, and not entirely for the better.

The reasoning pattern now lives in a prompt template. That template was generated by a teacher model whose outputs you may not have logged, may not control, and may not be permitted to redistribute. So the chain of provenance runs from the teacher model (possibly a third-party API), to the extracted reasoning pattern (a derived artefact), to the production prompt (the thing actually steering decisions about a resident, a customer, a patient). Each hop is a point where your ability to meet the ICO's explainability expectations can break.

The ICO's guidance on explaining decisions made with AI, particularly the sections on the rationale and responsibility for an automated decision, assumes you can say why a decision was reached and who is accountable for the logic. A distilled prompt makes the "why" readable but orphans the "who". The reasoning was authored by a teacher you do not own, distilled by a process the paper describes but does not constrain, and deployed by a student that follows it without having derived it. Three parties, one decision, no clear owner of the logic.

§3 The interpretability claim is doing too much work

The paper claims the approach maintains interpretability. For a research benchmark that is a defensible claim: you can read the prompt, you can trace the reasoning steps the student is told to follow. But interpretability in the sense a regulated financial firm has to meet, the sense that matters under the PRA's SS1/23 on model risk management, is not "a human could read this". It is "the firm can demonstrate it understands the model, can challenge its outputs, and can evidence ongoing governance of the thing in production".

A distilled prompt is a frozen snapshot of a teacher's reasoning at the moment of extraction. The teacher model drifts. It gets retrained, deprecated, or silently updated behind an API. The student keeps following a reasoning pattern that the teacher itself may no longer endorse. There is no feedback channel in the prompt-level approach that tells you the distilled logic has gone stale. You have interpretability of a fossil.
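To make the missing feedback channel concrete, here is a minimal sketch of a staleness check, not anything the paper provides. The names `Extraction` and `staleness_reasons` are hypothetical, and the 180-day review window is an assumed policy value. The sketch also assumes the teacher exposes a version string you can query, which a silently updated API may not do honestly; that is why the age check sits alongside the version check as a backstop.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional


@dataclass(frozen=True)
class Extraction:
    """What was recorded when the reasoning pattern was distilled (hypothetical shape)."""
    teacher_version: str
    extracted_on: date


# Assumed policy value: re-review any distilled prompt older than about six months.
REVIEW_WINDOW = timedelta(days=180)


def staleness_reasons(
    extraction: Extraction,
    current_teacher_version: Optional[str],
    today: date,
    review_window: timedelta = REVIEW_WINDOW,
) -> List[str]:
    """Return the reasons a distilled prompt should be re-derived or retired.

    An empty list means no staleness signal fired. It does not mean the
    reasoning is still correct, only that neither check found a reason to doubt it.
    """
    reasons = []
    if current_teacher_version is None:
        # The teacher was deprecated, or its version can no longer be established.
        reasons.append("teacher version unavailable")
    elif current_teacher_version != extraction.teacher_version:
        reasons.append(
            f"teacher moved from {extraction.teacher_version} "
            f"to {current_teacher_version}"
        )
    if today - extraction.extracted_on > review_window:
        # Backstop for silent updates that leave the version string unchanged.
        reasons.append("extraction older than review window")
    return reasons
```

Run on a schedule, a non-empty result would be the trigger to re-derive or retire the prompt. The point of the sketch is that the signal has to be built and owned by the deployer, because nothing in the distilled prompt itself produces it.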

This is the gap the paper does not cost. It optimises for the student matching the teacher at extraction time. It says nothing about the maintenance burden of keeping a distilled reasoning pattern defensible across a teacher's lifecycle, which is exactly the burden a regulated buyer inherits on day one.

§4 What a substrate builder would actually need

If you are running this in production for a compliance-bound buyer, the paper hands you a capability and leaves you the governance. The missing substrate is roughly this. First, provenance binding: every production prompt needs to carry the teacher model identity, version, and extraction date as non-strippable metadata, so an auditor can reconstruct where a reasoning pattern came from. Second, a staleness signal: some mechanism that flags when the teacher has moved and the distilled prompt should be re-derived or retired. Third, a redistribution check: many teacher model licences restrict using outputs to train or steer competing systems, and a distilled reasoning pattern is arguably exactly that. The paper does not touch licence terms. Your legal team will.
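As an illustration of the first and third items, here is a minimal sketch of provenance binding using a keyed hash from the Python standard library. Everything named here is an assumption for illustration: the `Provenance` fields, the `bind` and `verify` helpers, and the idea that the key is held by a governance function rather than by whoever produces the prompt. A keyed hash does not make metadata literally non-strippable; it makes stripping or altering it detectable by anyone holding the key.

```python
import hashlib
import hmac
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Provenance:
    """Hypothetical provenance fields carried with every production prompt."""
    teacher_id: str         # which teacher model produced the reasoning
    teacher_version: str    # version as reported at extraction time
    extracted_on: str       # ISO date of extraction
    licence_reviewed: bool  # records that legal reviewed the teacher's terms


def _mac(prompt: str, provenance: dict, key: bytes) -> str:
    # Canonical JSON so the same content always yields the same MAC.
    canonical = json.dumps(
        {"prompt": prompt, "provenance": provenance},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hmac.new(key, canonical.encode("utf-8"), hashlib.sha256).hexdigest()


def bind(prompt: str, prov: Provenance, key: bytes) -> dict:
    """Produce a record whose MAC covers the prompt and its provenance together."""
    provenance = asdict(prov)
    return {
        "prompt": prompt,
        "provenance": provenance,
        "mac": _mac(prompt, provenance, key),
    }


def verify(record: dict, key: bytes) -> bool:
    """True only if prompt and provenance are both present and unaltered."""
    try:
        expected = _mac(record["prompt"], record["provenance"], key)
        return hmac.compare_digest(expected, record["mac"])
    except (KeyError, TypeError):
        # Missing or malformed fields count as a failed verification.
        return False
```

A deployment gate would refuse to load any prompt for which `verify` returns false. Keeping the key away from the prompt author is the design choice that matters: the party that produces the reasoning pattern should not also be able to vouch for it.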

None of this is exotic. It is the same separation-of-concerns discipline we keep arguing for: the thing that produces a reasoning pattern is not the thing that should be trusted to vouch for it in production. A distilled prompt with no provenance binding is a producer pretending to be a guardian. That conflation is precisely what the ICO's accountability expectations exist to prevent.

§5 What the paper does not solve

Judging by its summary, the paper demonstrates that reasoning patterns can be extracted and that students improve. That is a genuine and useful result at the capability layer. What it does not solve, and does not claim to, is everything downstream of "it works on the benchmark".

It does not address teacher licence compatibility, so a buyer cannot tell from the paper whether deploying a distilled prompt breaches the terms of the API it was extracted from. It does not address drift between teacher and distilled artefact, so there is no answer to "how do we know this reasoning is still correct six months on". It does not address retention and discovery: a readable reasoning prompt is a record, and records about decisions affecting individuals fall under UK GDPR and, for public bodies, potentially FOIA and EIR. And it does not name authors or datasets in the material we were given, which is its own caution: a distillation method you cannot fully attribute is a method you cannot fully defend. We will follow up with a provenance-binding pattern for distilled prompts that a regulated buyer could actually put in front of an auditor.

Methodology note. This Note takes the prompt-level distillation paper (arXiv:2602.21103) as a capability result with an uncosted governance tail. Triggers: substrate-relevant (relocates the audit boundary from weights to readable prompts); non-duplicative (the interpretability claim is being read uncritically by buyers, and nobody is costing teacher-drift or licence exposure); regulated-buyer link (UK Local Authorities and FCA-regulated firms inherit the provenance burden on deployment, under the ICO's explainability expectations and, for firms in scope, the PRA's SS1/23 model risk principles). Forthcoming: a provenance-binding pattern for distilled prompts, with non-strippable teacher metadata and a staleness signal, written for audit defensibility.