When an Agent Rewrites and Approves Its Own Harness, You Have Removed the Reviewer

§1 The angle: self-improvement is also self-approval

A paper out of the Shanghai Artificial Intelligence Laboratory this week, Self-Harness (arXiv:2606.09498), does the thing we have been circling in these Notes for a fortnight. It lets an LLM agent improve its own harness — the system prompt, the tools, the memory, the orchestration logic that sits between the model and the world — with no human engineer and no stronger model supervising. It diagnoses its own failures, proposes its own fixes, and keeps the ones that pass. The capability result is genuine and the numbers are not small.

The part worth your attention is the step nobody names. The same agent that writes the change is the one that approves it. Self-improvement, read as a control, is self-approval. That is the whole Note.

§2 What Self-Harness actually does

Three stages, run as a loop. Weakness Mining reads the agent's own execution traces and clusters the failure patterns specific to that model. Harness Proposal generates a set of small, targeted edits to the scaffolding aimed at those failures — not generic "be more careful" instructions, but concrete changes to tools, observation formatting, and memory. Proposal Validation runs the candidate edits against a regression set and accepts only the ones that improve held-out performance without breaking what already worked.
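The three stages can be sketched as a single round of a loop. This is a minimal illustration, not the paper's implementation: every name here (`Edit`, `Harness`, `mine_weaknesses`, `propose_edits`, `validate`, `toy_run`) is hypothetical, failure clusters are reduced to string labels, and the evaluator is a toy in which a task passes once the harness carries an edit aimed at the failure class it depends on.

```python
from dataclasses import dataclass, field

# All names below are hypothetical; the paper does not publish this interface.

@dataclass(frozen=True)
class Edit:
    name: str
    target_failure: str  # the failure cluster this edit is aimed at

@dataclass
class Harness:
    edits: list = field(default_factory=list)

def mine_weaknesses(traces):
    """Stage 1: group failed traces by failure label, largest cluster first."""
    counts = {}
    for trace in traces:
        if not trace["passed"]:
            counts[trace["failure"]] = counts.get(trace["failure"], 0) + 1
    return sorted(counts, key=lambda label: (-counts[label], label))

def propose_edits(clusters):
    """Stage 2: one small, targeted edit per failure cluster."""
    return [Edit(name=f"fix-{c}", target_failure=c) for c in clusters]

def validate(harness, edit, run, regression_tasks, heldout_tasks):
    """Stage 3: accept only if held-out passes rise and nothing that worked breaks."""
    candidate = Harness(edits=harness.edits + [edit])
    before = run(harness, regression_tasks)
    after = run(candidate, regression_tasks)
    broke_something = any(before[t] and not after[t] for t in regression_tasks)
    gained = sum(run(candidate, heldout_tasks).values()) > sum(run(harness, heldout_tasks).values())
    return gained and not broke_something

def self_harness_round(harness, traces, run, regression_tasks, heldout_tasks):
    """One pass of mine -> propose -> validate, keeping the edits that pass."""
    for edit in propose_edits(mine_weaknesses(traces)):
        if validate(harness, edit, run, regression_tasks, heldout_tasks):
            harness = Harness(edits=harness.edits + [edit])
    return harness

def toy_run(harness, tasks):
    """Toy evaluator: a task passes if it needs no fix or its fix is installed."""
    fixed = {e.target_failure for e in harness.edits}
    return {task_id: (need is None or need in fixed) for task_id, need in tasks.items()}

traces = [
    {"passed": False, "failure": "timeout"},
    {"passed": False, "failure": "timeout"},
    {"passed": False, "failure": "bad_parse"},
    {"passed": True, "failure": None},
]
regression = {"r1": None, "r2": None}
heldout = {"h1": "timeout", "h2": "bad_parse", "h3": "unseen_class"}

final = self_harness_round(Harness(), traces, toy_run, regression, heldout)
print([e.name for e in final.edits])
```

Note that `h3` stays failing in this toy: its failure class never appeared in the traces, so no edit was ever proposed for it.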

The results are real. On Terminal-Bench-2.0, held-out pass rates rose from 40.5% to 61.9% on MiniMax M2.5, from 23.8% to 38.1% on Qwen3.5-35B-A3B, and from 42.9% to 57.1% on GLM-5. That is a lift of roughly 14 to 21 percentage points from editing the scaffolding alone, with the weights untouched. It confirms the case we made in Note №34: the harness, not the model, is where a great deal of agent behaviour actually lives. Self-Harness is the cleanest demonstration yet that you can move an agent by as much as twenty points by changing the surface in front of a frozen model.
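The size of each lift follows directly from the pass rates quoted above. The short calculation below uses only those three pairs of figures; the helper name `lift` is ours, and the relative figure is simply the gain divided by the starting pass rate.

```python
# Held-out pass rates (%) on Terminal-Bench-2.0 as quoted in this Note: (before, after).
results = {
    "MiniMax M2.5": (40.5, 61.9),
    "Qwen3.5-35B-A3B": (23.8, 38.1),
    "GLM-5": (42.9, 57.1),
}

def lift(before, after):
    """Return (gain in percentage points, gain relative to the baseline in %)."""
    points = round(after - before, 1)
    relative = round(100 * (after - before) / before)
    return points, relative

for model, (before, after) in results.items():
    points, relative = lift(before, after)
    print(f"{model}: +{points} points ({relative}% relative)")
# MiniMax M2.5: +21.4 points (53% relative)
# Qwen3.5-35B-A3B: +14.3 points (60% relative)
# GLM-5: +14.2 points (33% relative)
```

The absolute gains range from about 14 to about 21 points; in relative terms the smallest model gains the most.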

§3 The deleted reviewer

Now read the validation step as a control, not a capability. "Accepts candidate edits only after regression testing" sounds like rigour, and as engineering it is. As governance it is a producer grading its own work. The agent proposes the change, the agent runs the test, and the agent chose the regression set the test runs against. There is no second party anywhere in that loop. The thing being improved, the thing proposing the improvement, and the thing certifying the improvement are one system with one objective: pass the bench.

We keep returning to the same separation-of-concerns point in these Notes, and here it applies at its sharpest. A regression suite the agent selected is only as honest as the agent's blind spots allow. If the model has a class of failure it does not surface in its own traces — the failures it does not know to look for — those never enter Weakness Mining, never get a proposed fix, and never appear in the set that guards against regressions. The system optimises confidently towards the edge of what it can already see, and self-certifies the result. Nobody outside the loop signed anything, because the loop was designed not to have an outside.
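A toy example makes the blind-spot argument concrete. It is an illustration under stated assumptions, not a result from the paper: a harness is modelled as the set of behaviour classes it handles correctly, the class names are invented, and the self-selected suite contains only classes that surfaced in the agent's own traces.

```python
# Hypothetical toy: class names and suites are invented for illustration.
observed_classes = {"timeout", "bad_parse"}        # surfaced in the agent's traces
all_classes = observed_classes | {"locale_dates"}  # one class the agent never saw fail

baseline = {"timeout", "locale_dates"}   # classes the current harness handles
candidate = {"timeout", "bad_parse"}     # fixes bad_parse, silently breaks locale_dates

self_selected_suite = sorted(observed_classes)  # built from what the agent can see
independent_suite = sorted(all_classes)         # built by someone outside the loop

def pass_rate(handled, suite):
    """Fraction of suite classes the harness handles."""
    return sum(c in handled for c in suite) / len(suite)

def regressions(before, after, suite):
    """Classes in the suite that worked before the edit and fail after it."""
    return [c for c in suite if c in before and c not in after]

print(pass_rate(baseline, self_selected_suite), pass_rate(candidate, self_selected_suite))
print(regressions(baseline, candidate, self_selected_suite))
print(regressions(baseline, candidate, independent_suite))
```

On the self-selected suite the candidate looks like a clean improvement with no regressions. Only the independent suite shows that it broke a class the agent never knew to test.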

§4 Why this lands on a regulated buyer

Note №34 made the case that an evolvable harness is a moving target your audit cannot pin. Self-Harness removes the last human from the mechanism that moves it, and for a regulated buyer that is a change-management problem before it is anything else. The PRA's SS1/23 model-risk expectations assume a change to a production system is proposed, reviewed by someone other than the author, approved, and recorded. An agent that edits and approves its own scaffolding between Monday and Friday proposes the change and then skips the other three: no independent review, no approval by anyone but the author, and no auditable change record, by design. Your model sign-off is frozen and increasingly beside the point; the behaviour moved underneath it, and the approval that let it move was the system approving itself.

The ICO angle is sharper still. UK GDPR Article 22 and the ICO guidance on explaining decisions require you to describe the logic that produced an outcome affecting a person. "The harness modified its observation formatting after self-diagnosing a failure pattern, which changed what the model attended to" is not an explanation you can give a data subject, and it is not one you can reconstruct unless every self-edit is logged at the fidelity of the decision itself. For a Local Authority running an agent on benefits or housing triage, the public-law duty to give reasons attaches to the decision the system reached. If the harness that shaped it approved its own last change with no human and no record, the authority cannot meet a reasons request and cannot defend the decision on review.

§5 The substrate fix, and what it does not settle

The fix is not to ban the mechanism. The capability is too useful to refuse, and done right the trace data Self-Harness already needs is the same data an auditor needs. The missing piece is a reviewer the loop does not contain. Let the agent mine its weaknesses and propose edits, then route the accept-or-reject decision through a gate that does not share the agent's pass-the-bench objective: an independent validation set the agent never sees, a human approval for any edit that touches a consequential tool, and a tamper-evident log of every proposed change, accepted or not, pinned to the harness version that was live for each decision. None of that is in the paper, and it should not be. It is an efficiency-and-capability result and an honest one.
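The gate described above can be sketched in a few lines. This is a hedged illustration rather than a reference implementation, and it is not taken from the paper: `ProposedEdit`, `ChangeLog`, and `gate` are hypothetical names, `independent_gain` stands for a score change measured by the gate on a validation set the agent never sees, and tamper evidence is approximated with a SHA-256 hash chain over JSON-serialised records.

```python
import hashlib
import json
from dataclasses import dataclass

# Hypothetical names throughout; a sketch of the pattern, not a product API.

@dataclass(frozen=True)
class ProposedEdit:
    edit_id: str
    harness_version: str              # harness version the edit would apply to
    touches_consequential_tool: bool  # if True, a human must approve

class ChangeLog:
    """Append-only log in which each entry commits to the previous entry's hash."""

    GENESIS = "0" * 64

    def __init__(self):
        self.entries = []

    @staticmethod
    def _digest(prev, record):
        body = json.dumps(record, sort_keys=True)
        return hashlib.sha256((prev + body).encode()).hexdigest()

    def append(self, record):
        prev = self.entries[-1]["hash"] if self.entries else self.GENESIS
        self.entries.append({"prev": prev, "record": record, "hash": self._digest(prev, record)})

    def verify(self):
        """Recompute the chain; any edited or reordered entry breaks it."""
        prev = self.GENESIS
        for entry in self.entries:
            if entry["prev"] != prev or self._digest(prev, entry["record"]) != entry["hash"]:
                return False
            prev = entry["hash"]
        return True

def gate(edit, independent_gain, human_approver, log):
    """Accept or reject outside the proposer's loop, logging every proposal."""
    needs_human = edit.touches_consequential_tool
    accepted = independent_gain > 0 and (not needs_human or human_approver is not None)
    log.append({
        "edit_id": edit.edit_id,
        "harness_version": edit.harness_version,
        "independent_gain": independent_gain,
        "human_approver": human_approver,
        "accepted": accepted,
    })
    return accepted

log = ChangeLog()
a = gate(ProposedEdit("e1", "v7", False), 0.05, None, log)         # accepted on the gain alone
b = gate(ProposedEdit("e2", "v7", True), 0.05, None, log)          # rejected: no human sign-off
c = gate(ProposedEdit("e3", "v7", True), 0.05, "reviewer-a", log)  # accepted with sign-off
print(a, b, c, log.verify())
```

A hash chain only detects edits to entries already written; anyone able to rewrite the whole log can recompute it. A real deployment would therefore also need to anchor the latest hash somewhere the agent cannot write, which this sketch leaves out.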

What it does not settle is everything on the substrate side. It does not log its edits as auditable change records, it does not separate the proposer from the approver, and it does not say how a self-modified harness behaves on inputs outside the traces it learned from, which is exactly where a regulated buyer carries the most exposure. The lift of 14 to 21 points is real on Terminal-Bench-2.0; it is unmeasured on your distribution, and the system that produced it is the same one that decided it was safe to ship. For anyone building on this, the technique is sound and the gains are likely real. The work that remains is putting a reviewer back into a loop that was built to need none.

Methodology note. This Note reads Self-Harness (arXiv:2606.09498, Shanghai Artificial Intelligence Laboratory; Zhang et al.) as a capability result with an undeclared governance footprint. Triggers: substrate-relevant (a self-modifying, self-approving harness changes what must be reviewed, logged, and version-pinned at runtime); non-duplicative (the haul and our own coverage scored the harness as a moving target in Note №34, and reproduced a different adaptive-harness paper, arXiv:2606.01770, in today's ship; neither named the self-certification problem this paper introduces); regulated-buyer link (PRA SS1/23 change and model-risk expectations, UK GDPR Art. 22 and ICO explainability guidance, and the public-law duty to give reasons all assume a reviewer who is not the author). Numbers read from the paper: held-out pass rates on Terminal-Bench-2.0 rose 40.5→61.9 (MiniMax M2.5), 23.8→38.1 (Qwen3.5-35B-A3B), 42.9→57.1 (GLM-5). Now built: we shipped a reference pattern for this the same day. Reviewer Back in the Loop keeps trace-driven harness improvement but routes every accept-or-reject through an independent gate the proposer cannot reach, and records each one in a tamper-evident change log.