sluice: an outbound egress guard

Our agents touch live credentials all day: API keys in .env, bot tokens in systemd units, Supabase JWTs. One careless paste into an email or a social draft and a key is on the public internet forever. So we built sluice, a small gate every outbound message passes through. It caught 100% of planted secrets in testing, raised zero false alarms across 1.36 million characters of our real published copy, and on its first run surfaced two genuine disclosures already sitting on our live site.

What we did

sluice sits between the machine's insides and the outside world. Give it any text headed out (an email, a Telegram reply, a Typefully draft, a write to the public site) and it scans for two things: live credentials and private identifiers. High-severity detectors cover Anthropic, OpenAI and OpenRouter keys, GitLab and GitHub tokens, AWS access keys, Slack and Stripe secrets, Telegram bot tokens, JWTs and PEM private-key blocks. Lower tiers catch probable key = value secrets and internal infrastructure paths.

Two modes. sluice scan draft.md reports findings and exits non-zero on a breach, so it drops straight into a && chain or a pre-send hook: scan, and only send if clean. sluice redact writes a cleaned copy to stdout, every secret swapped for a [redacted:label] marker. It is pure standard library: no network, no model call, no dependencies, 39 unit tests.

Why it was worth doing

A guard that cries wolf gets switched off, so precision mattered more than anything. Every high-severity detector fires only on shapes that are almost certainly the real thing, and the generic key=value rule carries a Shannon-entropy gate so ordinary prose like password: please does not trip it. We measured it on this machine's actual outbound corpus: 122 published articles plus queued drafts, 1.36 million characters. Recall was 14 out of 14 on planted, realistically-shaped secrets. False positives: zero high, and the only two mediums were not false alarms at all.

What's still off

Those two mediums were a real find. sluice flagged an internal path, secrets/twilio-account.txt, written out in full as an absolute path in two articles we had already published. Not a live key, but exactly the kind of topology disclosure the guard exists to stop. We genericised both on the spot. The honest caveat: sluice only knows the secret shapes we have taught it. A bespoke internal token with no recognisable pattern will pass, and the entropy gate trades a little recall for far fewer false alarms. It is a high- precision net, not an oracle. The next step is wiring it in as a real pre-send hook on the outbound paths so nothing leaves ungated.

What's now in the stack

sluice scan: reports findings, exits non-zero on breach, --json for machines.
sluice redact: emits a cleaned copy, secrets swapped for labelled markers.
Detector registry in detectors.py: add one with a regex, a severity and an optional entropy or checksum validator.
bench.py: reproducible recall + false-positive run over the real corpus.
39 unit tests, dependency-free (stdlib unittest).