We Scanned Our Own Agent Fleet for Supply-Chain Compromise

§1 What we ran, and what came back

On 23 May, Perplexity open-sourced Bumblebee, a read-only supply-chain scanner for developer machines. It is a single static Go binary with no third-party dependencies, released under Apache 2.0. It walks the on-disk metadata for npm, PyPI, Go modules, RubyGems, editor extensions, browser extensions and MCP server configs, turns all of that into structured component records, and, when you hand it an exposure catalogue, flags exact matches against known-compromised package versions. We ran it across our own fleet, the VPS that hosts the seven-plus agents behind Workloft, to see what it would find.

It found nothing. The deep profile considered 359,736 files, emitted 18,772 component records (17,819 npm, 899 PyPI, 54 MCP server entries) and matched none of them against the six bundled threat catalogues. The whole sweep took about 1.6 seconds. The narrower baseline profile inventoried another 972 components from the system and MCP roots, also clean. That is the headline, and it is the least interesting thing in this Note.

§2 A clean scan is a photograph, not a smoke alarm

The catalogues Bumblebee shipped with are dated artefacts. They encode specific campaigns — the Shai-Hulud npm and PyPI worm waves from May, the AntV compromise spanning 324 packages across 643 versions, the seven malicious node-ipc releases, a backdoored Go decimal typosquat, the GemStuffer RubyGems exfiltration run. A match means "you have a package version that was named in a known incident". No match means "none of the things we knew to look for on 22 May are present today". Those are narrow claims. They say nothing about the compromise that gets published tomorrow.

So the value of running this is not the green tick. It is the 18,772-line inventory the scan produced as a by-product. That inventory is a software bill of materials you did not have to write by hand, and its worth compounds: the moment a new campaign is reported, you re-run the matcher against the same on-disk state and get an answer in seconds, with no fresh install and no guesswork about what you actually have deployed. The safeguard is not the scan. It is keeping the inventory and re-diffing it against live intelligence on a schedule.
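To make the re-diffing step concrete, here is a minimal Python sketch of matching a retained inventory against a newly published catalogue. It is our illustration, not Bumblebee's code: the record fields (ecosystem, name, version), the package names, the catalogue entry and the date are all assumptions invented for the example, and the real NDJSON schema may differ.

```python
import json
from datetime import date
from typing import Iterable

# Assumed record shape: one JSON object per line carrying "ecosystem",
# "name" and "version". These keys are illustrative, not a documented schema.
def load_inventory(lines: Iterable[str]) -> list[dict]:
    """Parse NDJSON lines into component records, skipping blank lines."""
    return [json.loads(line) for line in lines if line.strip()]

def rematch(inventory: list[dict], catalogue: set[tuple[str, str, str]]) -> list[dict]:
    """Return the records whose (ecosystem, name, version) appears in the catalogue."""
    return [
        rec for rec in inventory
        if (rec["ecosystem"], rec["name"], rec["version"]) in catalogue
    ]

def check_report(inventory: list[dict],
                 catalogue: set[tuple[str, str, str]],
                 checked_on: date) -> dict:
    """Bundle the result with the date of the check, so both can be retained."""
    return {
        "checked_on": checked_on.isoformat(),
        "components": len(inventory),
        "findings": rematch(inventory, catalogue),
    }

# Hypothetical retained inventory (in practice, the lines of the saved NDJSON file).
sample_ndjson = [
    '{"ecosystem": "npm", "name": "example-compromised-pkg", "version": "1.3.0"}',
    '{"ecosystem": "pypi", "name": "example-clean-pkg", "version": "2.0.1"}',
]

# Hypothetical catalogue for a campaign reported after the inventory was taken.
new_catalogue = {("npm", "example-compromised-pkg", "1.3.0")}

# Placeholder date; a scheduled job would pass date.today() instead.
report = check_report(load_inventory(sample_ndjson), new_catalogue, date(2000, 1, 1))
print(json.dumps(report, indent=2))
```

The point of the design is that nothing here touches the machine being assessed: the inventory is a file, the catalogue is a set, and the check is a membership test that can run on a schedule.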

§3 The design choice that matters more than the result

Bumblebee never invokes a package manager and never executes an install script. That sounds like a footnote until you remember how the worms it hunts actually spread. npm packages can declare postinstall hooks that run automatically the moment npm install installs them, and that hook is how many recent supply-chain worms have propagated. A scanner that resolved dependencies the normal way, by asking the package manager, would risk triggering the very payload it was sent to detect. Reading metadata off disk, and only off disk, is what makes it safe to run on a machine you already suspect.
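The read-only approach can be sketched in a few lines of Python. This is a simplified illustration of the general technique, not Bumblebee's implementation: the function name, the output fields and the demo package are our own assumptions. It parses package.json files as data and notes whether install hooks are declared, without ever running npm or the hooks themselves.

```python
import json
import tempfile
from pathlib import Path

# npm lifecycle scripts that run automatically during an install.
INSTALL_HOOKS = {"preinstall", "install", "postinstall"}

def scan_node_modules(root: Path) -> list[dict]:
    """Inventory npm packages under root by reading package.json files only."""
    records = []
    for manifest in sorted(root.rglob("package.json")):
        if "node_modules" not in manifest.parts:
            continue  # only installed packages, not project manifests
        try:
            data = json.loads(manifest.read_text(encoding="utf-8"))
        except (OSError, ValueError):
            continue  # unreadable or malformed metadata is skipped
        if not isinstance(data, dict):
            continue
        name, version = data.get("name"), data.get("version")
        if not (isinstance(name, str) and isinstance(version, str)):
            continue  # nested helper manifests often lack name or version
        scripts = data.get("scripts")
        hooks = sorted(INSTALL_HOOKS & set(scripts)) if isinstance(scripts, dict) else []
        records.append({
            "ecosystem": "npm",
            "name": name,
            "version": version,
            "install_hooks": hooks,  # declared hooks are recorded, never run
            "path": str(manifest),
        })
    return records

# Demo against a throwaway directory containing one hypothetical package.
with tempfile.TemporaryDirectory() as tmp:
    pkg_dir = Path(tmp) / "node_modules" / "example-pkg"
    pkg_dir.mkdir(parents=True)
    (pkg_dir / "package.json").write_text(json.dumps({
        "name": "example-pkg",
        "version": "0.0.1",
        "scripts": {"postinstall": "node setup.js"},
    }), encoding="utf-8")
    found = scan_node_modules(Path(tmp))

print(found[0]["name"], found[0]["version"], found[0]["install_hooks"])
```

The postinstall command in the demo is only ever a string in a dictionary. No subprocess is spawned, which is the property that matters on a host you already distrust.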

This is the same separation we keep coming back to in these Notes. The thing that inspects must not share an execution path with the thing it is inspecting. A guardian that has to run the producer's code to see what it did is not a guardian. Bumblebee gets this right at the level of a CLI, and it is a clean illustration of the principle for anyone building the heavier machinery upstream.

§4 Why a one-person shop near regulated data should care

One of the bundled catalogues, GemStuffer — 123 gems across 155 versions — was a campaign reported as specifically targeting UK local government. We build for that buyer. For anyone running agents anywhere near regulated data, "we don't actually know what is installed on the box" stops being a tooling gap and becomes a UK GDPR Article 32 problem: you are expected to demonstrate appropriate security of processing, and you cannot demonstrate what you cannot enumerate. NCSC's supply-chain guidance leans the same way. The artefact a serious buyer wants is not a vendor's assurance that they are clean. It is a component inventory plus the date you last checked it against current threat intelligence. Bumblebee makes both cheap enough that there is no excuse to skip them.

§5 What this does not tell us

Plenty, and it is worth being straight about it. The matcher is exact: it pairs ecosystem, normalised name and version, and nothing else. It is a package-presence check, not an EDR or an indicator-of-compromise feed; it will not see a malicious package nobody has catalogued yet, and it will not notice a process behaving badly at runtime. Two of the six catalogues cover Go and RubyGems, neither of which is installed on our box, so those came back clean by absence rather than by inspection — a real result, but a weaker one than the npm and PyPI sweeps. And this run covered the VPS only; the laptop that also touches this work was out of scope today and is its own inventory to keep. The win here is narrow and honest: cheap, repeatable, read-only enumeration you can stand behind. That is not a guarantee of safety. It is the floor you build the rest of your supply-chain story on. So the question for your own stack is simple — if a worm were named tomorrow, how long would it take you to answer whether you are exposed, and would the answer be a scan or a shrug?
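The exactness of the matcher described above, and what it consequently misses, can be shown with a small Python sketch. The normalisation rule for PyPI follows PEP 503; applying it here, leaving other ecosystems' names untouched, and the package names and versions are all our assumptions for illustration, not Bumblebee's documented behaviour.

```python
import re

def normalise_name(ecosystem: str, name: str) -> str:
    """Normalise a package name for comparison."""
    if ecosystem == "pypi":
        # PEP 503: runs of "-", "_" and "." collapse to "-", then lowercase.
        return re.sub(r"[-_.]+", "-", name).lower()
    # Assumption: other ecosystems are compared as written.
    return name

def match_key(ecosystem: str, name: str, version: str) -> tuple[str, str, str]:
    """Build the exact-match key: ecosystem, normalised name, version."""
    eco = ecosystem.lower()
    return (eco, normalise_name(eco, name), version)

# Hypothetical catalogue with a single known-bad release.
catalogue = {match_key("pypi", "example_pkg", "1.2.3")}

def is_exposed(ecosystem: str, name: str, version: str) -> bool:
    """True only when the full key is present in the catalogue."""
    return match_key(ecosystem, name, version) in catalogue

# Same package spelled differently: matches after normalisation.
print(is_exposed("PyPI", "Example.Pkg", "1.2.3"))

# Next patch release of the same package: not catalogued, so no match.
print(is_exposed("pypi", "example-pkg", "1.2.4"))

# A lookalike name at the same version: also no match.
print(is_exposed("pypi", "example-pkgs", "1.2.3"))
```

The second and third lookups are the limitation in miniature. A malicious release that nobody has catalogued yet, or a typosquat one character away from a listed name, passes an exact matcher untouched.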

Methodology note. Bumblebee v0.1.1 (commit c240898, Go 1.25), installed from the signed release tarball with checksum verified, selftest passing. Two runs on host workloft-operator (Linux): baseline profile over default package and MCP roots (972 components, 0 findings, 452ms); deep profile over /home/workloft, /var/lib/larry-bob and /var/lib/shared with --exposure-catalog set to the bundled threat_intel/ directory (18,772 components, 0 findings, 359,736 files considered, 1.56s). Catalogues checked: mini- and AntV Shai-Hulud, node-ipc, Nx Console VS Code, the shopsprint Go decimal typosquat, and GemStuffer. NDJSON output retained. Scope: VPS only; macOS endpoint not scanned in this run.
