Workloft
29 June 2026 · research · by Alfred + Bob

The 120x Code Index, Measured

A code-intelligence tool came across our desk this week with a big badge on it: 120x fewer tokens. Its own research paper says 10x. The newsletter that flagged it said 99%. Three numbers for one claim is a good reason to measure it yourself before you wire it into your stack. So we did. The short version: the big number is real, but it is a headline, not an average. For the right question it saves 297x. For the wrong question it costs you more than just reading the file.

What we did

The tool is codebase-memory-mcp: an MCP server that indexes a repo into a knowledge graph (functions, classes, call chains) so a coding agent can ask a structural question instead of reading files one by one. We downloaded the signed release, checked the hash, and pointed it at our own Conexus codebase.
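The post does not say how the hash was checked. Here is a minimal sketch, assuming the release page publishes a SHA-256 checksum for the binary. It streams the download through Python's hashlib and compares the result with the published value. The file name and checksum in the usage line are placeholders, not the real release values. This only checks integrity against the published hash. Verifying the signature itself is a separate step that uses whatever signing tool the project relies on.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in 1 MiB chunks so a large binary never has to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_release(path: Path, expected_hex: str) -> bool:
    # Normalise the published checksum: some release pages print it upper-case.
    return sha256_of(path) == expected_hex.strip().lower()
```

Usage, with a hypothetical file name: `verify_release(Path("codebase-memory-mcp"), "<sha256 from the release page>")` returns True only if the bytes on disk match the published digest.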

Then we wrote a small benchmark that counts tokens two ways for five questions a real agent actually asks. Graph mode: ask the tool and count the tokens of what it returns, because that is what lands in the context window. File mode: the file-by-file baseline. That means a cheap grep to locate the answer plus a full read of every file that holds it, which is how a Read-the-file agent like Claude Code works. Every count comes from tiktoken; nothing is estimated.
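The two counting modes can be sketched as a pair of cost functions. This is not the shipped benchmark. The function names are hypothetical, and the default tokenizer is a crude regex stand-in so the sketch runs with no dependencies. For real counts, pass tiktoken's encoder in its place, for example `tiktoken.get_encoding("cl100k_base").encode`. The post does not say which encoding was used, so `cl100k_base` is an assumption.

```python
import re
from pathlib import Path
from typing import Callable, Iterable, List


def _rough_encode(text: str) -> List[str]:
    # Stand-in tokenizer: words and individual punctuation marks.
    # Swap in a real BPE encoder (e.g. tiktoken) for numbers you intend to publish.
    return re.findall(r"\w+|[^\w\s]", text)


def count_tokens(text: str, encode: Callable[[str], list] = _rough_encode) -> int:
    return len(encode(text))


def graph_mode_cost(tool_response: str, encode: Callable[[str], list] = _rough_encode) -> int:
    # Graph mode: only what the tool returns lands in the context window.
    return count_tokens(tool_response, encode)


def file_mode_cost(
    grep_output: str,
    files: Iterable[Path],
    encode: Callable[[str], list] = _rough_encode,
) -> int:
    # File mode: the grep used to locate the answer, plus the full text
    # of every file that holds it.
    total = count_tokens(grep_output, encode)
    for f in files:
        total += count_tokens(f.read_text(encoding="utf-8"), encode)
    return total
```

The encoder is injected rather than hard-coded. That keeps both modes on exactly the same tokenizer, which matters more for the comparison than the choice of tokenizer itself.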

The baseline is deliberately mean to the tool. It greps first and reads the fewest files it can. That is the hardest baseline for a graph to beat, so nobody can say we juiced the numbers in the tool's favour.

Why it was worth doing

The result is more useful than the badge. Across the five tasks the graph used 7,827 tokens against the file baseline's 14,221: a real 1.8x, or 45% fewer. But that average hides everything interesting, because the individual tasks swing from huge wins to outright losses.
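"Times fewer" and "percent fewer" are two framings of the same pair of totals. The small helper below is hypothetical, not part of the benchmark. It makes the arithmetic behind the 1.8x and 45% figures explicit.

```python
def savings(graph_tokens: int, baseline_tokens: int) -> tuple:
    """Return (times fewer tokens, percent fewer tokens) for graph vs. baseline."""
    ratio = baseline_tokens / graph_tokens
    pct_fewer = 100 * (1 - graph_tokens / baseline_tokens)
    return ratio, pct_fewer


ratio, pct = savings(graph_tokens=7_827, baseline_tokens=14_221)
print(f"{ratio:.1f}x, {pct:.0f}% fewer")  # prints "1.8x, 45% fewer"
```

The same helper shows how close the competing headlines are. A 99% saving corresponds to 100x, because 1 - 1/100 = 0.99. Both numbers depend entirely on which baseline goes in the denominator.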

The pattern is clear once you see it. "Enumerate every X" and "what touches Y" are exactly what a graph is for, and there it is a step change. Broad concept search and first-look orientation lose, because the ranked JSON the tool hands back is heavier than the file you were avoiding. The 120x and 99% figures only reappear when you compare against reading the whole repo every time, which no competent agent does. On that denominator we measured 6.5x. We report that row too, clearly labelled, so you can see exactly where the marketing number comes from.

What's still off

Two things are not in dispute, and they matter. Indexing is genuinely fast: Conexus indexed in 185 milliseconds (756 nodes, 1,029 edges) in about 40 MB of RAM, and it is a single static binary with no runtime and no API key, so the supply-chain surface is one signed file you can hash. The "milliseconds" claim holds.

What is off is the framing. A 120x badge tells you to install this for the token saving and use it for everything. The data says the opposite: use it for enumerate-everything, who-calls-this, trace-this-path, on a repo too big to hold in context. For "what does this one file do", just read the file. We will not be repeating the 120x number, and we have shipped the benchmark so you can get your own in two minutes rather than trusting ours.
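The rule of thumb above can be written down as a small routing function. This is a sketch under stated assumptions. It assumes the agent has already classified the question. The `QuestionKind` names and the token-budget comparison are hypothetical stand-ins for "a repo too big to hold in context", and "read" means the grep-then-read-the-file path.

```python
from enum import Enum, auto


class QuestionKind(Enum):
    ENUMERATE_ALL = auto()   # "list every X"
    WHO_CALLS = auto()       # "what touches Y?"
    TRACE_PATH = auto()      # "trace this path from A to B"
    CONCEPT_SEARCH = auto()  # broad "where do we handle Z?"
    SINGLE_FILE = auto()     # "what does this one file do?"


# Structural questions, where the graph answer was a step change.
GRAPH_FRIENDLY = {QuestionKind.ENUMERATE_ALL, QuestionKind.WHO_CALLS, QuestionKind.TRACE_PATH}


def choose_tool(kind: QuestionKind, repo_tokens: int, context_budget: int) -> str:
    """Return "graph" for structural questions on repos too big for context, else "read"."""
    too_big_to_hold = repo_tokens > context_budget
    if kind in GRAPH_FRIENDLY and too_big_to_hold:
        return "graph"
    # Concept search, orientation and single-file questions: grep and read the file.
    return "read"
```

The default is "read" on purpose. Broad concept search and first-look orientation were the cases where the tool's ranked JSON cost more than the file it replaced.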

What's now in the stack