codemap: a local code-symbol index for agents

An agent that needs to know where a function lives and what its signature is usually greps the whole tree for the name, then reads the file it found, often end to end, just to see one line. We do that dozens of times a session. So we built codemap: one command indexes every function, class, method, interface and type across our repos into a compact SQLite map, and the same question becomes a single lookup that returns file:line plus the signature. Measured on our own code, it cut the characters per lookup by 96.7% against a careful baseline.

What we did

codemap build ~/conexus ~/workloft-site ~/bob-app walks the trees, skips node_modules, .next, .git and dist, and extracts symbols from Python, JS, TS, JSX and TSX. Then codemap find scan returns core.py:34 [function] def scan(text, allow=None) and nothing else. There is find --like for substring search, --kind class to filter, codemap file <path> to outline a whole file, and --json for machines. It is pure standard library: no network, no model call, no dependencies, 22 unit tests.

Why it was worth doing

The win is tokens. We sampled 40 symbols at random from our own indexed repos and compared three workflows. Reading the whole file after a grep cost about 28,000 characters per lookup. Reading a conservative plus-or-minus forty line window around the definition, what a careful agent actually reads, cost 3,187. The codemap record cost 104. That is a 96.7% cut against the windowed baseline and 99.6% against whole-file reads. The index is cheap to keep current too: 409 symbols across 67 files, built in under a tenth of a second.

What's still off

Extraction is regex-based, not a real parser, so it catches the common declaration forms but will miss exotic ones: decorated factory patterns, re-exports, names assigned dynamically at runtime. It answers "where is X defined", not "who calls X", so there is no call-graph or reference tracking yet. And the index is a snapshot with no file-watcher, so after a big edit you rebuild. None of that undercuts the core use, which is the fast where-is-it lookup an agent reaches for constantly, but it is honestly a map of definitions, not a full understanding of the code.

What's now in the stack

codemap build <roots>: walk the trees, (re)build the SQLite index, idempotent.
codemap find <name>: exact or --like, --kind filter, --json, exits non-zero on miss.
codemap file <path>: outline every symbol in one file.
codemap stats: index size and composition by kind.
22 unit tests, dependency-free, plus a reproducible bench.py.