We put 95 ships in Google's new knowledge format and measured no difference

Q: What is Google's Open Knowledge Format (OKF)?

OKF is an open specification from Google Cloud (June 2026, Apache 2.0) that standardises the LLM-wiki pattern: a knowledge base is a directory of markdown files with YAML frontmatter. Exactly one field is required (type); title, description, resource, tags and timestamp are recommended; per-directory index.md files support progressive disclosure. It standardises structure only, not tooling, storage or taxonomy.

Q: Does converting a knowledge base to OKF improve agent retrieval?

Not measurably, if the corpus is already tidy. In a spot check over 95 documents that already had descriptive filenames, YAML frontmatter and one-line descriptions, an agent answered three known-answer questions correctly over both the OKF bundle and the raw directory, in comparable time, reading the same number of files. OKF's value is interoperability between agents and organisations, not retrieval performance within one.

We converted our entire ships library, 95 documents, to Google's Open Knowledge Format, verified the bundle conformant with zero violations, then spot-checked an agent answering questions over the bundle versus the raw directory. Every answer was correct in both conditions, in about the same time, reading the same number of files. The raw directory even found one more relevant document than the bundle did. The interesting part is why nothing changed.

What OKF is

OKF is Google Cloud's June 2026 attempt to standardise the LLM-wiki pattern: a knowledge base as a directory of markdown files with YAML frontmatter, readable by humans without tooling and by agents without SDKs. The spec is deliberately tiny. One field is required (type); title, description, resource, tags and timestamp are recommended; an optional index.md per directory gives an agent progressive disclosure, a listing to read before opening documents. That is essentially the whole standard, and the spec says so proudly: minimally opinionated, no schema registry, no required tooling.

What we built

A converter that turns the machine-readable markdown siblings of every ship on this site into a conformant bundle: one concept per ship, type: Ship, frontmatter filled from what the pages already carry, a month-grouped hierarchy, generated indexes at every level and the okf_version declaration at the root. Plus a conformance checker for the spec's three rules. 95 concepts, 415 KB, zero violations. The whole bundle is on GitHub; point your agent at it and ask questions about two months of our builds.

The spot check

Three questions with known answers, each asked twice in a fresh headless Claude session told to answer only from local files: once with the working directory set to the raw ships folder (95 markdown files mixed in with 95 HTML pages and assets), once inside the OKF bundle. All six runs answered correctly. Total wall-clock: 109 seconds over the raw directory, 103 over the bundle. Identical file-read counts per question. On the one question that required finding every relevant document (our Vera evaluation agent, which spans eight ships), the raw directory found all eight and the bundle surfaced seven. Three questions and one run per condition is a spot check, not a benchmark, but the direction is hard to miss: no measurable difference.

Why nothing changed

Because our corpus was already OKF-shaped before we started. The machine-readable siblings we publish for answer engines already carry YAML frontmatter with a title, a one-line description and a canonical URL, under descriptive kebab-case filenames. The agent's actual retrieval strategy in both conditions was the same: grep. A standard whose job is to make knowledge bases self-describing adds nothing to a corpus that already describes itself. That is not a criticism of OKF. It is evidence the spec codified a convention that tidy corpora were converging on anyway, which is what good minimal standards do.

Where the value actually is

Interoperability, not retrieval. The moment a knowledge base crosses a boundary (handed to a teammate's agent, a client's, a stranger's), the consuming side either knows your conventions or wastes context discovering them. A standard removes that negotiation. Our bundle is now something anyone's agent can ingest without being told how we organise things, and that is a property our raw directory never had, whatever the grep results say. The analogy that holds: MCP did not make any single tool call better; it made every tool reachable. OKF does not make retrieval better; it makes corpora exchangeable.

What's now in the stack

build_bundle.py — ships library → conformant OKF bundle, rerunnable as the library grows.
check_conformance.py — validates any tree against the spec's three conformance rules.
The full 95-ship bundle plus the eval harness and raw results on GitHub, with the write-up in examples/096. Steal the bundle; it is the point of the format.

What's still off

The spot check is three questions and one run per condition; treat it as direction, not measurement. Our corpus being tidy makes this a lower bound: a messy pile of untitled documents would likely gain more from conversion, and we did not test that. And the interoperability argument is currently a promise, because OKF consumers barely exist yet. The spec is three weeks old and version 0.1. We shipped the bundle anyway, since being early to a format costs a script and an afternoon; being late to one costs a migration.

FAQ

What is Google's Open Knowledge Format (OKF)?

An open specification from Google Cloud (June 2026, Apache 2.0) that standardises the LLM-wiki pattern: a knowledge base as a directory of markdown files with YAML frontmatter. Exactly one field is required (type); title, description, resource, tags and timestamp are recommended; per-directory index.md files support progressive disclosure. It standardises structure only, not tooling, storage or taxonomy.

Does converting a knowledge base to OKF improve agent retrieval?

Not measurably, if the corpus is already tidy. In our spot check over 95 documents that already had descriptive filenames, frontmatter and one-line descriptions, an agent answered every question correctly over both the bundle and the raw directory, in comparable time. OKF's value is interoperability between agents and organisations, not retrieval performance within one.

When is adopting OKF worth it?

When a knowledge base has to cross a boundary: shared with a team, handed to a client's agent, or published for anyone's second brain to ingest. Within a single system that already keeps consistent frontmatter and filenames, conversion is cheap but buys little today.