§1 The model we actually wanted
Moonshot shipped Kimi K2.7 Code this month and put the full weights on Hugging Face under a Modified MIT licence on day one, with vLLM and SGLang support in the box. It is a mixture-of-experts model: roughly a trillion total parameters, about thirty-two billion active per token, two hundred and fifty-six thousand tokens of context. It is not a general chatbot dressed up for the demo. It is shaped for the work an engineer actually does: reading an unfamiliar repository, planning a change across several files, running the tools and tests, reading the output, and correcting course when something breaks. Moonshot reports that it beats its own previous version by nearly twenty-two percent on the company's coding benchmark while burning about thirty percent fewer reasoning tokens.
That is the exact profile we would want as an open fallback for Bob, the agent that runs most of our build work. So we did the obvious thing. We added it to our model router and pointed a real call at it.
The router never reached it. The gateway returned a flat 404: "No endpoints available matching your guardrail restrictions and data policy." No call was made. No tokens were spent. A trillion-parameter model with its weights sitting open on the internet could not be dialled from our stack at all.
§2 What refused it, and how we know it was the policy
We run zero-data-retention as a standing, account-level setting on the gateway, not a per-call flag someone remembers to tick. Every provider currently serving Kimi K2.7 Code does so without a zero-data-retention guarantee. So the gateway had nowhere compliant to send the request and it failed closed: it refused the route rather than quietly finding a provider that would take the traffic on weaker terms.
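The fail-closed behaviour described above can be sketched in a few lines. This is a minimal illustration, not the gateway's actual implementation: the `Provider` type, its `zero_data_retention` flag, and the `NoCompliantEndpoint` exception are all hypothetical names invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Provider:
    name: str
    zero_data_retention: bool  # hypothetical flag: does the provider guarantee ZDR?


class NoCompliantEndpoint(Exception):
    """Raised instead of falling back to a provider on weaker data terms."""


def select_providers(providers, require_zdr=True):
    """Return only the providers that satisfy the data policy, or fail closed."""
    compliant = [p for p in providers if p.zero_data_retention or not require_zdr]
    if not compliant:
        # Fail closed: refuse the route rather than relax the policy.
        raise NoCompliantEndpoint("no endpoints match the data policy")
    return compliant
```

The design choice that matters is the `raise`. A fail-open router would return the non-compliant list with a warning, and the traffic would flow on terms nobody had agreed to.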
We checked it was the data policy and not a dead key or a flaky endpoint, because that distinction is the whole Note. On the same key, in the same minute, a call to one of our default open models returned a clean 200. Every Moonshot endpoint we tried, four of them, returned the same 404. The key works. The account works. The model is simply on the wrong side of a line we drew ourselves.
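The check above is a controlled comparison: same key, same minute, one known-good model as the control. A rough sketch of that reasoning follows, assuming only that we have the HTTP status codes in hand. The function name and the category labels are hypothetical, and a status code alone cannot prove a policy refusal; the error message in the response body is what names the data policy.

```python
def classify_refusal(target_statuses, control_status):
    """Classify why a model is unreachable, from HTTP status codes alone.

    target_statuses: codes from each endpoint tried for the target model
    control_status:  code from a known-good model called with the same key
    """
    if not target_statuses:
        raise ValueError("need at least one target endpoint result")
    if control_status in (401, 403):
        return "credentials"        # the key or account itself is failing
    if control_status != 200:
        return "inconclusive"       # the control failed too, so nothing is isolated
    if any(s == 200 for s in target_statuses):
        return "reachable"
    if all(s == 404 for s in target_statuses):
        return "policy_or_routing"  # key works, every target endpoint is refused
    return "flaky_endpoint"         # mixed failures point at the endpoints


# The situation in this Note: four endpoints, all 404, control returns 200.
print(classify_refusal([404, 404, 404, 404], 200))
```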
§3 Why this one stings
Four days ago we published a Note about a different refused model, GLM-5.2, and the honest read on that one was that the guardrail had cost us nothing. That model was priced like Haiku and its only claim was a one-shot frontend leaderboard, so there was no real decision to lose. The refusal looked clean partly because the temptation was not real.
This is the counterweight. Kimi K2.7 Code is priced at roughly sixty-one cents per million tokens in and three dollars and seven cents out. The model running Bob today costs five dollars and twenty-five cents. That is not a rounding difference. It is about eight times cheaper, on a model purpose-built for exactly the job we use the expensive one for. There is a genuine case here, on cost and on fit, and we could not run a single line of it. The guardrail did not save us from a shiny distraction this time. It blocked something we would plausibly have shipped. The cost of the policy stopped being hypothetical.
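The "about eight times" figure can be checked directly. The sketch below uses the two prices quoted above and assumes, since the text does not say so explicitly, that the five dollars and twenty-five cents is quoted on the same per-million-input-token basis as the sixty-one cents.

```python
# Prices in US dollars per million tokens, as quoted in the text.
KIMI_INPUT = 0.61
INCUMBENT = 5.25  # assumption: same per-million-input-token basis


def price_ratio(incumbent, challenger):
    """How many times more the incumbent costs than the challenger."""
    return incumbent / challenger


def saving_per_million(incumbent, challenger):
    """Dollars saved per million tokens by switching."""
    return incumbent - challenger


ratio = price_ratio(INCUMBENT, KIMI_INPUT)
print(f"incumbent is {ratio:.1f}x the challenger's input price")
```

If the assumption holds, the ratio comes out a little above eight and a half, which is consistent with "about eight times cheaper".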
§4 Open weights gave us two doors, and both were locked
Here is the part worth keeping. The whole promise of open weights is that you are not at a vendor's mercy. You get two independent routes to using the model: rent a hosted endpoint that meets your data terms, or download the weights and run them on hardware you control. Sovereignty, in theory, by either path.
For a one-person shop holding a real data line, both doors were shut on the same afternoon. The hosted route is the 404 above: no serving provider meets our retention policy, so the gateway refuses. The self-host route is arithmetic: a trillion-parameter model, even at thirty-two billion active, needs a rack we do not own. Our sovereign hardware tops out at a seven-billion-parameter model on a home GPU. The licence could not be more permissive and the weights could not be more downloadable, and it still changes nothing, because "open" answers a legal question, not an operational one. The thing you can legally have is not the same as the thing you can actually reach.
Open weights is a permission, not a delivery. It tells you that you are allowed to run the model. It does not tell you that you can, on the hardware you have, through the data policy you keep.
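The self-host arithmetic can be made concrete. The sketch below counts memory for the weights alone, ignoring the KV cache and activations, so it is a lower bound. The 24 GB card is an assumption standing in for "a home GPU" and is not a figure from our setup. The thirty-two billion active parameters reduce compute per token, but the router can pick different experts for each token, so in the usual serving setup all of the expert weights still have to be resident somewhere.

```python
def weight_memory_gb(total_params, bits_per_param):
    """Memory for the weights alone, in decimal gigabytes."""
    return total_params * bits_per_param / 8 / 1e9


TOTAL_PARAMS = 1e12  # "roughly a trillion total parameters"
SMALL_PARAMS = 7e9   # the seven-billion-parameter model we can host
HOME_GPU_GB = 24     # assumption: a high-end consumer card

for bits in (16, 8, 4):
    need = weight_memory_gb(TOTAL_PARAMS, bits)
    cards = need / HOME_GPU_GB
    print(f"{bits}-bit weights: {need:,.0f} GB, roughly {cards:.0f} cards of {HOME_GPU_GB} GB")

small = weight_memory_gb(SMALL_PARAMS, 16)
print(f"7B at 16-bit: {small:.0f} GB")
```

Even at an aggressive four bits per weight, a trillion parameters is about 500 GB of weights, against 14 GB for the seven-billion-parameter model at sixteen bits. Quantisation narrows the gap without coming close to closing it.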
§5 The honest trade
It would be easy to dress this up as a clean win, the way the first Note read. It is not, quite. We blocked a model we had a real reason to use, and we did it to ourselves. The honest framing is that this is a cost we are choosing to pay, with eyes open, because the alternative is worse: a stack that quietly relaxes its data posture whenever a good-enough model appears behind a non-compliant endpoint. We would rather see the 404 and feel the loss than never know the line had moved.
And there are real ways out, none of them free. We could vet a direct Moonshot key and route around the gateway, the way we already route one US model directly rather than through a privacy guardrail, which costs us a fresh due-diligence pass on a new processor. We could wait for a provider with a real retention agreement to serve these weights, which costs us time and may never come. Or we could stay where we are, on a model that is eight times the price and reachable, which costs us money on every call. Two refused models in four days is not bad luck, it is the shape of the trade: the open-model release cycle and a strict data line are diverging, and the gap is now wide enough to have a price tag on it.
We have left Kimi K2.7 Code in the catalogue, gated and documented, unreachable until either the policy or a compliant endpoint changes. The router did its job again. The difference is that this week, doing its job actually cost us something, and pretending otherwise would be the easy lie.
