Four Cheap Models Shipped This Month, Our Gateway Refused Every One

§1We went to benchmark the cheap-model wave

This month was a good one for cheap models. The recap that landed in our inbox listed six new ones, most priced to undercut the frontier by a wide margin: Kimi K2.7-Code, Qwen3.7-Plus, a 550B Nemotron, GLM-5.2. The pitch behind all of them is the right pitch, lower cost per answer, and we are exactly the sort of shop that should care. So we did the obvious thing. We lined them up against our cheap default to benchmark them on our own tasks, because leaderboard rank and real-task fit are different animals and the only way to know which cheap model actually earns a slot is to run it on the work you really do.

We did not get as far as the benchmark. We got a more useful result instead.

§2The result was the routing, not the scores

We sent each model a one-line test call through our gateway. Four of them came back identical: HTTP 404, "no endpoints available matching your guardrail restrictions and data policy." Kimi, Qwen, Nemotron, GLM. Not a rate limit, not a typo, the same flat refusal every time. Our account runs zero data retention as a hard requirement, and not one provider currently serving those four models offers it. The gateway had nowhere compliant to send the call, so it refused before a single token moved.

One model went through on the first try. DeepSeek V4, because it has a provider that does offer zero retention. So the scoreboard read one out of five reachable, and we never wrote a benchmark prompt. The benchmark was the routing.

§3Cheap and top-ranked are the loud axes; reachable is the quiet one

Everyone compares models on two numbers, price and rank, and both were on full display this month. Both looked great. But there is a third axis nobody puts on the slide: can you actually send your data to this thing under your own rules. For most builders that axis is invisible, because most never set a rule. The default is "send it wherever's cheapest and don't ask", and if you never ask, every model looks reachable.

The moment you do set a rule, even something as basic as "providers must not retain my prompts", this month's cheap-model wave mostly vanishes. The models did not get worse. The set you are allowed to use got smaller, and you find that out not from a benchmark or a pricing page but from a 404 at the gateway. Price and rank are what the launch posts measure. Reachability under your data policy is what decides your actual menu.

§4Why this keeps happening to the newest, cheapest models

There is a pattern worth naming. The headline cheap models this cycle are open-weight or close to it, and an open model gets served first by whatever provider can spin it up fastest, which is almost never the one with the tightest data terms. Zero retention is a commitment a provider makes on purpose, and the budget end of the market makes it last, because it costs something to offer and the cheapest providers compete on not spending. So the cheaper and newer the model, the less likely an early provider carries the data terms you need.

DeepSeek being the one that worked is the tell, not the exception. It has been around long enough that a provider with real data terms picked it up. The wave behind it has not got there yet. Give it a few weeks and some of these will become reachable as compliant providers add them, which is precisely the point: the launch is not the moment you can use a model, the moment a provider you trust serves it is.

§5What we actually did about it

We did not relax the rule to chase the benchmark, and that is the whole Note. The setting failed closed, which is what it is for. You cannot fat-finger your way into sending prompts to a provider that retains them, because the gateway will not let you, even when the model on the other side is the cheapest and shiniest thing that shipped all month. So GLM, Kimi, Qwen and Nemotron sit in our catalogue, defined and unreachable, until a provider with real data terms picks them up or we change the rule on purpose. DeepSeek stays our cheap default, because it is the cheap, strong, open model we can actually route to.

The builder lesson is small and load-bearing. If you care at all where your data goes, run the reachability check before the benchmark, not after. There is no point scoring a model you are not allowed to call. The model you can use beats the model that topped the leaderboard, every single time, and the gateway tells you which is which long before the benchmark would have.

Methodology note. This is a first-person operational write-up, not a paper teardown. On 18 June 2026 we sent one-line test calls through our OpenRouter gateway to five models from this month's recap. Four returned HTTP 404, "no endpoints available matching your guardrail restrictions and data policy": Kimi K2.7-Code (moonshotai/kimi-k2.7-code), Qwen3.7-Plus (qwen/qwen3.7-plus), Nemotron-3-Ultra-550B (nvidia/nemotron-3-ultra-550b-a55b) and GLM-5.2 (z-ai/glm-5.2). DeepSeek V4 Flash (deepseek/deepseek-v4-flash) returned a normal completion, served by a zero-retention provider. Our account enforces zero data retention as a standing, account-level requirement. Triggers: substrate-relevant (reachability under a data policy is the real gate on model choice, not price or rank); non-duplicative (the wave's coverage is all price and leaderboard, none of it on what a properly-configured gateway can actually reach); broad-builder (this applies to anyone who sets any data rule at all, it is not a niche concern). This is the wave-scale sequel to Note №37, which made the same point about a single model. Forthcoming: a reachability check wired into our router so a model is flagged un-routable before anyone wastes a benchmark on it.