§1 What actually changes, and what does not
Here is the precise version, because the panicked version is wrong. Through 7 July, Fable 5 stays included in Claude's paid plans (Pro, Max, Team and select Enterprise) for up to half of your weekly usage limit. Past that half you switch to another model or pay with usage credits. From 8 July the free inclusion ends, and Fable on those plans moves to metered credits at its API rate: $10 per million input tokens and $50 per million output tokens. That is double the Opus 4.8 rate, and it makes Fable the most expensive model Anthropic sells for general use.
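To make the metered rate concrete, here is a minimal sketch that prices a single request at the figures quoted above. The function name and the example token counts are hypothetical, chosen only for illustration.

```python
# Rates quoted above for Fable, in US dollars per million tokens.
FABLE_INPUT_PER_MTOK = 10.0
FABLE_OUTPUT_PER_MTOK = 50.0


def request_cost(input_tokens: int, output_tokens: int,
                 input_rate: float = FABLE_INPUT_PER_MTOK,
                 output_rate: float = FABLE_OUTPUT_PER_MTOK) -> float:
    """Return the metered cost in dollars for one request."""
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


# Hypothetical request: 20,000 tokens in, 4,000 tokens out.
cost = request_cost(20_000, 4_000)
print(f"${cost:.2f}")
```

Note that in this made-up request the 4,000 output tokens cost as much as the 20,000 input tokens, which is the asymmetry the rest of this piece keeps returning to.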
Two things stop this being a crisis. First, it is Fable-specific. Opus and Haiku stay included in the plan exactly as before, so this is one model getting a price tag, not your whole subscription. Second, it is not permanent. Fable went dark from 12 June to 1 July under US export controls, came back on the 1st, and Anthropic says it expects the model to return to plans later. So the deadline is real, narrow, and temporary. The lesson it teaches is none of those things.
§2 What we found when we priced ourselves
We run nine agents on this stack, all day, every day. So we did the thing most builders have never had a reason to do: we added up what our usage would cost if we paid API rates for it. Over one month, our fleet's model usage came to about $2,665 of API-equivalent spend, almost all of it riding on the plan rather than a metered bill. Opus 4.8 was the bulk of it at roughly $2,095. Fable, the slice that actually gets metered on the 8th, was about $367 a month.
Keep those two numbers apart, because conflating them is the easy lie. The direct hit on 8 July is the $367: our Fable habit stops being free and becomes a real line item, unless we route around it. The $2,665 is the bigger point. It is the size of the subsidy we, and probably you, have been riding without noticing. Fable is simply the first piece of it to get a price tag, and the first thing that makes you ask what the rest would cost. The honest answer for most teams is: more than you think, and you have been reaching for the expensive model because it was free, not because the task needed it.
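For anyone who wants the two numbers side by side, this small sketch breaks the monthly figure into shares. The dollar amounts are the ones reported above; the "everything else" row is simply what remains after the two stated figures, not a separately measured number.

```python
# Monthly API-equivalent spend in US dollars, as reported above.
total = 2665
by_model = {"Opus 4.8": 2095, "Fable": 367}

# Derived remainder: whatever the two stated figures do not account for.
by_model["everything else"] = total - sum(by_model.values())

for name, dollars in by_model.items():
    print(f"{name:16} ${dollars:>5,}  {dollars / total:6.1%}")
```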
§3 The four levers, with our numbers
None of these are clever. They are the boring controls that get skipped while the model is free.
Effort tuning is the biggest single dial. Fable, Opus and Sonnet all take an effort setting, and it moves the bill more than the model choice does. On a six-problem reasoning set we ran, Opus 4.8 at high effort spent 7,433 output tokens to go six for six. Fable at high effort matched it, six for six, on fewer than half the output tokens. Drop Fable to medium and it fell to 2,377 output tokens, about 35 per cent fewer than at high effort, and got five of six. That last figure is the honest catch: medium dropped one answer. But the shape holds. Turn the effort knob down before you change the model.
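The bench figures above can be turned into output cost per correct answer. This is a rough sketch under two assumptions: the Opus output rate of $25 per million is inferred from the "half of Fable" relationship stated in this piece rather than quoted directly, and input tokens are ignored. Fable at high effort is left out because its exact token count is not given.

```python
from dataclasses import dataclass


@dataclass
class Run:
    label: str
    output_tokens: int
    output_rate: float  # dollars per million output tokens
    correct: int        # problems solved out of six

    @property
    def output_cost(self) -> float:
        return self.output_tokens * self.output_rate / 1_000_000

    @property
    def cost_per_correct(self) -> float:
        return self.output_cost / self.correct


runs = [
    # Assumption: Opus output rate is half of Fable's $50, i.e. $25.
    Run("Opus 4.8, high effort", 7_433, 25.0, 6),
    Run("Fable, medium effort", 2_377, 50.0, 5),
]

for run in runs:
    print(f"{run.label:24} ${run.output_cost:.4f} total, "
          f"${run.cost_per_correct:.4f} per correct answer")
```

On these numbers the medium-effort Fable run is cheaper per correct answer despite the doubled rate, but it also missed one problem, so the comparison only matters if five of six is acceptable for the task.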
Route by token shape. Fable's premium is on output, at $50 a million. So the pain is worst on output-heavy work: code generation, long prose, anything that writes a lot. Keep Fable on high-reasoning, low-output tasks, the hard judgement calls and planning, where you are paying for thinking quality and not a huge output bill. Route the output-heavy work to Opus at half the price or Sonnet at a third. Fable's own token efficiency, under half of Opus's output for the same answer in our bench, partly pays for its higher rate on reasoning-dense tasks, but not on ones that write pages.
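One way to express routing by token shape is a small function that looks at two properties of a task. This is a sketch, not a recommended policy: the 2,000-token threshold is a hypothetical placeholder to tune against your own workload, and the returned strings are plain labels, not API model identifiers.

```python
def route(needs_hard_reasoning: bool, expected_output_tokens: int,
          heavy_output_threshold: int = 2_000) -> str:
    """Pick a model label from the shape of the task.

    heavy_output_threshold is a hypothetical cut-off, not a measured value.
    """
    output_heavy = expected_output_tokens >= heavy_output_threshold
    if output_heavy:
        # Output-heavy work avoids Fable's $50-per-million output rate.
        return "opus" if needs_hard_reasoning else "sonnet"
    # Low-output work: pay for Fable only when the reasoning is hard.
    return "fable" if needs_hard_reasoning else "sonnet"


print(route(needs_hard_reasoning=True, expected_output_tokens=300))
print(route(needs_hard_reasoning=True, expected_output_tokens=8_000))
print(route(needs_hard_reasoning=False, expected_output_tokens=8_000))
```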
Caching matters twice as much now. Cache reads cost about a tenth of base input; the first write costs a quarter more. At those multipliers the write premium is recovered on the first read. At Fable's rate a cache miss on a large prefix is $10 a million rather than $5, so the discipline pays double. Ours is not theoretical: Opus alone read 2.35 billion tokens from cache in a month against 2.69 million tokens of fresh input. The cache carried better than 99 per cent of our input at a tenth of the price. Freeze the system prompt, keep the tool list in a stable order, put the volatile content last, and check cache_read_input_tokens is non-zero.
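The cache arithmetic can be checked directly. The sketch below uses the multipliers stated above (write at 1.25 times base input, read at 0.10 times) and compares a cached prefix with resending it uncached. The 100,000-token prefix is a hypothetical example, and the last lines recompute the cache share from the monthly figures reported above.

```python
def cached_cost(prefix_tokens: int, reads: int, base_rate: float,
                write_mult: float = 1.25, read_mult: float = 0.10) -> float:
    """Dollars for one cache write followed by `reads` cache reads."""
    per_use = prefix_tokens * base_rate / 1_000_000
    return per_use * write_mult + reads * per_use * read_mult


def uncached_cost(prefix_tokens: int, uses: int, base_rate: float) -> float:
    """Dollars for sending the same prefix `uses` times with no cache."""
    return prefix_tokens * base_rate * uses / 1_000_000


# Hypothetical 100,000-token prefix at Fable's $10 per million input rate.
for reads in range(3):
    with_cache = cached_cost(100_000, reads, 10.0)
    without = uncached_cost(100_000, reads + 1, 10.0)
    print(f"{reads} reads: cached ${with_cache:.2f} vs uncached ${without:.2f}")

# Share of input served from cache, from the monthly figures above.
cache_read_tokens = 2_350_000_000
fresh_input_tokens = 2_690_000
share = cache_read_tokens / (cache_read_tokens + fresh_input_tokens)
print(f"cache share of input: {share:.2%}")
```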
Cap the tail. Give a Fable agent a task budget and it self-moderates against a token countdown instead of running away. And set the Fable-to-Opus refusal fallback, so a safety decline is re-served on Opus inside the same call rather than burning a full re-run at Fable rates.
§4 Managing Fable in your stack after 7 July
If you are keeping Fable past the deadline, and for the hardest work you should, this is the operating manual.
Make Fable the architect, not the executor. Pay its premium only for the step that earns it: the hard planning and decomposition. Then hand the plan to Opus 4.8 at half the price, or Sonnet 5 at a third, for the token-heavy execution. Fable thinks, cheaper models type. Anthropic has productised a version of this as the advisor tool, a cheap executor paired with a smarter advisor consulted for planning, though its documented pairings currently top out at Opus as the advisor, so the Fable version is manual orchestration for now.
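The manual orchestration described above can be sketched as a plan-then-execute loop. This is a hypothetical outline, not Anthropic's advisor tool: the two model calls are passed in as plain functions so that no particular SDK is assumed, and the stand-ins at the bottom exist only to make the example run.

```python
from typing import Callable, List

# Any function that takes a prompt and returns the model's text.
ModelCall = Callable[[str], str]


def plan_then_execute(task: str, architect: ModelCall,
                      executor: ModelCall) -> List[str]:
    """Ask the expensive model for a plan, then run each step on a cheaper one."""
    plan = architect(f"Break this task into numbered steps, one per line:\n{task}")
    steps = [line.strip() for line in plan.splitlines() if line.strip()]
    return [executor(f"Task: {task}\nCarry out this step: {step}")
            for step in steps]


# Stand-in functions so the sketch runs without network access.
def fake_architect(prompt: str) -> str:
    return "1. Parse the input\n2. Write the report\n"


def fake_executor(prompt: str) -> str:
    return "done: " + prompt.splitlines()[-1]


results = plan_then_execute("Summarise logs", fake_architect, fake_executor)
for line in results:
    print(line)
```

In real use the architect would wrap a Fable call and the executor an Opus or Sonnet call, so the premium rate applies only to the short planning output.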
Batch what you can. The Batch API is half price. Any Fable work that is not interactive, overnight analysis, bulk classification, a nightly report, should go through it and halve its own bill.
Underneath all of it sits one decision rule. Pay Fable's premium only when two things are both true: the task is hard enough that Opus or Sonnet measurably underperform, and a wrong answer costs you more than the token difference. If either fails, Opus 4.8 at half the price is your default, and Fable is the specialist you call in, not the model you leave running.
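The decision rule reduces to two checks that must both pass. A minimal sketch, assuming you have an eval score for each model on the task type and a rough dollar estimate of what a wrong answer costs. All parameter names and the example values are hypothetical.

```python
def should_use_fable(fable_score: float, cheaper_score: float, min_gap: float,
                     cost_of_wrong_answer: float,
                     extra_token_cost: float) -> bool:
    """Return True only when both conditions of the rule hold.

    min_gap is the smallest score difference you treat as a measurable
    underperformance; it is an assumption you set from your own evals.
    """
    cheaper_underperforms = (fable_score - cheaper_score) >= min_gap
    error_outweighs_tokens = cost_of_wrong_answer > extra_token_cost
    return cheaper_underperforms and error_outweighs_tokens


# Hard task, costly mistakes: both conditions hold.
print(should_use_fable(0.92, 0.78, 0.05, 500.0, 2.0))
# Same quality gap, but a wrong answer is cheap to absorb: use the default.
print(should_use_fable(0.92, 0.78, 0.05, 1.0, 2.0))
```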
That is the real content of the 7 July deadline. It is not about Fable. It is the moment a free model got a price tag, which is exactly the moment you find out whether you were using it because it was the best tool or because it cost you nothing. Route deliberately, tune the effort, cache properly, and the bill barely moves. Keep treating the most expensive model as the default, and Monday is the day you start paying for the habit.
