Anthropic's models take an effort parameter: low, medium, high, xhigh, max. It is the single biggest cost lever on a Claude bill, and until this evening ours was untouched. Nine agents, a routing layer, a fleet of overnight cron jobs, and not one of them ever sent the parameter. Everything ran at the default, and the default is high.
What we did
The embarrassing part is that we already had the machinery. Our model router picks a model for every API call from a tier: premium, balanced or cheap, per task category. Months of tuning went into which model serves which tier. But the tier only ever chose the model. A "cheap" classification call could land on a frontier model and that model would still think as hard as it does on our hardest architecture question, because nothing ever told it not to.
The pass took an evening and touched two places:
- The router. The tier now sets effort on every Anthropic
call: premium keeps the API default of high, balanced sends medium, cheap
sends low. One dictionary lookup, threaded into the request body as
output_config.effort. The parameter is gated to models that accept it, because Haiku 4.5 and older Sonnets reject it with a 400 — we verified the gate live rather than trusting the docs. - The cron jobs. Five headless Claude Code runs (a nightly
autonomous build, two variants of it, a self-compaction job, a weekly
research briefing) ran Opus at the harness default, which for coding is
xhigh — above even the API default. Real build work got pinned to
--effort high; the mechanical and summarisation jobs dropped to medium.
The policy lives in one short file next to the router config, with the reasoning per surface, so the next model launch doesn't mean rediscovering all of this from the git history.
Why it was worth doing
Anthropic's own cost-accuracy curves make the case better than we can: the spread between low and max on the same model is several times the cost per task, and on the newest frontier model low effort still scores around what the previous generation managed at maximum. Higher effort buys real quality on hard problems. On a job that classifies a feed item or summarises a week of notes, it buys longer thinking about a task that did not need it.
We are not publishing a savings number yet, deliberately. The honest answer is that the saving depends on the traffic mix, and ours skews heavily toward balanced and cheap calls — which is exactly why the default hurt. We will let a week of the audit log accumulate and report the before/after from measured spend, not from a benchmark chart.
The transferable lesson: if you route models by tier and you are not also routing effort, your cost control is doing half its job. The tier already encodes "how much is this task worth" — effort is the same decision, and it should be made in the same place, once, not sprinkled across call sites.
What's now in the stack
- Tier-to-effort mapping in the router's Anthropic path, with a model-support gate. Premium high, balanced medium, cheap low.
- Explicit
--efforton all five headless Claude Code cron jobs — nothing rides a harness default any more. EFFORT_POLICY.md— one page stating what runs at what level and why, reviewed alongside model pricing changes.- A standalone, dependency-free version of the pattern plus its test on GitHub. Steal what you like.
What's still off
The interactive agent — the one Alfred talks to all day — stays at the default. That is a choice, not an oversight: it does the judgment-heavy work where quality degradation shows first, and it can be dropped per-session when the work is routine. The Vertex fallback path also doesn't send effort yet; we'll add it once we've confirmed the parameter passes through that provider's request shape. And the savings claim remains an IOU until the audit log settles.
FAQ
What does Anthropic's effort parameter actually control?
output_config.effort tells the model how hard to work on a
request: how much it thinks, how many tool calls it consolidates, how
terse its output is. Levels run low, medium, high, xhigh, max. The API
default is high, and Claude Code defaults higher still for coding. It is
generally available on recent Opus and Sonnet models and on Fable 5;
Haiku 4.5 rejects it with a 400.
How much does lowering effort save on a Claude API bill?
Anthropic's published cost-accuracy curves show the spread between low and max effort on the same model is several times the cost per task, and on Fable 5 low effort still scores around what the previous Opus generation managed at maximum. The saving on a real bill depends entirely on your traffic mix: it applies to the routine share of your calls, not the hard ones you keep at high.
Where should effort be set in a multi-agent system?
In the routing layer, not per call site. If your router already picks models by a cost-quality tier (premium, balanced, cheap), map effort onto the same tier: premium keeps the default high, balanced sends medium, cheap sends low. Gate the parameter to models that accept it, because older and smaller models return a 400 error on it.