Notes on agent budgets: why "let it think longer" is a bug.
An agent that hits a wall and asks for more compute is not reasoning. It is panicking. The budget is part of the spec, not a fallback.

A pattern shows up in maybe a third of the agent demos I’m forwarded. The agent tries a tool. It fails. The agent tries a slightly different tool. That fails too. The agent tries the first tool again with different arguments. That also fails. The agent then “reflects” — emits a paragraph about how it should think more carefully — and tries another permutation of the same tool. At some point the wrapper times out and the demo cuts.
The team running the demo will tell you the agent is “still learning.” It is not. There is no learning happening. The agent is in a loop, the wrapper is hiding it, and the system has no notion of when to stop.
This is not an agent problem. This is a budget problem. The agent has no budget. It does not know what a budget is. The system was designed under the assumption that more thinking is always better, which is false.
A well-built agent is scoped by three budgets, declared before any reasoning starts: cost, latency, and number of tool calls. Every step the agent takes is checked against all three.
If any one of these exceeds its budget, the agent is required to halt and produce a structured partial result — not retry, not reflect, not “let it think longer.”
These three numbers are determined by the use case, not by the model. A real-time dispatch agent has a 4-second latency budget and a 20-cent cost budget. A nightly research assistant has a 2-hour latency budget and a $4 cost budget. The same model serves both, with different orchestration.
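As a minimal sketch, the three budgets can be declared as one small immutable value per use case. The `Budgets` class and the `calls` limits are assumptions for illustration; only the latency and cost figures come from the two examples above. The field names mirror the keys used by the loop later in this piece.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Budgets:
    """Hypothetical container for the three budgets (names are illustrative)."""
    cost: float  # maximum spend per task, in dollars
    ms: int      # maximum latency per task, in milliseconds
    calls: int   # maximum tool calls per task

    def __post_init__(self):
        # Reject non-positive budgets up front so a misconfigured agent never starts.
        if self.cost <= 0 or self.ms <= 0 or self.calls <= 0:
            raise ValueError("every budget must be positive")


# Same model, different orchestration: the numbers come from the use case.
# The `calls` values are assumed; the source gives only latency and cost.
DISPATCH = Budgets(cost=0.20, ms=4_000, calls=8)
RESEARCH = Budgets(cost=4.00, ms=2 * 60 * 60 * 1000, calls=200)
```

`dataclasses.asdict(DISPATCH)` gives the plain-dict form if the orchestration code expects a dictionary.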
If you cannot tell me your three budgets, you do not have an agent. You have a chatbot that calls functions.
The temptation to extend the budget when an agent struggles is strong. Almost every team building agents this way reaches for it. The reasoning sounds rational: the model is almost there; one more step and it will figure it out. Let it think longer.
It will not figure it out. Across thousands of agent traces we’ve reviewed, the pattern is overwhelmingly the same: when an agent fails to make progress in the first three or four tool calls, additional calls do not improve the outcome. The shape of the trace deteriorates. The agent tries the same thing in different wrappings. It calls a tool, gets a 400, calls the tool again with the same arguments, gets the same 400, “reflects,” and concludes that the tool is broken.
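The repeated-call pattern is cheap to detect mechanically. The sketch below assumes a trace is available as a list of `(tool, args)` pairs; `fingerprint`, `is_looping`, and the `max_repeats` threshold are hypothetical names chosen for illustration, not part of any particular framework.

```python
import json
from collections import Counter


def fingerprint(tool: str, args: dict) -> str:
    """Stable key for a tool call: same tool and same arguments give the same key."""
    return tool + ":" + json.dumps(args, sort_keys=True, default=str)


def is_looping(calls: list, max_repeats: int = 1) -> bool:
    """True if any identical (tool, args) call appears more than max_repeats times."""
    counts = Counter(fingerprint(tool, args) for tool, args in calls)
    return any(n > max_repeats for n in counts.values())


trace = [
    ("tms.search", {"q": "0418-Q"}),
    ("tms.lookup", {"id": 1}),
    ("tms.search", {"q": "0418-Q"}),  # identical to the first call
]
looping = is_looping(trace)
```

This only catches exact repeats. The "same thing in different wrappings" case needs a looser notion of similarity, which is left out here.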
Models do not autonomously generate better hypotheses by being granted more compute. They generate more hypotheses, drawn from the same distribution. That is sampling, not reasoning. If the gold path is not in that distribution, you get more wrong answers at higher cost.
The fix is not to think longer. The fix is to (a) escalate to a human, (b) hand off to a specialized sub-agent with different tools, or (c) emit a structured failure with a partial result. All three are legitimate. “Try again” is not.
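One way to make that rule hard to violate is to encode the exits as a closed set with no retry option. This is a sketch under assumptions: the `Exit` enum, the `choose_exit` function, and in particular the priority order between the three exits are illustrative choices, not something the argument above prescribes.

```python
from enum import Enum
from typing import Optional


class Exit(Enum):
    """The three legitimate exits. There is deliberately no RETRY member."""
    ESCALATE = "escalate_to_human"
    HAND_OFF = "hand_off_to_sub_agent"
    FAIL = "structured_failure"


def choose_exit(human_on_call: bool, specialist: Optional[str]) -> Exit:
    """Pick an exit for a stalled agent. The priority order is an assumption."""
    if specialist is not None:
        return Exit.HAND_OFF  # a sub-agent with different tools exists
    if human_on_call:
        return Exit.ESCALATE  # no specialist, but a person can take over
    return Exit.FAIL          # nobody to hand to: emit the partial result
```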
Here is the loop we build, roughly:

    import time

    async def run(task, budgets):
        state = init_state(task)
        spent = {"cost": 0.0, "ms": 0, "calls": 0}
        # Monotonic clock: the deadline cannot move if the wall clock is adjusted.
        deadline = time.monotonic() + budgets["ms"] / 1000
        while not state.done:
            # Check all three budgets before every step.
            if (spent["cost"] >= budgets["cost"]
                    or time.monotonic() >= deadline
                    or spent["calls"] >= budgets["calls"]):
                return emit_partial(state, spent, reason="budget")
            # The planner sees what is left, so the plan is conditioned on the budget.
            step = await plan_one_step(state, remaining=remaining(spent, budgets))
            result, cost, ms = await execute_step(step)
            spent["cost"] += cost
            spent["ms"] += ms
            spent["calls"] += 1
            state = update(state, step, result)
        return emit_final(state, spent)

Two things worth noticing.
First, remaining(spent, budgets) is passed into the planner. The planner sees how much budget is left. The agent’s plan is conditioned on the budget. A planner with 60% budget remaining picks a different next step than the same planner with 5% remaining. This isn’t a clever trick; it’s just acknowledging that the planner cannot do its job without knowing what resources it has.
Second, when the budget is exhausted, the agent does not retry. It emits a partial result: what it found, what it tried, why it stopped. The downstream consumer — a human or an orchestration layer — gets to decide what to do.
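The `remaining(spent, budgets)` helper passed to the planner is not defined in the loop. A minimal sketch, assuming both arguments are dicts keyed by `"cost"`, `"ms"`, and `"calls"`, could return the unspent fraction of each budget. The `tightest` helper is a hypothetical addition that names the budget closest to exhaustion.

```python
def remaining(spent: dict, budgets: dict) -> dict:
    """Fraction of each budget still unspent, clamped so it never goes below 0."""
    return {
        key: max(0.0, 1.0 - spent[key] / budgets[key])
        for key in ("cost", "ms", "calls")
    }


def tightest(rem: dict) -> str:
    """Name of the budget with the least headroom (hypothetical helper)."""
    return min(rem, key=rem.get)


rem = remaining(
    {"cost": 0.10, "ms": 1000, "calls": 2},
    {"cost": 0.20, "ms": 4000, "calls": 8},
)
```

Fractions rather than absolute numbers let the same planner prompt work across use cases with very different budgets. That is a design choice, not a requirement.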
The single most underspecified part of nearly every agent system in production is what happens when things don’t go well.
Most agents in production emit one of two outputs: a happy-path JSON object on success, and an apology paragraph on failure. The apology paragraph is useless. Nobody downstream can parse it. The orchestration layer can’t escalate gracefully. The end user gets a wall of “I’m sorry, I was unable to…” text and decides the system is broken.
What we build instead is a structured failure schema that mirrors the success schema. Something like:

    {
      "status": "partial",
      "reason": "budget_exhausted",
      "spent": { "cost": 0.18, "ms": 2840, "calls": 6 },
      "best_effort": { ... whatever the agent did manage to determine ... },
      "next_actions": [
        { "type": "human_review", "what": "Confirm the carrier code for shipment 0418-Q." },
        { "type": "retry_with", "tool": "tms.search", "args": {...} }
      ]
    }

Three properties of this schema matter:
- status is machine-readable. Downstream code branches on it.
- best_effort is not optional. Even when the agent could not finish, it produces what it has. Half a route is a useful starting point for a dispatcher. Half a research summary is a useful starting point for a PM.
- next_actions are concrete enough to execute. Not “consider escalating”; “call human_review with this argument.”
Agents that emit this contract integrate cleanly. Agents that emit apology paragraphs do not.
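To show what "downstream code branches on it" can look like, here is a sketch of a consumer. The `route` function, the queue name `"deliver"`, and the success value `"ok"` are assumptions; the source specifies only the partial-result shape.

```python
import json

REQUIRED_PARTIAL_KEYS = {"status", "reason", "spent", "best_effort", "next_actions"}


def route(raw: str) -> str:
    """Return where a result should go next (destination names are hypothetical)."""
    result = json.loads(raw)  # a free-text apology fails here with a ValueError subclass
    if not isinstance(result, dict):
        raise ValueError("result must be a JSON object")
    status = result.get("status")
    if status == "ok":  # "ok" is an assumed success value
        return "deliver"
    if status == "partial":
        missing = REQUIRED_PARTIAL_KEYS - set(result)
        if missing:
            raise ValueError(f"partial result missing keys: {sorted(missing)}")
        actions = result["next_actions"]
        # Take the agent's first concrete suggestion; fall back to a human.
        return actions[0]["type"] if actions else "human_review"
    # Any other status is a contract violation.
    raise ValueError(f"unknown status: {status!r}")
```

An apology paragraph never reaches the branching logic: it is rejected at parse time, which is the point of making the contract machine-readable.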
Sometimes the budget is the problem. The task genuinely cannot be done inside it. Two responses are correct:
What is not correct: raise the budget and hope. If you raised the budget once, you’ll raise it again. Eventually the agent is allowed to spend an hour and twenty dollars on a single dispatch decision and your unit economics are gone.
When reviewing an agent in production, run a quick check on the last 200 traces. Four signals:
None of these are sophisticated. All of them are skipped routinely.
Budgets are part of the spec. They are not a fallback. They are not an emergency brake. They are the constraint under which the agent is designed.
“Let it think longer” sounds like generosity. In practice it is a way to defer the decision about what the agent is supposed to do when it can’t make progress. The decision is unavoidable. You either make it on purpose, at design time, or you make it accidentally, at runtime, by letting the agent burn money in a loop.
The agents we build don’t think longer. They think shorter, and when they hit a wall, they say so.
