# Spend rails for agents shipped. The safety layer didn't.

An agent reads a webpage. The webpage contains a paragraph that is not content — it is an instruction, wrapped in the syntax the agent's planner happens to honor. The instruction says: there is an urgent invoice, pay it now, here is the address. The agent has a wallet. The agent can sign payments. The agent does what it was told.

Nothing in that sequence is exotic. The agent holds funds because the payment rails for agents shipped in the 2025–2026 window and they work. The agent got injected because prompt injection is unsolved and will stay unsolved. The only question is what sits between "the agent decided to spend" and "the money is gone."

For most agents being demoed right now, the answer is nothing. The rails shipped. The safety layer did not ship with them. That gap is the whole post.

## The rails that shipped

Three protocols made it ordinary for an autonomous agent to hold and move money. They are good. None of them is the problem.

[x402](/blog/x402-economy/) reactivated the HTTP `402 Payment Required` status code as a real per-call micropayment protocol: the agent signs a stablecoin transfer authorization, bundles it with the request, and the API serves the response. It is associated with Coinbase, settles in roughly two seconds, and needs no account and no API key. An agent with a wallet can call a priced API the way it calls a free one.

ERC-8004 gave agents [portable identity](/blog/erc-8004-agent-identity/) — an on-chain standard so a buyer agent can check who is on the other end of a transaction without a central directory in the middle. The seller agent is a known entity with a track record, or it is not, and the buyer can tell.

AP2 — the **Agent Payments Protocol**, associated with Google — handles the authorization side: payment _mandates_, the signed grant of permission that says "this agent is allowed to make this class of payment on my behalf." It turns "an agent has a key" into "an agent has a key and a documented scope of what it may spend it on."

Put them together and the loop closes. x402 is _how_ the agent pays, ERC-8004 is _who_ it pays, AP2 is _the authorization_ to pay at all. Olas — the Autonolas framework — is the existing reference point for agents that already transact autonomously at production scale; this is not theoretical, it is running.

Now notice what each of those three does. x402: settlement. ERC-8004: identity. AP2: authorization. Not one answers "what stops a compromised agent from spending fast." That was out of scope for all three — correctly, because it is an application-layer concern. But "out of scope for the protocol" is not "solved." It means it landed on you.

## What a compromised agent does when it decides to spend

A human draining a wallet moves once, large, and leaves. An injected agent does whatever its instruction said, at the speed the rails allow, with no hesitation — because hesitation is not a thing the agent has. With x402, a payment is a signed HTTP request, and an agent emits those as fast as it emits any other tool call. No portal, no confirmation dialog, no human in the loop by construction. That frictionlessness is the entire value proposition of the rails, and it is also the attack surface.

So the failure is not one bad $5,000 transfer. It is a thousand $5 transfers in ninety seconds, or four hundred calls to an attacker-controlled endpoint quoting $40 each, or a tight loop that pays the same address until the wallet is empty. The agent is not "hacked" in the classic sense. It is behaving exactly as designed — signing valid payments — toward a goal an attacker wrote into its context.

This reframes what the safety layer is for. It is not intrusion detection; the payments are all individually legitimate. It is a set of _constraints on the agent's own behavior_ — limits that hold even when the agent has been convinced it should spend. The agent cannot police itself, because the thing you are defending against is the agent being persuaded. The controls have to live outside the agent's reasoning, where no instruction in the context window can reach them.

## Spend-ceiling design

The first control is a spend ceiling, and a spend ceiling is not one number. It is a hierarchy, because there is more than one way to bleed a wallet.

A spend ceiling is a budget, and the [agent-budgets](/blog/agent-budgets/) argument applies in full: the limit is part of the spec, declared before any reasoning starts, checked on every step, never raised at runtime because the agent is "almost there." The same discipline, pointed at money instead of compute. Three tiers, each catching a different attack:

| Ceiling          | Scope                 | Catches                                                                           |
| ---------------- | --------------------- | --------------------------------------------------------------------------------- |
| Per-call         | One payment           | An endpoint quoting an absurd price — $0.001 ninety-nine times, $40 the hundredth |
| Per-counterparty | One address, per day  | A loop draining toward one attacker-controlled address                            |
| Per-day, global  | All payments, per day | Fan-out thin across many addresses, under every per-counterparty limit            |

The per-call ceiling does not need to know whether that hundredth quote is a bug or an exploit — it refuses anything above the line and the distinction stops mattering. The per-counterparty ceiling should default low — single-digit dollars per address per day — raised manually only for endpoints with established ERC-8004 reputation. The per-day global ceiling is the backstop the other two cannot be configured around: every payment, all counterparties, summed against one hard daily number.

The tiers compose. An attacker who routes around the per-call check runs into the per-counterparty check, and one who routes around that runs into the global check. Three lines, all cheap.
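
A minimal sketch of how the three tiers might compose in code, assuming the limiter runs in a process the agent cannot modify. The class name, method names, and dollar limits are all hypothetical, chosen for illustration rather than taken from any of the protocols above:

```python
from collections import defaultdict
from decimal import Decimal


class SpendCeilings:
    """Hypothetical three-tier limiter. All limits are illustrative, in dollars."""

    def __init__(self, per_call, per_counterparty_daily, global_daily):
        self.per_call = per_call
        self.per_counterparty_daily = per_counterparty_daily
        self.global_daily = global_daily
        self._spent_by_address = defaultdict(Decimal)  # running total per address today
        self._spent_total = Decimal("0")               # running total across all addresses

    def authorize(self, address, amount):
        """Return (allowed, reason). The spend is recorded only when allowed."""
        if amount <= 0:
            return False, "invalid amount"
        if amount > self.per_call:
            return False, "per-call ceiling"
        if self._spent_by_address[address] + amount > self.per_counterparty_daily:
            return False, "per-counterparty ceiling"
        if self._spent_total + amount > self.global_daily:
            return False, "global daily ceiling"
        self._spent_by_address[address] += amount
        self._spent_total += amount
        return True, "ok"

    def reset_day(self):
        """Meant to be called by an operator-owned scheduler, never by the agent."""
        self._spent_by_address.clear()
        self._spent_total = Decimal("0")


ceilings = SpendCeilings(Decimal("1.00"), Decimal("5.00"), Decimal("50.00"))
print(ceilings.authorize("0xseller", Decimal("0.001")))  # (True, 'ok')
print(ceilings.authorize("0xseller", Decimal("40")))     # (False, 'per-call ceiling')
```

The checks run cheapest-first and the counters move only after all three pass, so a refused payment never consumes budget. `Decimal` is used instead of floats so that thousands of sub-cent payments sum exactly.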

## Treasury isolation

Ceilings limit the rate of bleeding. Treasury isolation limits the size of the wound. It is the highest-leverage control and the one most often skipped, because skipping it is operationally convenient.

The rule: the agent's wallet is a _hot wallet_, and a hot wallet holds only what the agent is allowed to lose. It is not the treasury. The operator's treasury — the real balance — sits in a separate wallet the agent has no key for and no path to.

```
        OPERATOR TREASURY                    AGENT HOT WALLET
   ┌──────────────────────┐            ┌──────────────────────┐
   │   real balance        │  manual    │  small float          │
   │   cold / multisig     │ ─ ─ ─ ─ ─▶ │  agent holds the key  │
   │   agent has NO key    │  top-up    │  spends via x402      │
   └──────────────────────┘  on a      └──────────────────────┘
                              schedule              │
       worst case ───────────────────────────────────┘
       is the float, never the treasury
```

The arrow from treasury to hot wallet is dashed because it is _manual_. A human, or a scheduled job a human owns, moves a fixed float into the hot wallet on a cadence. The agent cannot initiate that transfer, because it does not hold the key that would sign it.

State the hard rule without hedging: **an agent that can refill its own wallet is a security incident that has not happened yet.** Auto-top-up feels like a quality-of-life feature — the agent never stalls on an empty wallet. But it is, precisely, a function that moves money from the treasury into the agent's reach, triggered by the agent's state. An attacker who can drain the float and trigger a refill can drain it again. You have built a pump and handed the attacker the handle. The blast radius of a compromised agent should be exactly the hot-wallet float and not one dollar more. Isolation is what makes that true.
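
The pump argument can be made concrete with a toy simulation. This is a sketch under simplifying assumptions: the attacker drains the whole hot wallet each round, every refill equals the float, and nobody intervenes. The function name and the dollar figures are hypothetical:

```python
def simulate_drain(float_size, treasury, auto_top_up, max_rounds=100_000):
    """Return total dollars lost to an attacker who empties the hot wallet each round."""
    hot_wallet = float_size
    lost = 0
    for _ in range(max_rounds):
        lost += hot_wallet          # attacker drains whatever the agent can reach
        hot_wallet = 0
        if not auto_top_up or treasury == 0:
            break                   # nothing refills the wallet, so the bleeding stops
        refill = min(float_size, treasury)
        treasury -= refill          # the refill is the pump: treasury moves into reach
        hot_wallet = refill
    return lost


print(simulate_drain(50, 10_000, auto_top_up=False))  # 50
print(simulate_drain(50, 10_000, auto_top_up=True))   # 10050
```

With manual funding the loss is bounded by the float. With a refill triggered by the wallet's own balance, the bound becomes the float plus the entire treasury.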

## Circuit breakers and receipts

Ceilings are static lines. A circuit breaker is dynamic — it watches the _shape_ of spending and trips on anomaly, not just on a total.

A drawdown breaker fires when the agent burns through its balance faster than it ever should. Not "the agent spent $80 today" — that might be a busy, legitimate day. Rather: "the agent spent 60% of its float in fifteen minutes," a velocity no honest workload of this agent produces. The breaker halts all outbound payments and pages the operator. It is built for the injected-agent case specifically, where every individual payment is valid and only the _rate_ is the tell. In a well-run deployment it never trips — it is there for the one day the agent gets injected, and on that day it is the difference between losing a float and losing a float plus an afternoon of not noticing.
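
A drawdown breaker along these lines can be sketched as a sliding-window sum. The 60% threshold and fifteen-minute window come from the example above; the class name, the integer-dollar amounts, and the choice to pass timestamps in explicitly are assumptions made for illustration. Paging the operator is left out:

```python
from collections import deque


class DrawdownBreaker:
    """Hypothetical velocity breaker: trips when spend inside a time window
    reaches a set percentage of the hot-wallet float."""

    def __init__(self, float_size, max_percent=60, window_seconds=900):
        self.float_size = float_size
        self.max_percent = max_percent
        self.window_seconds = window_seconds
        self._recent = deque()  # (timestamp, amount) pairs still inside the window
        self.tripped = False

    def allow(self, amount, now):
        """Return True if the payment may proceed. Once tripped, always False."""
        if self.tripped:
            return False
        # Evict payments that have aged out of the window.
        while self._recent and now - self._recent[0][0] > self.window_seconds:
            self._recent.popleft()
        in_window = sum(a for _, a in self._recent) + amount
        # Integer comparison avoids floating-point surprises at the threshold.
        if in_window * 100 >= self.max_percent * self.float_size:
            self.tripped = True
            return False
        self._recent.append((now, amount))
        return True
```

There is deliberately no `reset` method here. In this sketch, clearing a tripped breaker means an operator constructs a new one, which keeps the reset path out of anything the agent can call.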

The other half is receipts. Every payment gets logged (request, response, x402 receipt, counterparty, timestamp) and stored where the agent cannot edit it. Receipts stop nothing; they make the incident _legible_ afterward. When the breaker trips, the receipt log is the difference between "something went wrong around 3pm" and "here are the 412 payments to `0x9f3…` starting at 15:02:11, here is the request that triggered the first, here is the injected page it read." Receipts run roughly a kilobyte each, so the storage cost is a rounding error. The cost of not having them is an investigation that cannot reach a conclusion.
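
One way to make a receipt log tamper-evident is to hash-chain the entries, so that editing any past receipt breaks every hash after it. The sketch below is a hypothetical in-memory version using only the standard library. It detects tampering rather than preventing it, and it assumes a real deployment would persist entries to storage the agent has no write access to:

```python
import hashlib
import json


class ReceiptLog:
    """Hypothetical append-only log: each entry commits to the hash of the one before."""

    GENESIS = "0" * 64

    def __init__(self):
        self._entries = []
        self._head = self.GENESIS

    def append(self, receipt):
        """Store one receipt dict and return its chained SHA-256 digest."""
        body = json.dumps(receipt, sort_keys=True)
        digest = hashlib.sha256((self._head + body).encode()).hexdigest()
        self._entries.append({"prev": self._head, "body": body, "hash": digest})
        self._head = digest
        return digest

    def verify(self):
        """Recompute the chain from the start; False if any entry was altered."""
        prev = self.GENESIS
        for entry in self._entries:
            expected = hashlib.sha256((prev + entry["body"]).encode()).hexdigest()
            if entry["prev"] != prev or entry["hash"] != expected:
                return False
            prev = entry["hash"]
        return True


log = ReceiptLog()
log.append({"counterparty": "0xabc", "amount": "0.05", "timestamp": "15:02:11"})
print(log.verify())  # True
```

Publishing the latest head hash somewhere outside the log, for example in the operator's paging channel, would let an investigator confirm later that the chain was not rewritten wholesale.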

## The safety checklist for an agent that spends money

Here is the layer the rails left out — the checklist we'd build into any agent before it is allowed to touch a wallet. Treat every item as a hard gate: not a feature, a precondition.

- **Per-call price ceiling.** The agent refuses any single payment above a fixed amount. No exceptions, no override the agent can reason its way to.
- **Per-counterparty daily ceiling.** A low default cap on total daily spend toward any one address. Raised manually, only for counterparties with established ERC-8004 reputation, never by the agent.
- **Per-day global ceiling.** A hard cap on total daily spend across all counterparties. The backstop the first two ceilings cannot be configured around.
- **Treasury isolation.** The agent holds a hot-wallet key and nothing else. The treasury is a separate wallet with no key path the agent can reach.
- **No automatic top-up.** The hot wallet is funded manually, on a schedule, by a human or a job the human owns. The agent cannot initiate a transfer from the treasury. This one is non-negotiable.
- **Drawdown circuit breaker.** Spend velocity is monitored; an abnormal burn rate halts all payments and pages the operator. Built for the injected-agent case specifically.
- **Logged, immutable receipts.** Every payment — request, response, receipt, counterparty, timestamp — written to storage the agent cannot alter. The audit trail for the morning after.
- **AP2 mandates scoped tight.** The payment mandate authorizes the narrowest class of payment the task needs. A broad mandate is a broad blast radius; scope it like a permission, not a blanket.
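
To make "hard gate" literal, the checklist can be enforced as a startup precondition: the wallet key is not released until every control reports as configured. This is a sketch, with hypothetical names throughout. How each flag gets set to `True` (by actually verifying the control) is the real work and is assumed to happen elsewhere:

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class SafetyGates:
    """One flag per checklist item; everything defaults to not configured."""
    per_call_ceiling: bool = False
    per_counterparty_ceiling: bool = False
    global_daily_ceiling: bool = False
    treasury_isolated: bool = False
    auto_top_up_disabled: bool = False
    drawdown_breaker: bool = False
    immutable_receipts: bool = False
    mandate_scoped: bool = False


def missing_gates(gates):
    """Return the names of every control that is not yet in place."""
    return [f.name for f in fields(gates) if not getattr(gates, f.name)]


def release_wallet_key(gates):
    """Refuse to hand over the hot-wallet key unless all gates are satisfied."""
    missing = missing_gates(gates)
    if missing:
        raise RuntimeError("wallet withheld, missing gates: " + ", ".join(missing))
    return "wallet key released"
```

Defaulting every flag to `False` means a control that was forgotten fails closed instead of silently passing.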

Eight items. None of them is research; none is hard to build. Every one is routinely skipped, because the rails work without them and the demo runs fine — right up until the agent reads the wrong webpage.

x402, ERC-8004, and AP2 shipped the rails for agents to hold and spend money, and they shipped well. The safe-operations layer did not ship with them, because it could not — it is application-specific and it is yours to build. An agent that can spend money without it is not a product. It is a wallet with a prompt-injection vulnerability and an internet connection. Build the layer first; the rails will wait for it.

## Reading list

- [x402](https://x402.org/) — the protocol homepage for the HTTP-402 micropayment standard. Read the spec, then notice it says nothing about spend limits; that is the gap, stated by the rail itself.
- [Coinbase](https://www.coinbase.com/) — context on the x402 rails and the AgentKit stack that agents are built on top of.
- [Olas (Autonolas)](https://olas.network/) — the reference framework for autonomous agents that transact at production scale. The best place to see what an agent holding funds looks like in the wild.
- [Google AI](https://ai.google/) — background on the Agent Payments Protocol and the mandate-based authorization model.

Spend rails for agents are real, fast, and frictionless. So is the failure mode. The layer that separates the two is the one nobody shipped — so ship it yourself.