◇ ARCHIVE · PAGE 3 / 4 · OLDEST → NEWEST

The field notes archive.

2026.05.11 PAYMENTS

Pricing an API when the customer is an agent.

When the buyer is an autonomous agent paying per call, human pricing breaks. Seats, signup-gated free tiers, and annual commitments stop making sense for a machine that reads the price and comparison-shops every call.

2026.05.13 AGENTS

Deterministic replay: debugging agents that will not reproduce.

An agent run is non-deterministic — sampling, tool responses, and timing all vary — so a bug seen once may never recur. Deterministic replay records every non-deterministic input so the run can be replayed exactly.

2026.05.15 VOICE

Cascaded or end-to-end: a 2026 voice-architecture trade study.

A 2026 voice agent forks at the first design decision — STT→LLM→TTS cascade, or a single speech-to-speech model. End-to-end wins on latency and naturalness; the cascade wins on everything you debug, audit, and control. Here is the trade study.

2026.05.16 ZKML

opML or zkML: a decision tree for verifiable inference.

Two ways to make an off-chain model output trustworthy on-chain. zkML is cryptographic, expensive, and small-model-only. opML is optimistic, cheap, and runs Llama-2-scale models today. Choosing by stakes, model size, and latency.

2026.05.16 STANDARDS

MCP in production: the four gaps nobody demos.

MCP won the race to become the tool-integration standard. But "works in a demo" and "works in production" are different claims, and four gaps bite at scale: sticky sessions, server fan-out, governance, and what happens when a session drops mid-task.

2026.05.16 EVAL

Faithfulness is not groundedness. And "accuracy" is not a RAG metric.

Teams say their RAG is "accurate" and mean different things. Faithfulness is whether the answer is true; groundedness is whether every claim traces to a source. They fail differently, and a deploy decision depends on measuring both.

2026.05.16 ZKML

The first proven LLM: what DeepProve changes for zkML.

DeepProve, from Lagrange, produced the first zero-knowledge proof of a full LLM inference — GPT-2. It moves "prove a transformer" from impossible to merely expensive. What that unlocks, and what is still years away.

2026.05.16 TRAINING

Proving the work: verification in decentralized training.

Decentralized pretraining now reaches into the tens of billions of parameters — but you still cannot cryptographically prove the GPUs did the work they claim. How production networks check untrusted workers, and why ZK-proven training is years out.

2026.05.16 EVAL

Stop OCR-ing your PDFs: retrieve on the page, not the transcript.

Most document RAG pipelines OCR a PDF, rebuild the layout, and chunk the text, losing every table, chart, and column they touch. ColPali-class visual retrieval embeds the page image directly. When that wins, and when it doesn't.

2026.05.16 SECURITY

Auditing an agent that holds a wallet.

Agents now sign transactions. The attack surface — a prompt injection that ends in a signed transfer — is new, and almost no security auditor covers it. What an agent security audit actually checks.

2026.05.16 PAYMENTS

Spend rails for agents shipped. The safety layer didn't.

x402, ERC-8004, and AP2 gave agents the rails to hold and spend money. The controls that stop a prompt-injected agent from draining a wallet — spend ceilings, treasury isolation, circuit breakers — did not ship with them.

2026.05.16 AGENTS

Your APM cannot see your agent failing.

Request traces and dashboards were built for request/response services. The ways an agent fails (a tool returning 200 with garbage, a truncated context, a looping planner) register on none of them. What agent-native observability has to capture.
