# opML or zkML: a decision tree for verifiable inference

A protocol wants to put a model on-chain — a credit scorer governance needs to audit, an LLM agent a treasury acts on, a Stable Diffusion endpoint minting NFTs. The model is too big to run inside a smart contract, so it executes off-chain on a normal GPU, and the contract receives an output it cannot recompute. The whole question is: why should the chain believe that output?

There are two production-ready answers in 2026, and they are not variations on a theme. _zkML_ — zero-knowledge proofs of ML inference — attaches a cryptographic proof that the model was evaluated correctly. _opML_ — optimistic ML — attaches an economic bond and a challenge window, and lets anyone dispute the result by re-running it. One buys finality with mathematics; the other with game theory. They get conflated by people pitching "verifiable AI," and they should not be: different trust models, different cost curves, and — the part that decides most real builds — radically different ceilings on model size. This essay is the decision tree, sibling to our [FHE vs TEE post](/blog/fhe-vs-tee-for-ml/) — same problem family, same decide-by-constraints shape.

## The 30-second summary

**zkML**: the off-chain prover runs the model and emits a succinct cryptographic proof — a SNARK or STARK — that the named model, on the named input, produced the named output. The contract verifies the proof; if it passes, the output is correct.

Trust model: mathematics. As long as the proof system is sound, a wrong output cannot produce a passing proof: nothing to wait for, no one to trust. Cost: proving is orders of magnitude slower than native inference. Expect seconds to minutes per inference for a small model on a GPU, and nothing practical at all for anything large and transformer-shaped.

**opML**: the off-chain executor posts the output along with an economic bond, and a challenge window opens. During that window any watcher can re-execute the model and, if the result disagrees, submit a fraud proof — which slashes the bond and corrects the result. Unchallenged, the output finalizes.

Trust model: game theory. The output is assumed correct because lying is unprofitable — a lie is detectable by anyone willing to re-run the model, and detection costs the liar their bond. This is the optimistic-rollup security model applied to inference. Cost: cheap — the executor runs the model once, at roughly native speed, no proof. The price is not compute, it's _time_: you wait out the window before the output is final.

That summary already tells you most of the answer. **If the model is small and the stakes are high enough that you cannot tolerate a challenge window, zkML. If the model is large — Llama-2-scale, Stable Diffusion — opML, because zkML simply cannot run it.** The rest of this post is what the summary doesn't cover.

## The two trust models, in detail

The cryptographic-vs-optimistic distinction is the load-bearing part of this post — take it as seriously as the math-vs-hardware split in the FHE post.

A **zkML** proof binds a commitment to the model, the public input, and the public output into one unforgeable assertion that _this model_ evaluated _this input_ to _this output_. The payoff is **finality at verification**: the output is correct the instant the transaction lands and stays correct forever — the proof does not get weaker because the attacker got richer. What you trust is the soundness of the proof system and its implementation — the circuit, the SNARK/STARK, the verifier contract. A real surface, which is why audited proving stacks matter — but a _fixed_ one that does not scale with the value at risk.
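
To make the binding concrete, here is a minimal Python sketch of the public statement a proof is checked against. It is an illustration under stated assumptions, not any stack's actual format: SHA-256 stands in for whatever commitment scheme a real proving system uses, and `commit_model` and `public_statement` are hypothetical names.

```python
import hashlib
import json


def commit_model(weights: bytes) -> str:
    """Commit to a model by hashing its serialized weights.

    Assumption: SHA-256 is a stand-in; real zkML stacks use their own schemes.
    """
    return hashlib.sha256(weights).hexdigest()


def public_statement(model_commitment: str, model_input: list, output: list) -> str:
    """Digest of the claim 'this model maps this input to this output'."""
    payload = json.dumps(
        {"model": model_commitment, "input": model_input, "output": output},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode()).hexdigest()


weights = b"\x00\x01\x02\x03"  # placeholder bytes, not real weights
stmt = public_statement(commit_model(weights), [0.2, 0.7], [0.91])

# Changing any one of the three components changes the statement, so a proof
# for one (model, input, output) triple cannot be reused for another.
tampered = public_statement(commit_model(weights), [0.2, 0.7], [0.92])
```

The sketch covers only the statement; the proof that the statement is true is the expensive part, and it is what the proving stack produces.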

**opML** inverts the burden — nobody proves the output up front — and its economic security argument has three legs:

- **Re-execution is cheap.** A challenger re-runs the model once, on a normal GPU. opML's fraud proofs typically bisect the disputed computation down to a single step the chain can adjudicate, so the on-chain part of a dispute stays small even when the model is large. Anyone with a GPU can watch.
- **Lying is bonded.** A successful fraud proof slashes the executor's stake and rewards the honest challenger from it. Dishonesty has a price; honesty pays.
- **One honest watcher is enough.** The output is safe as long as _at least one_ honest party is re-executing and willing to challenge: the same 1-of-N honesty assumption that underpins optimistic rollups.

The payoff is running the model **at native speed** — the executor does no cryptographic work, the entire reason opML can handle models zkML cannot touch. What you trust is not mathematics but economic parameters: an attentive watcher, a window that outlasts chain congestion, a bond that makes lying irrational. Getting those right is the work of an opML deployment.
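
The bisection in the first leg can be sketched in a few lines. This is a simplified model, assuming both parties commit to a per-step state trace and that one step is cheap enough for the chain to re-execute. `find_disputed_step`, `adjudicate`, and `run_trace` are hypothetical names, and plain integers stand in for state hashes.

```python
from typing import Callable, List


def find_disputed_step(executor: List[int], challenger: List[int]) -> int:
    """Binary-search two state traces for a step whose input state both
    parties accept and whose output state they dispute.

    Returns i such that the traces agree at i - 1 and disagree at i.
    """
    if len(executor) != len(challenger) or len(executor) < 2:
        raise ValueError("traces must have equal length of at least 2")
    if executor[0] != challenger[0]:
        raise ValueError("parties must agree on the initial state")
    if executor[-1] == challenger[-1]:
        raise ValueError("final states match; nothing to dispute")
    lo, hi = 0, len(executor) - 1  # invariant: agree at lo, disagree at hi
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if executor[mid] == challenger[mid]:
            lo = mid
        else:
            hi = mid
    return hi


def adjudicate(step: Callable[[int], int], agreed: int,
               executor_claim: int, challenger_claim: int) -> str:
    """Stand-in for the on-chain check: re-run one step from the agreed state."""
    correct = step(agreed)
    if executor_claim == correct:
        return "executor"
    if challenger_claim == correct:
        return "challenger"
    return "neither"


def run_trace(step: Callable[[int], int], start: int, n: int) -> List[int]:
    """State after each of n steps, with the start state at index 0."""
    trace = [start]
    for _ in range(n):
        trace.append(step(trace[-1]))
    return trace


def step(s: int) -> int:
    return (3 * s + 1) % 1000  # toy step function, injective mod 1000


honest = run_trace(step, 7, 16)
# A dishonest executor corrupts step 9 and computes honestly from there.
cheat = honest[:9] + run_trace(step, (honest[9] + 1) % 1000, 7)
disputed = find_disputed_step(cheat, honest)
winner = adjudicate(step, honest[disputed - 1], cheat[disputed], honest[disputed])
```

The search halves the disputed range each round, so a trace of n steps needs about log2(n) rounds before the chain re-executes exactly one step.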

```
zkML                                opML
────────────────────────────       ────────────────────────────
run model + generate proof          run model (native speed)
            │                                   │
            ▼                                   ▼
   submit proof + output            submit output + post bond
            │                                   │
            ▼                                   ▼
   verify proof on-chain             challenge window opens
            │                                   │
            ▼                          ┌────────┴────────┐
   FINAL (cryptographic,          challenged?       unchallenged
   instant, unconditional)             │                 │
                                       ▼                 ▼
                              fraud proof settles    FINAL when
                              dispute, slash bond    window closes
```
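
The right-hand column of the diagram can be modelled as a small state machine. This is a minimal in-memory sketch, not a contract: the `Claim` class and its fields are hypothetical, and the challenger's recomputed output is treated as ground truth, standing in for a settled fraud proof.

```python
from dataclasses import dataclass


@dataclass
class Claim:
    output: int
    bond: int
    submitted_at: int  # seconds
    window: int        # challenge window length, seconds
    status: str = "pending"

    def challenge(self, now: int, recomputed: int) -> int:
        """Dispute the claim. Returns the bond slashed (0 if outputs agree)."""
        if self.status != "pending" or now >= self.submitted_at + self.window:
            raise RuntimeError("challenge window is closed")
        if recomputed == self.output:
            return 0  # nothing to dispute; the claim stays pending
        self.output = recomputed
        self.status = "corrected"
        slashed, self.bond = self.bond, 0
        return slashed

    def finalize(self, now: int) -> bool:
        """Finalize an unchallenged claim once the window has elapsed."""
        if self.status == "pending" and now >= self.submitted_at + self.window:
            self.status = "final"
        return self.status == "final"


ok = Claim(output=42, bond=100, submitted_at=0, window=3600)
early = ok.finalize(now=10)      # window still open
late = ok.finalize(now=3600)     # window elapsed, no challenge

bad = Claim(output=41, bond=100, submitted_at=0, window=3600)
slashed = bad.challenge(now=100, recomputed=42)
```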

## zkML's cost and where the ceiling is

We benchmarked the zkML side in depth in [which zkML ships](/blog/which-zkml-ships/); the finding that matters here is the model-shape constraint. zkML is shippable today, but only for a model that is **fixed, well-characterized, and small.** A few-million-parameter MLP — the shape of a DeFi credit-risk model — proves in seconds to tens of seconds on a single GPU and verifies on an L2 for cents, with a single auditable verifier contract. For that model zkML is a deployable primitive, and [EZKL](https://github.com/zkonduit/ezkl) is the stack we reach for.

The ceiling is the problem. zkML's cost scales with the number of operations in the circuit, and that cost is brutal. Transformer attention layers are roughly an order of magnitude harder to prove than dense layers of the same size. Proving cost grows with parameter count: tens of millions is slow but possible, hundreds of millions pushes proving time from seconds toward hours per inference. A general-purpose zkVM ([RISC Zero](https://www.risczero.com/) is the reference) buys flexibility — write Rust, prove the execution trace — but pays substantial overhead for it.

The frontier did move: there now exists a full zero-knowledge proof of a small-LLM (GPT-2-scale) inference. But "GPT-2-scale, slow and expensive" is the frontier, not the routine case. If your use case needs a Llama-2-scale model, Stable Diffusion, or any transformer doing genuine reasoning, zkML cannot run it in 2026. Not "is slow." Cannot — the sentence that sends large-model builds to opML.

## opML in production: what Ora runs, and what to tune

In exchange for giving up cryptographic finality, opML removes the model-size ceiling almost entirely. [Ora](https://www.ora.io/), formerly Hyper Oracle, is the pioneer and runs opML in production today. The thing worth internalizing: **Ora's opML supports Llama-2-scale large language models and Stable Diffusion.** That is not a projection; it is a class of model flatly outside what zkML can prove, running on-chain-verifiable right now.

The design parameters an opML deployment has to get right:

- **The challenge window.** Long enough that an honest watcher can detect a bad output and land a fraud-proof transaction even under chain congestion. Too short and the security argument breaks; too long and finality drags. It trades directly against latency.
- **The bond.** The executor's stake has to exceed the value an attacker could extract from one false output finalizing — a bond sized for a low-value endpoint is not one sized for an endpoint that moves a treasury.
- **The watcher set.** Security rests on at least one honest, attentive challenger. For a high-value deployment you run your own watcher; assuming "someone will watch" without funding one is an unfinished design.
- **The fraud-proof path.** The dispute mechanism has to be exercised and tested, not just specified — an untested fraud-proof path is a security assumption, not a security mechanism.
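
The four parameters above lend themselves to a pre-deployment check. The following is a sketch under assumptions: `OpmlParams` and `review` are hypothetical names, and the inequalities are the simplest reading of each bullet rather than a tuned security model.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class OpmlParams:
    bond: float                  # executor stake, same unit as value_at_risk
    value_at_risk: float         # most an attacker gains if one false output finalizes
    window_s: int                # challenge window, seconds
    detect_s: int                # time for a watcher to re-run the model and notice
    worst_case_inclusion_s: int  # time to land a transaction under congestion
    funded_watchers: int         # watchers you operate or pay for
    fraud_proof_tested: bool     # dispute path exercised end to end


def review(p: OpmlParams) -> List[str]:
    """Return the design issues found; an empty list means none of these checks failed."""
    issues = []
    if p.bond <= p.value_at_risk:
        issues.append("bond does not exceed value at risk")
    if p.window_s < p.detect_s + p.worst_case_inclusion_s:
        issues.append("window shorter than detection plus worst-case inclusion")
    if p.funded_watchers < 1:
        issues.append("no funded watcher")
    if not p.fraud_proof_tested:
        issues.append("fraud-proof path untested")
    return issues


sound = OpmlParams(bond=2e6, value_at_risk=1e6, window_s=86_400, detect_s=600,
                   worst_case_inclusion_s=7_200, funded_watchers=1,
                   fraud_proof_tested=True)
weak = OpmlParams(bond=1e4, value_at_risk=1e6, window_s=300, detect_s=600,
                  worst_case_inclusion_s=7_200, funded_watchers=0,
                  fraud_proof_tested=False)
```

The numbers in `sound` and `weak` are made up for the example; the point is that each bullet reduces to a condition you can assert before launch.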

opML's honest weakness is the window: between submission and finalization the output is only _provisionally_ trustworthy, backed by the bond rather than by a proof. That is fine when you can wait, and a dealbreaker when an answer must be unconditionally final the instant it lands.

## The decision tree: size, stakes, latency, volume

Four axes decide this, and they line up; the table's last row summarizes the trust model each side buys:

| Axis              | Pushes toward zkML               | Pushes toward opML                      |
| ----------------- | -------------------------------- | --------------------------------------- |
| Model size        | Small, fixed, well-characterized | Large — Llama-2-scale, Stable Diffusion |
| Stakes / value    | High — large TVL exposed         | Modest — bondable risk                  |
| Latency tolerance | Need finality immediately        | Can wait out a challenge window         |
| Inference volume  | Low volume, small model          | High volume                             |
| Trust model       | Cryptographic, unconditional     | Economic, 1-of-N honest watcher         |

_Model size_ overrides the others, so check it first: a large model puts zkML off the table in 2026 no matter what the rest say, and it is frequently the only axis that gets a vote. The other three matter _within_ the small-model regime. _Stakes_ weigh against opML because its safety is economic: as the value at risk climbs, the bond needed to deter an attack climbs with it, until a large-enough bond is impractical to lock, whereas zkML's safety does not move with stakes at all. _Latency_ is opML's weak axis, since the challenge window delays finality. _Volume_ cuts the other way: zkML's proving cost is paid on every inference, so high volume on a small model favors opML.

## The hybrid: opML by default, a ZK proof on demand

The most interesting pattern is not picking one. It is opML as the default path, with a ZK proof on demand for the inferences that warrant it. Every inference runs through opML — fast, cheap, native-speed, bonded — and most outputs ride the optimistic path to finality. But certain inferences are flagged high-stakes — a parameter update governance must ratify, an output that moves a large position — and for those the protocol _demands_ a ZK proof before settlement. That single inference pays the zkML cost; the thousands around it do not. This is not hypothetical: Ora ships exactly this — optimistic by default, with the option to demand a ZK proof for high-stakes inference. Size and stakes are properties of each _request_, not of the whole deployment, so the hybrid routes per request instead of per system.

One constraint keeps it honest: the on-demand ZK path is still bounded by zkML's ceiling. If the high-stakes path runs a small, well-characterized sub-model — a final risk score, a compact decision head — the hybrid is clean. If it needs a proof of the _full_ Llama-2-scale model, the hybrid does not save you, because that proof cannot be produced — so design the high-stakes path around something provable. Done well, the hybrid gets you opML's economics on the bulk of traffic and zkML's finality where the stakes justify it: for a protocol with a wide spread of inference values, the design we'd reach for.
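
Per-request routing, including the ceiling constraint, might look like the sketch below. Everything here is hypothetical: the `Request` type, the `route` function, and both thresholds are illustrative assumptions, not Ora's API or measured limits.

```python
from dataclasses import dataclass

# Assumption: illustrative thresholds, not measured limits.
MAX_PROVABLE_PARAMS = 10_000_000
HIGH_STAKES_VALUE = 1_000_000


@dataclass
class Request:
    value_at_risk: float
    settlement_model_params: int  # size of the model whose output settles on-chain


def route(req: Request) -> str:
    """Pick the verification path for one inference request."""
    if req.value_at_risk < HIGH_STAKES_VALUE:
        return "opml"  # default optimistic path
    if req.settlement_model_params > MAX_PROVABLE_PARAMS:
        # The on-demand ZK path is bounded by zkML's ceiling.
        raise ValueError("high-stakes path needs a provable sub-model")
    return "zk_proof_required"


routine = route(Request(value_at_risk=500, settlement_model_params=7_000_000_000))
ratified = route(Request(value_at_risk=5_000_000, settlement_model_params=2_000_000))
```

Raising on an unprovable high-stakes request is a deliberate choice in this sketch: it surfaces the design gap at integration time instead of silently falling back to the optimistic path.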

## Practical advice for a team picking one

Walk this top to bottom; stop at the first match.

1. **Is your model large — Llama-2-scale, Stable Diffusion, or a sizable reasoning transformer?** Yes → opML. zkML cannot prove it in 2026; this axis overrides everything below. No → continue.
2. **Does the output need to be final immediately — acted on within the same block or the next few, with no challenge window?** Yes → zkML. No → continue.
3. **Does a single wrong output control more value than you can practically cover with a posted, locked bond?** Yes → zkML, for the high-stakes paths at minimum. No, or only sometimes → continue.
4. **Do your inference values vary widely — most low-stakes, a few high-stakes?** Yes → hybrid: opML by default, demand a ZK proof on the high-stakes paths (and make sure those paths run a provable sub-model). No → continue.
5. **Is your model small and fixed, your volume low, and cryptographic auditability a hard requirement?** Yes → zkML. Otherwise → opML.
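
The five questions translate directly into a first-match function. This is a sketch with a hypothetical name and boolean inputs; answering the questions honestly is the hard part, and the code only encodes the ordering.

```python
def choose(*, large_model: bool, needs_instant_finality: bool,
           value_exceeds_bond: bool, values_vary_widely: bool,
           small_fixed_low_volume_auditable: bool) -> str:
    """Walk the checklist top to bottom and stop at the first match."""
    if large_model:                       # step 1: overrides everything below
        return "opML"
    if needs_instant_finality:            # step 2
        return "zkML"
    if value_exceeds_bond:                # step 3
        return "zkML"
    if values_vary_widely:                # step 4
        return "hybrid"
    if small_fixed_low_volume_auditable:  # step 5
        return "zkML"
    return "opML"                         # default


llm_agent = choose(large_model=True, needs_instant_finality=True,
                   value_exceeds_bond=True, values_vary_widely=True,
                   small_fixed_low_volume_auditable=False)
mixed_protocol = choose(large_model=False, needs_instant_finality=False,
                        value_exceeds_bond=False, values_vary_widely=True,
                        small_fixed_low_volume_auditable=False)
```

Note that `llm_agent` resolves to opML even though every later answer points at zkML, which is the model-size override in action.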

When in genuine doubt, default to opML and reserve zkML for the paths whose stakes you can name precisely — the same instinct as defaulting to a TEE in the [FHE vs TEE](/blog/fhe-vs-tee-for-ml/) decision: the cryptographically pure option is real, but expensive, and most workloads do not need it everywhere. Verifiable inference is not a feature you bolt on — it is a trust-model decision that runs through the whole architecture, and the right answer depends entirely on how big the model is, how much the output is worth, and how long you can wait.

## Reading list

- [Ora](https://www.ora.io/) — the opML pioneer (formerly Hyper Oracle); their stack runs Llama-2-scale models and Stable Diffusion in production today.
- [EZKL](https://github.com/zkonduit/ezkl) — the most mature zkML proving toolchain for neural-network-style models; our default for small, fixed models.
- [RISC Zero](https://www.risczero.com/) — a general-purpose zkVM; prove the execution trace of a program rather than a hand-built circuit. Flexible, with real overhead.

Pick by the model's size first, the output's value second, your patience for a challenge window third. The rest is parameter tuning.