FHE vs TEE for ML: when to use which.
Two ways to compute on data you can't see. One is cryptographically pure and 100,000x slower; the other is fast and depends on a chip vendor not being broken. A decision tree.

A client wants their model to process customer data they cannot legally see. Or a customer wants to query a model whose weights are not theirs to inspect. Either direction, the underlying problem is the same: compute on data without revealing the inputs to the compute environment.
There are two production-ready approaches in 2026: Fully Homomorphic Encryption (FHE), exemplified by Zama’s Concrete ML, and Trusted Execution Environments (TEEs), exemplified by Phala Network, Marlin (POND), and Oasis Network’s ROFL.
They are completely different technologies, with completely different trust models, completely different performance characteristics, and completely different deployment costs. They are routinely conflated by people pitching “privacy-preserving AI.” This essay is the decision tree.
FHE: do arithmetic on encrypted numbers directly. Your data is encrypted. You hand the encrypted bytes to the computer. The computer produces an encrypted result. You decrypt the result and find it matches what would have been computed on plaintext. The computer never sees the plaintext.
Trust model: mathematics. As long as the encryption scheme is unbroken, the data is private.
Overhead: ~10,000-100,000× slower than plain computation, depending on the operations. Operations that are nearly free in plaintext (comparisons, sigmoids) are catastrophic in FHE.
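The encrypt, compute, decrypt flow described above can be sketched with a toy scheme built on Learning With Errors. This is an illustration only: it supports addition and multiplication by a public constant (not full FHE), its parameters are far too small to be secure, and every name in it (`keygen`, `encrypt`, `add`, `scale`) is invented for this sketch rather than taken from Concrete ML.

```python
import secrets

Q = 2**32        # ciphertext modulus
T = 2**8         # plaintext modulus: messages are integers mod 256
DELTA = Q // T   # scaling factor that lifts the message above the noise
N = 64           # secret-key dimension (toy size, NOT secure)
NOISE = 4        # noise is drawn uniformly from [-NOISE, NOISE]

def keygen():
    """Binary secret key of length N."""
    return [secrets.randbelow(2) for _ in range(N)]

def encrypt(key, m):
    """LWE sample: b = <a, key> + e + DELTA * m (mod Q)."""
    a = [secrets.randbelow(Q) for _ in range(N)]
    e = secrets.randbelow(2 * NOISE + 1) - NOISE
    b = (sum(ai * si for ai, si in zip(a, key)) + e + DELTA * (m % T)) % Q
    return a, b

def decrypt(key, ct):
    """Strip the mask, then round away the noise."""
    a, b = ct
    phase = (b - sum(ai * si for ai, si in zip(a, key))) % Q
    return ((phase + DELTA // 2) // DELTA) % T

def add(ct1, ct2):
    """Ciphertext of m1 + m2, computed without the key."""
    (a1, b1), (a2, b2) = ct1, ct2
    return [(x + y) % Q for x, y in zip(a1, a2)], (b1 + b2) % Q

def scale(ct, k):
    """Ciphertext of k * m for a public integer k, computed without the key."""
    a, b = ct
    return [(k * x) % Q for x in a], (k * b) % Q

# The "server" evaluates 3 * x + y on ciphertexts only.
key = keygen()
result = add(scale(encrypt(key, 7), 3), encrypt(key, 20))
print(decrypt(key, result))  # 41
```

Each operation grows the noise term, which is why real schemes need bootstrapping and why non-linear functions such as comparisons are so expensive.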
TEE: a special CPU mode in which a region of memory is encrypted and the code running there cannot be inspected by anything else on the same machine, including the operating system. Your data is decrypted inside the TEE for the computation, then re-encrypted on the way out.
Trust model: the chip manufacturer (and a few dozen pieces of supporting infrastructure). Intel SGX, AMD SEV-SNP, NVIDIA Confidential Compute, AWS Nitro Enclaves are the major TEE platforms.
Overhead: roughly 2-10% on most workloads, with memory-heavy ML inference typically around 5-7%. In production, that is usually in the noise.
That summary alone tells you 80% of when to use which. FHE if you cannot trust hardware vendors. TEE if you can.
I want to put numbers behind that summary because the conventional wisdom understates the FHE penalty.
A representative benchmark on a real workload — the same 14-layer MLP from our zkML benchmark, configured here for inference on encrypted inputs.
| Configuration | Time per inference | Hardware |
|---|---|---|
| Plaintext on CPU (single core) | 1.8 ms | Intel Xeon |
| Plaintext on GPU (CUDA) | 0.4 ms | NVIDIA L4 |
| TEE (Intel SGX, single core) | 1.95 ms | Intel Xeon w/ SGX |
| TEE (NVIDIA Confidential Compute) | 0.43 ms | NVIDIA H100 CC mode |
| FHE (Zama Concrete ML, single CPU) | 18,400 ms | Intel Xeon |
| FHE (Zama, 32-core CPU) | 940 ms | Intel Xeon (32c) |
| FHE (Zama experimental GPU) | 110 ms | NVIDIA L4 |
The TEE overhead on this workload is ~8% on SGX and ~7% on H100 Confidential Compute. FHE latency, even with parallelization and GPU acceleration, runs from about a hundred milliseconds to nearly a second per inference: on the GPU path that is about 275x slower than plaintext on the same GPU. On single-core CPU, FHE is roughly 10,000x slower than plaintext.
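The ratios can be recomputed directly from the table. The snippet below is a minimal sketch: the dictionary keys are labels made up here, and the timings are copied from the rows above.

```python
# Per-inference timings in milliseconds, copied from the benchmark table.
TIMINGS_MS = {
    "plain_cpu": 1.8,
    "plain_gpu": 0.4,
    "tee_sgx": 1.95,
    "tee_h100_cc": 0.43,
    "fhe_cpu_1core": 18_400,
    "fhe_cpu_32core": 940,
    "fhe_gpu": 110,
}

def slowdown(secure: str, baseline: str) -> float:
    """How many times slower the protected run is than the plaintext baseline."""
    return TIMINGS_MS[secure] / TIMINGS_MS[baseline]

def overhead_pct(secure: str, baseline: str) -> float:
    """The same comparison expressed as a percentage overhead."""
    return (slowdown(secure, baseline) - 1.0) * 100.0

print(f"SGX vs CPU:        +{overhead_pct('tee_sgx', 'plain_cpu'):.1f}%")
print(f"H100 CC vs GPU:    +{overhead_pct('tee_h100_cc', 'plain_gpu'):.1f}%")
print(f"FHE GPU vs GPU:    {slowdown('fhe_gpu', 'plain_gpu'):.0f}x")
print(f"FHE 1-core vs CPU: {slowdown('fhe_cpu_1core', 'plain_cpu'):.0f}x")
```

Note that the plaintext GPU row was measured on an L4 and the Confidential Compute row on an H100, so that particular percentage compares two different cards.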
For a real-time use case (sub-100ms latency), TEEs are the only viable path on this model in 2026; even the experimental GPU FHE path, at 110 ms, misses that budget. FHE works for batch or asynchronous workloads. The Zama roadmap projects FHE GPU/ASIC acceleration that closes the gap by another order of magnitude over 2026-2028; if that lands, FHE becomes viable for interactive workloads that can tolerate some latency. It is not viable for real-time use today.
The math vs. hardware distinction is the load-bearing part of this whole post. Take it seriously.
Concrete ML is built on TFHE, which, like every FHE scheme in practical use today, rests on a lattice problem: variants of Learning With Errors (LWE) or Ring-LWE. These are believed to be hard even for quantum computers (though “believed” is not “proven”). The best known attacks take time exponential in the lattice dimension, and parameters are sized to give 128-bit security.
If LWE is broken — algorithmically or by an unforeseen advance — FHE-encrypted data becomes recoverable. As of January 2026, no significant break of well-parameterized LWE has occurred. The cryptographic literature treats LWE as among the most-studied and best-understood “post-quantum” hardness assumptions.
What FHE does not depend on: any specific company, chip vendor, operating system, or attestation infrastructure. A correctly-implemented FHE library on a correctly-implemented compiler produces ciphertexts that nobody can decrypt without the key. That’s it. The threat model is unusually clean.
What it does depend on: implementation correctness. Side-channel attacks on FHE implementations (timing, power) are an active research area. A buggy implementation can leak information that a correct one wouldn’t. Zama’s Concrete ML library has been audited multiple times; that’s not nothing, but it’s not the same as the underlying math being clean.
A TEE is a region of a CPU that runs encrypted code with encrypted memory. The OS, the hypervisor, even other software running on the same machine cannot inspect what’s inside the TEE. Attestation — a signed claim, by the chip, that “this code, with this hash, is currently running in a TEE on this CPU” — lets a remote party verify the integrity of what’s executing.
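The verifier's side of attestation comes down to three checks: the report is genuinely signed, the code hash matches what was expected, and the report is fresh. The sketch below simulates that flow with the standard library only. It makes one loud simplification: an HMAC with a shared key stands in for the chip's signature, whereas real platforms use asymmetric signatures chained to a vendor root certificate, so the verifier never holds a signing key. All names here are hypothetical.

```python
import hashlib
import hmac
import secrets
from dataclasses import dataclass

@dataclass(frozen=True)
class Report:
    measurement: bytes  # hash of the code loaded into the enclave
    nonce: bytes        # freshness value chosen by the verifier
    signature: bytes    # stand-in for the chip's signature over the two fields

def measure(code: bytes) -> bytes:
    return hashlib.sha256(code).digest()

def chip_attest(chip_key: bytes, code: bytes, nonce: bytes) -> Report:
    """Simulated hardware: sign (measurement || nonce). HMAC is a stand-in only."""
    m = measure(code)
    sig = hmac.new(chip_key, m + nonce, hashlib.sha256).digest()
    return Report(m, nonce, sig)

def verify(report: Report, chip_key: bytes, expected: bytes, nonce: bytes) -> bool:
    """Accept only a genuine, fresh report for exactly the expected code."""
    good_sig = hmac.new(chip_key, report.measurement + report.nonce,
                        hashlib.sha256).digest()
    return (hmac.compare_digest(report.signature, good_sig)
            and hmac.compare_digest(report.measurement, expected)
            and hmac.compare_digest(report.nonce, nonce))

chip_key = secrets.token_bytes(32)
audited_code = b"def score(x): return model(x)"
nonce = secrets.token_bytes(16)

report = chip_attest(chip_key, audited_code, nonce)
print(verify(report, chip_key, measure(audited_code), nonce))  # True
```

A report for different code, or a replayed report carrying an old nonce, fails the same check.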
What you trust:
History gives you a list of TEE breaks: Foreshadow (Intel SGX, 2018), CacheOut (2020), LVI (2020), AEPIC (2022). Most have been patched. Some classes (speculative execution side channels) keep generating new variants. The pattern is “broken, patched, broken again on a different vector.”
This is not a fatal flaw. It is a different threat model. TEEs are great as long as you accept that you’re relying on Intel/AMD/NVIDIA/etc. to keep finding and patching their bugs. If that’s an acceptable risk for your use case — and for most commercial use cases, it is — TEEs are the right tool.
If it’s not — if you genuinely cannot trust hardware vendors, perhaps because your adversary is a hardware vendor, or because your regulator says so — you need FHE despite the cost.
The use cases where TEE is the right answer dominate, by a wide margin.
Inference on customer data the model owner can’t see. A SaaS provider runs models against customers’ confidential data (medical records, financial transactions, legal documents). The customer wants the model’s output but doesn’t want to expose their data to the SaaS provider’s general infrastructure. TEE on the SaaS provider’s side, with remote attestation back to the customer, is the standard architecture for this. It’s the default we’d reach for.
Multi-party model training. Two banks want to jointly train a fraud-detection model, neither willing to share its raw transactions. A TEE on a neutral cloud (AWS Nitro Enclaves), with both parties’ encrypted data feeding in, is one viable architecture. Real-time training latency is acceptable; the trust model is the cloud provider plus AWS’s hardware.
Confidential agent execution. An agent that operates on behalf of a user, with the user’s credentials, running on shared infrastructure. The agent’s prompt, history, and credentials live in a TEE. The operator of the infrastructure can see the TEE is running but cannot see what’s inside. Phala is shipping this at scale — they reported processing over 1 billion LLM tokens per day on OpenRouter through TEEs as of late 2025. Reasonable overhead (≤7% on most workloads).
Confidential auctions / matching. Sealed-bid auctions where bids must remain secret until the auction closes. A TEE on a neutral facilitator, with bid hashes posted on-chain for accountability, gets you the privacy without the FHE cost.
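The "bid hashes posted on-chain" piece is a hash commitment: each bidder publishes a digest up front and reveals the bid and a random nonce after the close, so nobody (including the facilitator) can swap a bid later. A minimal sketch follows, with the on-chain posting left out; the function names and the encoding of the bid as an integer number of cents are assumptions made for illustration.

```python
import hashlib
import hmac
import secrets

def _digest(bid_cents: int, nonce: bytes) -> bytes:
    # Fixed-width encoding so distinct bids can never collide after concatenation.
    return hashlib.sha256(nonce + bid_cents.to_bytes(16, "big")).digest()

def commit(bid_cents: int) -> tuple[bytes, bytes]:
    """Return (commitment to publish, nonce to keep secret until reveal)."""
    nonce = secrets.token_bytes(32)  # hides low-entropy bids from brute force
    return _digest(bid_cents, nonce), nonce

def verify_reveal(commitment: bytes, bid_cents: int, nonce: bytes) -> bool:
    """Check a revealed bid against the commitment published earlier."""
    return hmac.compare_digest(commitment, _digest(bid_cents, nonce))

commitment, nonce = commit(125_000)
print(verify_reveal(commitment, 125_000, nonce))  # True
print(verify_reveal(commitment, 130_000, nonce))  # False
```

The commitment provides accountability only; keeping bids secret while the auction runs is still the TEE's job.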
The use cases where FHE earns its overhead are narrower, but they exist.
Cross-jurisdictional data processing under strict regulation. A US data processor handling EU medical data under GDPR-Schrems-II constraints, where the legal team will not accept “we trust Intel.” FHE removes the hardware-trust dependency.
Private inference where the model is the asset. A specialized scoring model the model-owner wants to keep secret, queried by a user whose query they also can’t see. TEEs require the user to trust the model-owner’s TEE. FHE lets the model-owner publish the encrypted model and the user run inference locally on encrypted weights against encrypted inputs.
Long-horizon secrecy. Data whose privacy matters for decades. TEEs have been broken on a roughly 2-3 year cycle. FHE’s underlying math has been stable for over a decade and is expected to remain so under modern quantum-secure parameterizations. If your data needs to be private in 2050, FHE is a better bet than TEE.
Operations on bit-level data. Some specialized workloads — secret-shared databases, encrypted search indexes — express naturally in FHE primitives. The overhead is enormous but the architecture wouldn’t otherwise exist.
The most interesting recent development is hybrid deployments: TEE for the hot path, FHE for spot audits.
How it works: inference runs in a TEE for normal traffic. The TEE produces signed attestations of correct execution. Periodically (say, 1 in 1000 inferences) the same input is also processed by an FHE pipeline, and the outputs are compared. If they disagree, an alert fires. The cost overhead depends on how much more one FHE inference costs than one TEE inference: at a 1-in-1000 audit rate it is roughly 1 + ratio/1000, so about 1.1x if FHE costs 100x as much, and about 1.26x at the GPU figures from the benchmark above (110 ms vs 0.43 ms).
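The sample-and-audit loop can be sketched as a thin wrapper around two inference callables. Everything here is hypothetical scaffolding: `tee_infer` and `fhe_infer` stand for whatever clients the deployment actually uses, the tolerance allows for small numeric differences between the two paths (an assumption, since FHE pipelines typically run quantized models), and a mismatch counter stands in for the alert.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, Sequence

Vector = Sequence[float]

@dataclass
class SpotAuditor:
    tee_infer: Callable[[Vector], float]  # hypothetical fast path, runs on every request
    fhe_infer: Callable[[Vector], float]  # hypothetical slow checker, runs on samples
    audit_rate: float = 0.001             # 1 in 1000, as in the text
    tolerance: float = 1e-2               # assumed allowance for numeric drift
    # OS randomness, so the audited requests are hard to predict.
    rng: random.Random = field(default_factory=random.SystemRandom)
    audits: int = 0
    mismatches: int = 0

    def infer(self, x: Vector) -> float:
        y = self.tee_infer(x)
        if self.rng.random() < self.audit_rate:
            self.audits += 1
            if abs(self.fhe_infer(x) - y) > self.tolerance:
                self.mismatches += 1  # a real system would raise an alert here
        return y

def cost_multiplier(audit_rate: float, fhe_to_tee_cost_ratio: float) -> float:
    """Expected cost relative to TEE-only: every request pays 1, audits pay extra."""
    return 1.0 + audit_rate * fhe_to_tee_cost_ratio

print(round(cost_multiplier(0.001, 100), 2))         # 1.1
print(round(cost_multiplier(0.001, 110 / 0.43), 2))  # 1.26
```

In practice the FHE check would run asynchronously so it never adds latency to the hot path; the synchronous call here just keeps the sketch short.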
This gets you most of TEE’s performance with a cryptographic check on its honesty. It’s not a perfect substitute for full-time FHE — a sophisticated adversary could perhaps detect the sample-and-audit pattern and behave well during audits — but for most realistic threat models, the audit is a strong deterrent.
What this hybrid looks like when we build it: an insurance carrier wants a third-party model scoring claims, with the third party seeing neither the claim data nor the model. The model runs in a TEE on the third party’s infrastructure; the carrier sends data protected only by TLS in transit; 0.1% of inferences get spot-audited by Concrete ML running on the carrier’s infrastructure as the FHE checker. The target outcome: three months at this design with zero discrepancies between TEE and FHE-checker outputs.
A short decision tree.
For 90% of commercial workloads, the answer is TEE. That’s not because TEE is universally better — it’s because the threat models that genuinely require FHE are rare in current commercial deployments. If your threat model says otherwise, FHE is real; budget for it appropriately.
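The criteria laid out in this essay can be condensed into a small function. This is one reading of the text, not a definitive tree: the field names are invented, the 100 ms cutoff comes from the real-time budget discussed earlier, and the 20-year threshold is an assumed stand-in for "decades."

```python
from dataclasses import dataclass
from enum import Enum

class Choice(Enum):
    TEE = "TEE"
    FHE = "FHE"
    HYBRID = "TEE with FHE spot audits"
    NO_FIT_TODAY = "requires FHE, but the latency budget rules it out today"

REALTIME_BUDGET_MS = 100  # below this, the essay treats FHE as not viable
LONG_HORIZON_YEARS = 20   # assumption: a concrete number for "decades"

@dataclass(frozen=True)
class Workload:
    hardware_trust_acceptable: bool  # can you (and your regulator) trust chip vendors?
    latency_budget_ms: float
    secrecy_horizon_years: float = 5
    wants_cryptographic_audit: bool = False

def choose(w: Workload) -> Choice:
    requires_fhe = (not w.hardware_trust_acceptable
                    or w.secrecy_horizon_years >= LONG_HORIZON_YEARS)
    if requires_fhe:
        if w.latency_budget_ms < REALTIME_BUDGET_MS:
            return Choice.NO_FIT_TODAY
        return Choice.FHE
    if w.wants_cryptographic_audit:
        return Choice.HYBRID
    return Choice.TEE

print(choose(Workload(hardware_trust_acceptable=True, latency_budget_ms=50)).value)
# TEE
```

Most commercial workloads fall through to the last line, which matches the 90% figure above.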
Two pieces of conventional wisdom you can safely set aside.
“FHE will be practical in five years.” Maybe. The Zama roadmap projects 500-1000 TPS by end of 2026 (with GPU acceleration), 100K+ TPS by 2027-2028 (with ASICs). If those land, FHE moves into the same operational range as TEE for many workloads. They might land. They might not. Plan for what’s deployable today.
“TEEs are insecure because of [recent vulnerability].” Every TEE platform has CVEs. So does every other piece of infrastructure you use. The relevant question is “given the patch cycle and the threat model, is the residual risk acceptable?” For most use cases, yes. Foreshadow was bad; it was patched. AEPIC was bad; it was patched. The systems still work for the use cases they’re built for.
For TEE deployments today, our recommended stack:
For FHE-primary deployments:
Both stacks are production-ready. Both work. Both come with a list of trade-offs worth surfacing before we’d recommend one for a specific workload. Privacy isn’t a thing you bolt on. It’s an architecture decision that runs all the way through, and the right answer depends entirely on what you’re protecting against.
