# FHE vs TEE for ML: when to use which

A client wants their model to process customer data they cannot legally see. Or a customer wants to query a model whose weights are not theirs to inspect. Either direction, the underlying problem is the same: compute on data without revealing the inputs to the compute environment.

There are two production-ready approaches in 2026: _Fully Homomorphic Encryption_ (FHE), exemplified by [Zama](https://www.zama.org/)'s Concrete ML, and _Trusted Execution Environments_ (TEEs), exemplified by [Phala Network](https://phala.com/), [Marlin (POND)](https://www.marlin.org/), and [Oasis Network](https://oasis.net/)'s ROFL.

They are completely different technologies, with completely different trust models, completely different performance characteristics, and completely different deployment costs. They are routinely conflated by people pitching "privacy-preserving AI." This essay is the decision tree.

## The 30-second summary

**FHE**: do arithmetic on encrypted numbers directly. Your data is encrypted. You hand the encrypted bytes to the computer. The computer produces an encrypted result. You decrypt the result and find it matches what would have been computed on plaintext. The computer never sees the plaintext.

Trust model: mathematics. As long as the encryption scheme is unbroken, the data is private.

Overhead: ~10,000-100,000× slower than plain computation before parallelism or GPU acceleration, depending on the operations; parallel hardware narrows the gap but does not close it. Operations that are nearly free in plaintext (comparisons, sigmoids) are catastrophic in FHE.
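To make "arithmetic on encrypted numbers" concrete, here is a minimal sketch using textbook Paillier encryption. Note the swap: Paillier is only additively homomorphic, not fully homomorphic, and it is not what Concrete ML uses; it is chosen because it fits in a few lines of standard-library Python. The tiny primes and the function names (`encrypt`, `add`, `scale`, `encrypted_linear`) are illustrative assumptions and offer no security.

```python
import math
import secrets

# Toy Paillier key. The primes are tiny so the arithmetic stays readable;
# this is an illustration only and provides no real security.
P, Q = 1789, 1861
N = P * Q
N2 = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # lcm(p - 1, q - 1)
MU = pow(LAM, -1, N)  # valid because the generator is fixed to g = N + 1


def encrypt(m: int) -> int:
    """Encrypt an integer 0 <= m < N using only the public value N."""
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            break
    return (pow(N + 1, m, N2) * pow(r, N, N2)) % N2


def decrypt(c: int) -> int:
    """Decrypt with the private values LAM and MU."""
    return ((pow(c, LAM, N2) - 1) // N) * MU % N


def add(c1: int, c2: int) -> int:
    """Return a ciphertext of m1 + m2 without decrypting either input."""
    return (c1 * c2) % N2


def scale(c: int, k: int) -> int:
    """Return a ciphertext of k * m for a plaintext constant k."""
    return pow(c, k, N2)


def encrypted_linear(weights, enc_inputs, bias):
    """Server side: compute weights . x + bias while x stays encrypted."""
    acc = encrypt(bias)
    for w, cx in zip(weights, enc_inputs):
        acc = add(acc, scale(cx, w))
    return acc


if __name__ == "__main__":
    enc_x = [encrypt(v) for v in (10, 20, 30)]       # client encrypts
    enc_y = encrypted_linear([3, 1, 4], enc_x, 7)    # server never sees x
    print(decrypt(enc_y))                            # client decrypts
```

The server-side function only ever touches ciphertexts and plaintext weights, which is the shape of the FHE trust model. What this toy cannot do is multiply two ciphertexts or evaluate a non-linear activation; supporting those is where the heavy cost of real FHE schemes comes from.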

**TEE**: a special CPU mode in which a region of memory is encrypted and the code running there cannot be inspected by anything else on the same machine, including the operating system. Your data is decrypted _inside_ the TEE for the computation, then re-encrypted on the way out.

Trust model: the chip manufacturer (and a few dozen pieces of supporting infrastructure). Intel SGX, AMD SEV-SNP, NVIDIA Confidential Compute, and AWS Nitro Enclaves are the major TEE platforms.

Overhead: roughly 2-10% on most workloads. Memory-heavy ML inference tends to land closer to 5-8%. Real-world overhead in production is in the noise.

That summary alone tells you 80% of when to use which. FHE if you cannot trust hardware vendors. TEE if you can.

## The honest performance picture

I want to put numbers behind that summary because the conventional wisdom understates the FHE penalty.

Here is a representative benchmark on a real workload: the same 14-layer MLP from our [zkML benchmark](/blog/which-zkml-ships/), configured here for inference on encrypted inputs.

| Configuration                      | Time per inference | Hardware            |
| ---------------------------------- | ------------------ | ------------------- |
| Plaintext on CPU (single core)     | 1.8 ms             | Intel Xeon          |
| Plaintext on GPU (CUDA)            | 0.4 ms             | NVIDIA L4           |
| TEE (Intel SGX, single core)       | 1.95 ms            | Intel Xeon w/ SGX   |
| TEE (NVIDIA Confidential Compute)  | 0.43 ms            | NVIDIA H100 CC mode |
| FHE (Zama Concrete ML, single CPU) | 18,400 ms          | Intel Xeon          |
| FHE (Zama, 32-core CPU)            | 940 ms             | Intel Xeon (32c)    |
| FHE (Zama experimental GPU)        | 110 ms             | NVIDIA L4           |

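The TEE overhead on this workload is ~8% on SGX and ~7.5% on H100 Confidential Compute (the plaintext GPU baseline ran on an L4 rather than an H100, so treat the GPU figure as indicative). FHE, even with parallelization and GPU acceleration, takes 110 ms per inference at best: about 275x slower than plaintext on the same L4 GPU. On 32 cores it is about 520x slower than single-core plaintext, and on single-core CPU FHE the slowdown is roughly 10,000x.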
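The slowdown and overhead figures can be recomputed directly from the table. The short script below does that; the dictionary keys and the helper names `slowdown` and `overhead_pct` are labels invented here for illustration, while the latencies are copied from the table rows.

```python
# Latencies in milliseconds, copied from the benchmark table.
LATENCY_MS = {
    "plain_cpu": 1.8,       # Plaintext on CPU (single core)
    "plain_gpu": 0.4,       # Plaintext on GPU (CUDA), NVIDIA L4
    "tee_sgx": 1.95,        # TEE (Intel SGX, single core)
    "tee_h100_cc": 0.43,    # TEE (NVIDIA Confidential Compute), H100
    "fhe_cpu_1": 18_400,    # FHE (Zama Concrete ML, single CPU)
    "fhe_cpu_32": 940,      # FHE (Zama, 32-core CPU)
    "fhe_gpu": 110,         # FHE (Zama experimental GPU), NVIDIA L4
}


def slowdown(config: str, baseline: str) -> float:
    """How many times slower `config` is than `baseline`."""
    return LATENCY_MS[config] / LATENCY_MS[baseline]


def overhead_pct(config: str, baseline: str) -> float:
    """Extra latency of `config` over `baseline`, as a percentage."""
    return (slowdown(config, baseline) - 1) * 100


if __name__ == "__main__":
    print(f"SGX overhead:        {overhead_pct('tee_sgx', 'plain_cpu'):.1f}%")
    print(f"H100 CC overhead:    {overhead_pct('tee_h100_cc', 'plain_gpu'):.1f}%")
    print(f"FHE GPU slowdown:    {slowdown('fhe_gpu', 'plain_gpu'):.0f}x")
    print(f"FHE 32-core slowdown:{slowdown('fhe_cpu_32', 'plain_cpu'):.0f}x")
    print(f"FHE 1-core slowdown: {slowdown('fhe_cpu_1', 'plain_cpu'):.0f}x")
```

The choice of baseline matters: the H100 row is compared against an L4 plaintext run because that is the only plaintext GPU row in the table, so that percentage mixes two different GPUs.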

For a real-time use case (sub-100ms latency), TEEs are the only viable path on this model in 2026. FHE works for batch or asynchronous workloads. The Zama roadmap projects FHE GPU/ASIC acceleration that closes the gap by another order of magnitude over 2026-2028; if that lands, FHE becomes viable for real-time workloads with relaxed latency budgets. It is not viable for them today.

## The trust models, in detail

The math vs. hardware distinction is the load-bearing part of this whole post. Take it seriously.

### FHE's trust model

Concrete ML is built on the TFHE scheme, which, like the other FHE schemes in practical use, rests on a _lattice problem_: variants of Learning With Errors (LWE) or Ring-LWE. These are believed to be hard problems even for quantum computers (though "believed" is not "proven"). The best known attacks require time exponential in the security parameter, which is sized to give 128-bit security.

If LWE is broken — algorithmically or by an unforeseen advance — FHE-encrypted data becomes recoverable. As of January 2026, no significant break of well-parameterized LWE has occurred. The cryptographic literature treats LWE as among the most-studied and best-understood "post-quantum" hardness assumptions.
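For readers who have not seen LWE, the sketch below encrypts single bits with a toy symmetric LWE scheme and adds two ciphertexts to get the XOR of the underlying bits. It is a teaching sketch under loud assumptions: the dimension, modulus, and noise bound are far too small to be secure, `random` is not a cryptographic RNG, and none of this reflects how Concrete ML parameterizes or implements TFHE.

```python
import random

N_DIM = 16      # secret dimension (toy size; real schemes use hundreds or more)
Q = 2 ** 15     # ciphertext modulus
NOISE = 4       # noise is drawn uniformly from [-NOISE, NOISE]

rng = random.Random()  # NOT cryptographically secure; demo only


def keygen():
    """Sample a secret vector s in Z_q^n."""
    return [rng.randrange(Q) for _ in range(N_DIM)]


def encrypt(secret, bit):
    """Return (a, b) with b = <a, s> + e + bit * q/2 (mod q)."""
    a = [rng.randrange(Q) for _ in range(N_DIM)]
    e = rng.randint(-NOISE, NOISE)
    inner = sum(ai * si for ai, si in zip(a, secret))
    return a, (inner + e + bit * (Q // 2)) % Q


def decrypt(secret, ct):
    """Remove <a, s>, then round: values near q/2 decode to 1, near 0 to 0."""
    a, b = ct
    d = (b - sum(ai * si for ai, si in zip(a, secret))) % Q
    return 1 if Q // 4 <= d < 3 * Q // 4 else 0


def add(ct1, ct2):
    """Add ciphertexts component-wise; the plaintext bits are XORed."""
    (a1, b1), (a2, b2) = ct1, ct2
    return [(x + y) % Q for x, y in zip(a1, a2)], (b1 + b2) % Q


if __name__ == "__main__":
    s = keygen()
    c = add(encrypt(s, 1), encrypt(s, 0))
    print(decrypt(s, c))
```

Without the secret, `b` looks like a random number because of the noise term `e`; recovering `s` from many such pairs is the LWE problem. Each homomorphic addition also adds the noise terms together, which is why real schemes need noise management (bootstrapping) and why that step dominates FHE runtime.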

What FHE does not depend on: any specific company, chip vendor, operating system, or attestation infrastructure. A correctly-implemented FHE library on a correctly-implemented compiler produces ciphertexts that nobody can decrypt without the key. That's it. The threat model is unusually clean.

What it does depend on: implementation correctness. Side-channel attacks on FHE implementations (timing, power) are an active research area. A buggy implementation can leak information that a correct one wouldn't. Zama's Concrete ML library has been audited multiple times; that's not nothing, but it's not the same as the underlying math being clean.

### TEE's trust model

A TEE is a hardware-isolated execution context whose memory is encrypted. The OS, the hypervisor, and other software running on the same machine cannot inspect what's inside the TEE. Attestation, a signed claim by the chip that "this code, with this hash, is currently running in a TEE on this CPU", lets a remote party verify the integrity of what's executing.
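The attestation check has a simple shape, sketched below. This is a deliberately simplified model, not any vendor's quote format: the `Quote` type and the functions are hypothetical, and an HMAC with a shared key stands in for the hardware-rooted signature, which in real platforms is an asymmetric signature verified against the vendor's certificate chain.

```python
import hashlib
import hmac
from dataclasses import dataclass


@dataclass(frozen=True)
class Quote:
    """Hypothetical, simplified attestation quote."""
    measurement: str   # hex SHA-256 of the code loaded into the enclave
    nonce: bytes       # verifier-chosen freshness value echoed by the enclave
    signature: bytes   # stand-in for the chip's signature over the fields


def measure(code: bytes) -> str:
    """Hash of the enclave code; the verifier computes this from a known build."""
    return hashlib.sha256(code).hexdigest()


def _mac(key: bytes, measurement: str, nonce: bytes) -> bytes:
    return hmac.new(key, measurement.encode() + nonce, hashlib.sha256).digest()


def sign_quote(key: bytes, measurement: str, nonce: bytes) -> Quote:
    """What the hardware does in this toy model (HMAC instead of ECDSA)."""
    return Quote(measurement, nonce, _mac(key, measurement, nonce))


def verify_quote(key: bytes, quote: Quote,
                 expected_measurement: str, expected_nonce: bytes) -> bool:
    """Accept only a fresh, correctly signed quote for the expected code."""
    signature_ok = hmac.compare_digest(
        quote.signature, _mac(key, quote.measurement, quote.nonce))
    code_ok = hmac.compare_digest(
        quote.measurement.encode(), expected_measurement.encode())
    fresh = hmac.compare_digest(quote.nonce, expected_nonce)
    return signature_ok and code_ok and fresh
```

All three checks matter: the signature ties the quote to the hardware, the measurement ties it to the code you audited, and the nonce stops an operator from replaying an old quote. Every item in the trust list that follows is a way one of these checks can be undermined.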

What you trust:

- **The chip vendor.** Intel for SGX. AMD for SEV-SNP. NVIDIA for Confidential Compute. ARM for TrustZone. Each has been the subject of vulnerability disclosures over the past several years.
- **The attestation infrastructure.** Intel's IAS (for EPID attestation) and PCS (for DCAP), AMD's KDS, NVIDIA's NRAS. These are web services. They can be MITM'd or compromised; in practice they're not, but the design isn't trust-free.
- **The microcode update pipeline.** Microcode patches for SGX/SEV/CC are pushed through the OS. A compromised OS could withhold a critical patch.
- **The firmware.** A compromised firmware (BIOS/UEFI) can affect TEE integrity.

History gives you a list of TEE breaks: Foreshadow (Intel SGX, 2018), CacheOut (2020), LVI (2020), AEPIC (2022). Most have been patched. Some classes (speculative execution side channels) keep generating new variants. The pattern is "broken, patched, broken again on a different vector."

This is not a fatal flaw. It is a _different_ threat model. TEEs are great as long as you accept that you're relying on Intel/AMD/NVIDIA/etc. to keep finding and patching their bugs. If that's an acceptable risk for your use case — and for most commercial use cases, it is — TEEs are the right tool.

If it's not — if you genuinely cannot trust hardware vendors, perhaps because your adversary _is_ a hardware vendor, or because your regulator says so — you need FHE despite the cost.

## When TEE is right

The use cases where TEE is the right answer dominate, by a wide margin.

**Inference on customer data the model owner can't see.** A SaaS provider runs models against customers' confidential data (medical records, financial transactions, legal documents). The customer wants the model's output but doesn't want to expose their data to the SaaS provider's general infrastructure. TEE on the SaaS provider's side, with remote attestation back to the customer, is the standard architecture for this. It's the default we'd reach for.

**Multi-party model training.** Two banks want to jointly train a fraud-detection model, neither willing to share its raw transactions. A TEE on a neutral cloud (AWS Nitro Enclaves), with both parties' encrypted data feeding in, is one viable architecture. Training is a batch job, so the enclave overhead is acceptable; the trust model is the cloud provider, here AWS and its Nitro hardware.

**Confidential agent execution.** An agent that operates on behalf of a user, with the user's credentials, running on shared infrastructure. The agent's prompt, history, and credentials live in a TEE. The operator of the infrastructure can see the TEE is running but cannot see what's inside. [Phala](https://phala.com/) is shipping this at scale — they reported processing over 1 billion LLM tokens per day on OpenRouter through TEEs as of late 2025. Reasonable overhead (≤7% on most workloads).

**Confidential auctions / matching.** Sealed-bid auctions where bids must remain secret until the auction closes. A TEE on a neutral facilitator, with bid hashes posted on-chain for accountability, gets you the privacy without the FHE cost.
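The "bid hashes posted on-chain" part is a standard hash commitment. A minimal sketch, assuming bids are integer cents and SHA-256 with a random 32-byte salt; the function names are illustrative, and the on-chain posting itself is out of scope here.

```python
import hashlib
import hmac
import secrets
from typing import Tuple


def commit(bid_cents: int) -> Tuple[str, bytes]:
    """Return (commitment, salt). Post the commitment; keep the salt private."""
    salt = secrets.token_bytes(32)  # random salt stops brute-forcing small bids
    digest = hashlib.sha256(salt + bid_cents.to_bytes(8, "big")).hexdigest()
    return digest, salt


def verify_reveal(commitment: str, bid_cents: int, salt: bytes) -> bool:
    """After the auction closes, check a revealed bid against its commitment."""
    digest = hashlib.sha256(salt + bid_cents.to_bytes(8, "big")).hexdigest()
    return hmac.compare_digest(digest, commitment)


if __name__ == "__main__":
    commitment, salt = commit(1_250_00)
    print(verify_reveal(commitment, 1_250_00, salt))
```

The TEE sees the plaintext bids and computes the winner; the commitments let any bidder later prove that the bid the facilitator used is the one they submitted.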

## When FHE is right

The use cases where FHE earns its overhead are narrower, but they exist.

**Cross-jurisdictional data processing under strict regulation.** A US data processor handling EU medical data under GDPR-Schrems-II constraints, where the legal team will not accept "we trust Intel." FHE removes the hardware-trust dependency.

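**Private inference where the model is the asset.** A specialized scoring model the model-owner wants to keep secret, queried by a user whose query the model-owner must not see. With a TEE, the user has to trust the model-owner's enclave and its attestation chain. With FHE, the user encrypts the query under their own key, the model-owner evaluates the model on the ciphertext without ever seeing the plaintext, and only the user can decrypt the result. The model never leaves the owner's servers, and the query is never exposed to them.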

**Long-horizon secrecy.** Data whose privacy matters for decades. TEEs have been broken on a roughly 2-3 year cycle. FHE's underlying math has been stable for over a decade and is expected to remain so under modern quantum-secure parameterizations. If your data needs to be private in 2050, FHE is a better bet than TEE.

**Operations on bit-level data.** Some specialized workloads — secret-shared databases, encrypted search indexes — express naturally in FHE primitives. The overhead is enormous but the architecture wouldn't otherwise exist.

## The hybrid pattern

The most interesting recent development is _hybrid_ deployments: TEE for the hot path, FHE for spot audits.

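How it works: inference runs in a TEE for normal traffic. The TEE produces signed attestations of correct execution. Periodically (say, 1 in 1000 inferences) the same input is also processed by an FHE pipeline, and the outputs are compared. If they disagree, an alert fires. The blended cost is 1 + r/1000 times the TEE-only cost, where r is how many times more an FHE inference costs than a TEE inference. That is roughly 1.1x when r is about 100; with the benchmark latencies above (110 ms on the experimental GPU path against 0.43 ms in the TEE, so r ≈ 256) it is closer to 1.26x, and higher on CPU-only FHE.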

This gets you most of TEE's performance with a cryptographic check on its honesty. It's not a perfect substitute for full-time FHE — a sophisticated adversary could perhaps detect the sample-and-audit pattern and behave well during audits — but for most realistic threat models, the audit is a strong deterrent.
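The blended cost of sample-and-audit is 1 + r/N, where r is the FHE-to-TEE cost ratio and N is the audit interval, so a 1.1x figure corresponds to r of about 100 at N = 1000. The sketch below computes that factor and picks audit samples unpredictably, which is one way to make the audit pattern harder to detect. Treating latency as a proxy for cost is an assumption (it ignores that 32 cores cost more than one GPU), and both function names are made up for this example.

```python
import secrets


def blended_cost_factor(fhe_ms: float, tee_ms: float, audit_every: int) -> float:
    """Cost of TEE plus 1-in-N FHE audits, relative to TEE alone.

    Uses per-inference latency as a stand-in for cost.
    """
    ratio = fhe_ms / tee_ms
    return 1 + ratio / audit_every


def should_audit(audit_every: int) -> bool:
    """Pick roughly 1 in `audit_every` requests using a CSPRNG.

    Random selection avoids a fixed every-Nth pattern that the audited
    party could learn and game.
    """
    return secrets.randbelow(audit_every) == 0


if __name__ == "__main__":
    # Latencies taken from the benchmark table (H100 CC mode vs FHE paths).
    print(f"GPU FHE audits:     {blended_cost_factor(110, 0.43, 1000):.2f}x")
    print(f"32-core FHE audits: {blended_cost_factor(940, 0.43, 1000):.2f}x")
```

If the audit runs on CPU-only FHE, either accept the higher factor or lower the sampling rate; halving the rate halves the extra cost and halves the chance of catching any single bad inference.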

What this hybrid looks like when we build it: an insurance carrier wants a third-party model scoring claims, with the third party seeing neither the claim data nor the model. The model runs in a TEE on the third party's infrastructure; the carrier sends data protected only by TLS in transit, terminated inside the enclave; 0.1% of inferences get spot-audited by Concrete ML running on the carrier's infrastructure as the FHE checker. The target outcome: three months at this design with zero discrepancies between TEE and FHE-checker outputs.
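"Zero discrepancies" needs a definition, because an FHE pipeline runs a quantized version of the model and will rarely match a floating-point TEE result bit for bit. A minimal sketch of the comparison, assuming scalar scores and an absolute tolerance you would calibrate on known-good data; the function name, record shape, and the 0.01 default are all hypothetical.

```python
import math
from typing import Iterable, List, Tuple


def find_discrepancies(
    pairs: Iterable[Tuple[str, float, float]],
    abs_tol: float = 0.01,
) -> List[str]:
    """Return request IDs whose TEE and FHE scores differ by more than abs_tol.

    Each item in `pairs` is (request_id, tee_score, fhe_score).
    """
    return [
        request_id
        for request_id, tee_score, fhe_score in pairs
        if not math.isclose(tee_score, fhe_score, rel_tol=0.0, abs_tol=abs_tol)
    ]


if __name__ == "__main__":
    audited = [
        ("claim-001", 0.812, 0.815),  # within tolerance: quantization noise
        ("claim-002", 0.400, 0.550),  # outside tolerance: raise an alert
    ]
    print(find_discrepancies(audited))
```

Set the tolerance too tight and quantization noise pages someone every day; too loose and a subtly tampered model passes the audit.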

## Practical advice for a team picking one

A short decision tree.

1. **Is your data covered by GDPR-Schrems-II or similar "no US cloud" regulation?** Yes → FHE. No → continue.
2. **Is your worst-case adversary a chip vendor (Intel, AMD, etc.) themselves?** Yes → FHE. No → continue.
3. **Does the inference need to complete in under 200ms?** Yes → TEE. No, or not always → continue.
4. **Is your data privacy required for >10 years?** Yes → consider FHE despite the cost. Otherwise → continue.
5. **Are you willing to do an audit pattern (sample-and-check with FHE)?** Yes → hybrid; run TEE for hot path. No → straight TEE.
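
The five questions above can be written down as a function, which is handy for recording the answer next to the assumptions that produced it. This is a direct transcription of the list, not an extra rule set; the `Workload` fields and the returned labels are names chosen for this sketch.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Answers to the five questions, in order."""
    no_us_cloud_regulation: bool      # Q1: GDPR-Schrems-II or similar applies
    adversary_is_chip_vendor: bool    # Q2: worst-case adversary is the vendor
    latency_budget_ms: float          # Q3: use float("inf") if there is none
    secrecy_years: float              # Q4: how long the data must stay private
    willing_to_spot_audit: bool       # Q5: sample-and-check with FHE


def choose(w: Workload) -> str:
    """Walk the decision tree top to bottom and return the first match."""
    if w.no_us_cloud_regulation:
        return "FHE"
    if w.adversary_is_chip_vendor:
        return "FHE"
    if w.latency_budget_ms < 200:
        return "TEE"
    if w.secrecy_years > 10:
        return "consider FHE"
    return "hybrid" if w.willing_to_spot_audit else "TEE"


if __name__ == "__main__":
    batch_scoring = Workload(False, False, float("inf"), 3, True)
    print(choose(batch_scoring))
```

Order matters: a regulated workload with a tight latency budget still comes out as FHE, which means the latency requirement has to give, not the trust model.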

For 90% of commercial workloads, the answer is TEE. That's not because TEE is universally better — it's because the threat models that genuinely require FHE are rare in current commercial deployments. If your threat model says otherwise, FHE is real; budget for it appropriately.

## What to ignore

Two pieces of conventional wisdom you can safely set aside.

**"FHE will be practical in five years."** Maybe. The Zama roadmap projects 500-1000 TPS by end of 2026 (with GPU acceleration), 100K+ TPS by 2027-2028 (with ASICs). If those land, FHE moves into the same operational range as TEE for many workloads. They might land. They might not. Plan for what's deployable today.

**"TEEs are insecure because of \[recent vulnerability]."** Every TEE platform has CVEs. So does every other piece of infrastructure you use. The relevant question is "given the patch cycle and the threat model, is the residual risk acceptable?" For most use cases, yes. Foreshadow was bad; it was patched. AEPIC was bad; it was patched. The systems still work for the use cases they're built for.

## What we ship

For TEE deployments today, our recommended stack:

- **Compute**: NVIDIA H100 with Confidential Compute mode, on AWS p5 instances.
- **Attestation**: NVIDIA NRAS for the GPU, AWS Nitro Attestation for the surrounding instance, both verified by a small Phala-hosted attestation aggregator.
- **Orchestration**: A custom worker pool over Marlin's TEE-CVM infrastructure for fan-out across regions.
- **Audit**: Periodic FHE audits via Zama Concrete ML on a separate cluster.

For FHE-primary deployments:

- **Compute**: Zama Concrete ML on CPU clusters (Zama's GPU acceleration is still in preview; treat it as evaluation-grade until it ships GA).
- **Storage**: Encrypted at rest with an FHE-compatible scheme; never decrypted on the server side.
- **Key management**: Keys held by the customer; the server never has decryption capability.

Both stacks are production-ready. Both work. Both come with a list of trade-offs worth surfacing before we'd recommend one for a specific workload. Privacy isn't a thing you bolt on. It's an architecture decision that runs all the way through, and the right answer depends entirely on what you're protecting against.