The verifiable-inference trust/cost frontier

There are exactly four ways to prove an AI model ran honestly. Everything else is one of them in a costume. Here is the one chart that tells you which costume, and whether the word trustless on the landing page is a lie.

I build decentralized inference at B3, which means I spend a lot of time reading pitches for systems that promise trustless inference and deliver something with a very specific, very nameable trust assumption baked into it. The tell is always the same: a scheme that is merely trusted or merely optimistic gets sold as if it were sound. So this is the map. Two axes, four dots, one frontier you can hold in your head.

[Interactive chart: the trust/cost plane · four schemes, one frontier. Control: model-size slider, shown at LLM.]

The x-axis is the strength of the trust assumption you have to swallow. The y-axis is what you pay for it, as a multiple of just running the model once, on a log scale. Drag the model-size slider and watch which dot moves. Only one does.

Four schemes, one plane

Define the overhead multiplier $\rho$ as the cost of the verified inference over the cost of the native one:

$$\rho = \frac{\text{cost}_\text{verified}}{\text{cost}_\text{native}}$$

That is the y-axis. $\rho = 1$ means verification was free; $\rho = 1000$ means you paid a thousand forward passes to prove one. Re-execution and opML land at $\rho \approx 1$, TEE at $\rho \approx 1.05$, and zkML at $\rho \approx 10^2$–$10^3$, and that spread is the entire economics of the field.
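A minimal sketch of that definition, in case it helps to see it as arithmetic. The function names and the sample costs are mine and purely illustrative; they are not taken from any benchmark.

```typescript
// Overhead multiplier: cost of the verified run over the cost of the native run.
// Units cancel, so any consistent cost unit (GPU-seconds, dollars) works.
function overhead(costVerified: number, costNative: number): number {
  if (costNative <= 0) throw new Error("native cost must be positive");
  return costVerified / costNative;
}

// Vertical position on a log-scale axis: one unit per order of magnitude.
function logAxisPosition(rho: number): number {
  return Math.log10(rho);
}

// Hypothetical numbers: a 2 GPU-second forward pass that costs 2000 GPU-seconds to prove.
const rhoExample = overhead(2000, 2);         // a thousand native runs per proof
const yExample = logAxisPosition(rhoExample); // three decades above the floor
```

On a log axis the gap between $\rho = 1$ and $\rho = 1.05$ is nearly invisible, while the gap to $\rho = 1000$ spans three full decades.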

The x-axis is a trust-strength ordinal $\tau$: not a real number, just an ordering.

$$\tau(\text{sound}) > \tau(\text{trusted}) > \tau(\text{optimistic}) > \tau(\text{re-exec})$$

I want to be honest about that axis before we go further, because it is where most charts like this cheat: the spacing between the ticks means nothing. sound is not "twice as trustworthy" as optimistic. They are different categories of assumption, and the whole point is that you cannot convert between them by paying a little more. You jump categories or you do not.
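One way to keep yourself honest about an ordinal axis in code is to expose comparison and nothing else. This is a sketch under my own naming (`ORDER`, `strongerThan`, `tick` are illustrative), with re-execution labeled "unsound":

```typescript
type TrustCat = "unsound" | "optimistic" | "trusted" | "sound";

// Position in the ordering only. The indices are labels for comparison,
// not magnitudes: subtracting or dividing them is meaningless.
const ORDER: TrustCat[] = ["unsound", "optimistic", "trusted", "sound"];

// The only question the axis can answer: is a a stronger category than b?
function strongerThan(a: TrustCat, b: TrustCat): boolean {
  return ORDER.indexOf(a) > ORDER.indexOf(b);
}

// Chart x-position: evenly spaced ticks chosen for layout, carrying no metric.
function tick(cat: TrustCat): number {
  return ORDER.indexOf(cat);
}
```

`strongerThan("sound", "optimistic")` is a meaningful question. `tick("sound") / tick("optimistic")` compiles, but the number it returns says nothing about trust.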

Here are the four schemes as data, which is also exactly the shape the chart reads from:

type TrustCat = "unsound" | "optimistic" | "trusted" | "sound";
 
interface Scheme {
  name: string;
  cat: TrustCat;
  rho: number;        // cost_verified / cost_native
  belief: string;     // what you must hold true for it to mean anything
}
 
const SCHEMES: Scheme[] = [
  { name: "Re-execution", cat: "unsound",    rho: 1,
    belief: "the verifier's GPU agrees with the prover's, bit-for-bit" },
  { name: "opML",         cat: "optimistic", rho: 1.05,
    belief: "≥1 honest watcher disputes within the challenge window T_c" },
  { name: "TEE",          cat: "trusted",    rho: 1.05,
    belief: "the chip + attestation service are honest, no side-channel break" },
  { name: "zkML",         cat: "sound",      rho: 1000,
    belief: "the math holds — no trusted party at all" },
];

Paste a fifth project in and you will see where it lands. That is the whole exercise. Now the four corners, one at a time.
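Here is roughly what "paste a fifth project in" could look like. The fifth entry (`ExampleNet`, its category and its $\rho$) is hypothetical, and `placeOnPlane` is my own helper, not the chart's actual code.

```typescript
type TrustCat = "unsound" | "optimistic" | "trusted" | "sound";

interface Scheme {
  name: string;
  cat: TrustCat;
  rho: number;    // cost_verified / cost_native
  belief: string; // what you must hold true for it to mean anything
}

const ORDER: TrustCat[] = ["unsound", "optimistic", "trusted", "sound"];

const BASE: Scheme[] = [
  { name: "Re-execution", cat: "unsound", rho: 1, belief: "both GPUs agree bit-for-bit" },
  { name: "opML", cat: "optimistic", rho: 1.05, belief: "one honest watcher disputes in time" },
  { name: "TEE", cat: "trusted", rho: 1.05, belief: "chip and attestation service are honest" },
  { name: "zkML", cat: "sound", rho: 1000, belief: "the math holds, no trusted party at all" },
];

// Hypothetical fifth project, for illustration only.
const fifth: Scheme = {
  name: "ExampleNet", cat: "trusted", rho: 1.2,
  belief: "an attested enclave and its vendor are honest",
};

// Order left to right by trust category, then bottom to top by cost.
function placeOnPlane(schemes: Scheme[]): string[] {
  return schemes
    .slice()
    .sort((a, b) => ORDER.indexOf(a.cat) - ORDER.indexOf(b.cat) || a.rho - b.rho)
    .map((s) => `${s.name}: ${s.cat}, rho=${s.rho}`);
}

const placed = placeOnPlane(BASE.concat([fifth]));
```

The hypothetical entry lands in the trusted column, just above TEE and a long way below zkML.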

Re-execution: free, and a lie

The obvious idea: have a second node run the same inference and check it matches. Cost is $\rho \approx 1$: you literally just ran it again. Bottom-left of the plane, cheapest possible.

Except it does not work, and the reason is the subject of the determinism trap: two GPUs running the same model on the same input do not agree bit-for-bit. Floating-point non-associativity, non-deterministic reduction orders, different kernel autotuning, different hardware: the outputs diverge in the low bits, and your equality check either rejects honest work or, once you loosen the tolerance, waves through dishonest work. The verifier step is broken at the floating-point level.
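You can see the root of the problem without a GPU. The sketch below uses ordinary 64-bit doubles on a CPU, not the float16/bfloat16 kernels a real model runs, and the tolerance `eps` is a value I picked for illustration only.

```typescript
// Floating-point addition is not associative: regrouping the same three
// numbers changes the low bits of the result.
const leftToRight = (0.1 + 0.2) + 0.3;
const rightToLeft = 0.1 + (0.2 + 0.3);
const bitExact = leftToRight === rightToLeft; // false for IEEE-754 doubles

// A tolerance check accepts both orderings...
function withinTolerance(a: number, b: number, eps: number): boolean {
  return Math.abs(a - b) <= eps;
}
const eps = 1e-6; // hypothetical tolerance
const honestPasses = withinTolerance(leftToRight, rightToLeft, eps);

// ...but it equally accepts a deliberately nudged value inside the same tolerance.
const tampered = rightToLeft + 5e-7;
const tamperedPasses = withinTolerance(tampered, leftToRight, eps);
```

Strict equality rejects the honest reordering; the loosened check admits the tampered value. A GPU reduction over thousands of terms has many more orderings to choose from than this three-term sum.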

So this corner is drawn with a dashed outline and a warning mark. It is on the plane geographically and off the board practically. Keep it as the reference point: this is what "free verification" would look like if it existed.

opML: optimistic, 1-of-N honest

opML, the optimistic-ML approach from the opML paper, does the honest thing the cheap way. The node runs the model once natively and posts a commitment to the result. Nobody re-runs it by default. Instead there is a challenge window: anyone can dispute the output, and if they do, an on-chain bisection game pins the disagreement down to a single computation step.

The cost story is the elegant part. The honest path is $\rho \approx 1$: native inference plus a cheap commitment. The dispute, when it happens, does not cost $O(n)$ to re-run the whole computation on-chain; the interactive bisection is $O(\log n)$ steps, and since fraud is supposed to be rare it amortizes to roughly nothing. The real price is not compute, it is time: finality is gated on the challenge window $T_\text{c}$.

$$\text{finality}_\text{opML} = T_\text{c}, \qquad \rho_\text{opML} \approx 1$$

And the trust assumption is the honest-watcher one: you are safe as long as at least one honest party is watching and willing to dispute within $T_\text{c}$. That is the optimistic category. It is a real, defensible guarantee. It is just not the same kind of guarantee as a proof.
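The bisection idea can be sketched as a binary search over two execution traces. This is a toy model, not the opML protocol itself: I assume each party can produce a commitment to the state after every step, and I represent those commitments as plain strings. The names `bisect`, `claimedTrace` and `honestTrace` are illustrative.

```typescript
// Each trace holds one state commitment per step; index 0 is the agreed input.
// Precondition: the traces agree at index 0 and disagree at the last index.
function bisect(claimed: string[], honest: string[]): { step: number; rounds: number } {
  if (claimed.length !== honest.length || claimed.length < 2) {
    throw new Error("traces must have equal length of at least 2");
  }
  let lo = 0;                  // last index known to agree
  let hi = claimed.length - 1; // first index known to disagree
  if (claimed[lo] !== honest[lo] || claimed[hi] === honest[hi]) {
    throw new Error("nothing to dispute");
  }
  let rounds = 0;
  while (hi - lo > 1) {
    const mid = Math.floor((lo + hi) / 2);
    if (claimed[mid] === honest[mid]) lo = mid; else hi = mid;
    rounds++;
  }
  // Only the single transition lo -> hi has to be re-executed by the arbiter.
  return { step: hi, rounds };
}

// A 1024-step computation where the claimed trace goes wrong at step 700.
const honestTrace: string[] = [];
const claimedTrace: string[] = [];
for (let i = 0; i <= 1024; i++) {
  honestTrace.push("h" + i);
  claimedTrace.push(i < 700 ? "h" + i : "x" + i);
}
const verdict = bisect(claimedTrace, honestTrace);
```

For 1024 steps the search takes 10 rounds and lands on step 700, which is the $O(\log n)$ behavior described above: the chain arbitrates one step, not the whole forward pass.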

TEE: trusted, under 7%, hardware root

A trusted execution environment runs the model inside an attested enclave on the GPU and signs the output with a key that never leaves the silicon. You verify a hardware attestation instead of re-running anything.

The cost is the surprising part: it is nearly free. The Hopper confidential-computing benchmark measures under 7% average overhead, and it falls toward zero for large models and long sequences, because the cost is VRAM encryption and host-to-device transfer, not compute. Run a big LLM for a long generation and the encryption overhead is rounding error against the matmuls.

$$\rho_\text{TEE} \approx 1.0\text{–}1.07 \xrightarrow{\;\text{large model}\;} 1$$

So on the chart TEE sits almost on the floor, just right of opML. The catch is the x-axis, not the y-axis. What you must believe: the chip is honest, the vendor's attestation service is honest, and nobody has a working side-channel against the enclave. That is a trusted assumption rooted in hardware. It is cheap and it is strong in practice, but say it loudly: trusted is not sound. You have moved the trust from "re-run it yourself" to "trust the silicon vendor," which is a much better place to put it, but it is still a party you are trusting.
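A toy model of why the overhead shrinks with workload size. It is my simplification, not the benchmark's methodology: it assumes the enclave adds a roughly fixed amount of encryption and transfer time while compute time grows with the model and sequence length. All numbers are hypothetical.

```typescript
// Toy model, not a measurement: rho = (compute + extra encrypted I/O) / compute.
function teeOverhead(computeSeconds: number, encryptedIoSeconds: number): number {
  if (computeSeconds <= 0) throw new Error("compute time must be positive");
  return (computeSeconds + encryptedIoSeconds) / computeSeconds;
}

// Same hypothetical 0.07 s of encryption and transfer cost in both cases.
const shortRun = teeOverhead(1, 0.07);  // small job: the tax is about 7% of the total
const longRun = teeOverhead(100, 0.07); // long generation: the tax is under 0.1%
```

Under that assumption the overhead term is divided by an ever larger compute time, which is the shape the benchmark reports.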

zkML: sound, and expensive

zkML is the only one of the four that needs no trusted party at all. You prove the forward pass inside a zero-knowledge circuit and the verifier checks a succinct proof. If the proof verifies, the computation happened, period. The guarantee rests on cryptographic hardness, not on a chip or a watcher. That is the sound category, top of the ordering.

You pay for it. To run a neural network inside a circuit you arithmetize it over a finite field, and the float-heavy operations blow up by a circuit-expansion constant $k$ that is empirically $10^3$–$10^4$. Proving cost scales as

$$\text{cost}_\text{zkML} \sim k \cdot N \log N$$

for a model with $N$ operations, which lands you at $\rho \approx 10^2$–$10^3$ for LLM-scale work today. Top-right of the plane, by a mile, on a log axis.

This is also the only dot that moves with the workload. Drag the model-size slider in the chart: a small MLP proves at maybe $30\times$, a CNN higher, GPT-2 higher still, an LLM off the top. TEE barely budges across the same range, because its overhead is a fixed encryption tax, not a function of circuit size. That contrast (one dot that climbs with model size and one that does not) is the clearest single thing the chart shows.
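To make "arithmetize it over a finite field" a little more concrete, here is a toy fixed-point dot product computed with nothing but modular additions and multiplications. Everything about it is an illustrative assumption: the tiny prime 65537, the scale of 16, and the helper names. Real provers use far larger fields and far more careful quantization, and this sketch proves nothing; it only shows the encoding step.

```typescript
const P = 65537;  // small prime modulus, for illustration only
const SCALE = 16; // fixed-point scale: a real x is stored as round(x * SCALE)

// Reduce into the range [0, P).
function mod(v: number): number {
  return ((v % P) + P) % P;
}

// Map a real number to a field element.
function encode(x: number): number {
  return mod(Math.round(x * SCALE));
}

// Residues above P/2 are read back as negative numbers.
function signed(v: number): number {
  return v > P / 2 ? v - P : v;
}

// Dot product using only field additions and multiplications.
function fieldDot(a: number[], b: number[]): number {
  let acc = 0;
  for (let i = 0; i < a.length; i++) {
    acc = mod(acc + mod(encode(a[i]) * encode(b[i])));
  }
  return acc;
}

// Each product carries SCALE twice, so decoding divides by SCALE * SCALE.
function decodeProduct(v: number): number {
  return signed(v) / (SCALE * SCALE);
}

// 0.5 * 2 + (-1.25) * 4 = -4
const out = decodeProduct(fieldDot([0.5, -1.25], [2, 4]));
```

A single dot product is the easy case. The rescaling after every layer and the nonlinearities are what a field does not give you for free.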

Sound vs trusted vs optimistic

Here is the taxonomy that matters more than the brand names, because it is the thing the marketing collapses:

Sound (zkML): the math holds. No trusted party. Verification is a cryptographic check.
Trusted (TEE): you trust hardware and an attestation service. Strong, cheap, not sound.
Optimistic (opML): you trust that an honest watcher shows up inside the window. Safe in the limit, delayed by construction.
Unsound (re-execution): the verifier step does not actually bind anything.

When a project says trustless and means trusted, that is not a small slip. It is a category error that hides the actual thing you are betting on. A TEE deployment that gets its attestation service compromised fails silently and still looks sound: the signatures still verify. An opML system with no honest watchers in the window finalizes fraud. A zkML system, if the math is right, simply cannot. These are different failure modes, and flattening them into one word, trustless, is the single most common tell that you are reading marketing, not engineering.

There is a fourth thread under all of this that the chart does not draw but that makes it matter: Sybil-binding. None of these guarantees means anything if one operator can spin up a thousand identities and dispute, attest, or vote as a fake majority. opML's honest-watcher assumption, TEE's attestation registry, and any on-chain dispute game all assume identities are bound to something costly: stake, hardware, a real attestation. The trust category tells you what you verify; Sybil-binding tells you whether the verifier is who they say they are. You need both.

Reading the frontier

Look at the chart again and notice what is not there: a dot in the bottom-right. Cheap and sound. The thing every landing page implies it has.

That corner is empty because the frontier is real. You cannot have it for free today. You buy down the trust assumption by paying more (walk up and right toward zkML), or you buy down the cost by weakening the trust (walk down and left toward opML and TEE). There is no dominant point. The decision rule is therefore not "pick the best scheme" (there is no best), it is:

Pick your threat model first, then pick your point on the frontier. You do not get to pick both axes.

If you cannot tolerate trusting a hardware vendor and a dispute window is too slow for your use case, you are on the zkML dot and you are paying $10^2$–$10^3\times$; own it. If your workload economics cannot survive that multiplier and you can live with trusting attested silicon, you are on the TEE dot at $\rho \approx 1.05$; own that too, and stop calling it trustless. The honest move is naming your point.
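The decision rule can be sketched as a filter rather than a ranking, since there is no best point. The `ThreatModel` fields and `feasiblePoints` are names I made up for this sketch, and the $\rho$ thresholds are the order-of-magnitude figures from the scheme table above, not measured constants.

```typescript
interface ThreatModel {
  canTrustHardwareVendor: boolean;    // acceptable to trust silicon + attestation?
  canWaitForChallengeWindow: boolean; // acceptable to delay finality by T_c?
  maxOverhead: number;                // largest rho the workload economics absorb
}

// Returns every point on the frontier that the threat model permits.
// Re-execution is left out: it does not bind anything.
function feasiblePoints(t: ThreatModel): string[] {
  const out: string[] = [];
  if (t.canWaitForChallengeWindow && t.maxOverhead >= 1.05) out.push("opML");
  if (t.canTrustHardwareVendor && t.maxOverhead >= 1.05) out.push("TEE");
  if (t.maxOverhead >= 1000) out.push("zkML");
  return out;
}

// No vendor trust, no waiting, and a budget of 5x native: the empty corner.
const emptyCorner = feasiblePoints({
  canTrustHardwareVendor: false,
  canWaitForChallengeWindow: false,
  maxOverhead: 5,
});

// Same constraints, but willing to pay the zkML multiplier.
const payForSoundness = feasiblePoints({
  canTrustHardwareVendor: false,
  canWaitForChallengeWindow: false,
  maxOverhead: 1000,
});
```

An empty result is the honest answer for the first case: that combination of constraints is asking for the bottom-right corner.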

Drop your own project on the plane and see which column it actually lands in:

[Interactive form. Fields: project name · overhead × vs native (1–1000) · trust category. Sample verdict: "Trusted, not sound. You're trusting silicon + an attestation service."]

Most "trustless AI" projects, when you fill this in honestly, land in the trusted or optimistic column. That is fine. That is often the right engineering call. It is only a problem when the marketing claims the empty corner.

Where B3 sits

At B3 we build decentralized inference, and the entire design space of "how do you pay a node for an output you can trust" lives inside this chart. We sit where the workload economics let us sit. Today, for LLM-scale serving, that means the trust assumption has to be cheap enough that the node is still profitable, which pulls you toward the attested-hardware and optimistic end of the frontier, not the zkML corner. I will give the exact placement and the unit economics behind it in the economics of decentralized inference. The point of this post is that the placement is a choice on a frontier, not a marketing adjective.

Honesty and caveats

The chart is a snapshot, mid-2026, and I want to flag exactly where it is soft:

The x-axis is an ordering, not a metric. The distance between optimistic and sound on screen is meaningless; they are different categories of assumption, not points on a continuum. Do not read "4× more trustworthy" off the chart.
The $\rho$ numbers are order-of-magnitude and workload-dependent. Batch size, model, and sequence length all move them. The TEE figure is the Hopper paper's under-7%; the zkML figure is derived in the zkML cost curve; opML and re-execution are $\approx$ native by construction rather than from a single benchmark, so treat their y-position as "the floor, plus a window" rather than a measured constant.
The zkML dot is sliding down over time, toward the empty corner: its cost is falling while its trust category stays put. Lagrange's DeepProve and GKR-style provers keep cutting the constant, and DeepProve has now proven a full GPT-2 forward pass, so LLM-scale zkML is no longer strictly impossible, just expensive. The chart will look different in a year. The shape of the frontier (the empty bottom-right) is the part I expect to survive.

So: pick your threat model, then pick your point. If your project claims the empty bottom-right corner (cheap and sound, no trusted party, today, at LLM scale) you are either about to win a Turing Award or you are marketing. Tell me which dot you are, in plain words, and I will believe you. Tell me you are trustless and I will ask which kind.

Compiled from primary sources, June 2026: the Hopper TEE benchmark, the opML paper, Lagrange DeepProve, and the EigenAI determinism analysis, plus the five sibling posts in the Verifiable Inference series. The $\rho$ figures are order-of-magnitude and workload-dependent; the x-axis is an ordering, not a metric, and I have said so where it matters.