An agent that hallucinates a transaction is an incident
There's a clean line between a bug and an incident, and it comes down to one thing: what the model touched.
A support chatbot that confidently invents a refund policy is a bug. Annoying, fixable with a better prompt. Wire the same model to a wallet, let it send 4 ETH to an address it hallucinated from a fuzzy memory of a docs page, and you have an incident, at the speed of a block. Nobody approved it. There's no chargeback. The mempool doesn't care that you meant 0.4.
I've spent a while building onchain bots at B3, and this is the thing I wish someone had said to me on day one: the moment an agent can move state that matters, the requirements invert. Everyone obsesses over the model: which prompt, which provider. But the model is the part that's allowed to be wrong sometimes. The expensive engineering is in the layer that catches it when it is.
Wrong is the default, not the edge case
Here's the mindset shift. With a normal app you write code that does the right thing and handle errors. With an agent you have a component in the loop that is probabilistically going to do the wrong thing, and you can't patch the specific bug because there isn't one. It's a distribution. Run it enough times unattended and the long tail shows up: a malformed address, a decimal point off by one place, a token symbol that resolves to a scam contract with the same ticker.
In a chat window a human reads the weird answer and shrugs. An automation running every five minutes has no human reading anything. So you stop trying to make the model never wrong (you can't) and build a cage around the wrong answers so they can't reach anything irreversible.
What "everything around it" actually is
Once you accept that, the wishlist for an onchain agent writes itself, and almost none of it is about the model:
- Deterministic execution you can replay. Same input, same run, every time. If you can't replay it, you can't debug it, and "the agent did something weird at 3am" is not a bug report you can act on.
- Guardrails the model can't talk its way past. Hard clamps on trade size, slippage, and destination address, enforced in code outside the model's reach. Not "please don't spend more than X" in the system prompt. A check that rejects the action regardless of how persuasively the model argues for it.
- Simulate before send. Dry-run the transaction against current state and look at the result before anything is signed. Most disasters are visible in the simulation. You just have to actually look.
- Pre-audited building blocks. Use contract interactions that someone has already reviewed, instead of letting the agent vibe-code a fresh call into a swap router at runtime. The model picks from known-good pieces. It doesn't author new ways to lose money on the fly.
- Automatic retry and reroute. Chains are flaky. RPCs time out, a tx gets stuck, gas spikes. A half-finished automation is worse than one that never started, so failure has to be a state the system recovers from, not a place it falls over.
- Logs complete enough to reconstruct what happened. Every input, decision, simulation result, and tx hash, in order. When something does go sideways (it will) you want to replay the exact path, not guess.
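The guardrail bullet is the easiest one to show in code. What follows is a minimal sketch, not how any particular platform does it: `Limits`, `ProposedAction`, and `check_guardrails` are hypothetical names, and the limit values are placeholders you'd set yourself. The point is only that the check runs in ordinary code and the model has no argument it can make to it.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Limits:
    """Hypothetical policy values; set these from your own risk tolerance."""
    max_amount_eth: Decimal
    max_slippage_bps: int
    allowed_destinations: frozenset  # lowercase addresses


@dataclass(frozen=True)
class ProposedAction:
    """What the model asks for. Nothing in here is trusted yet."""
    destination: str
    amount_eth: Decimal
    slippage_bps: int


def check_guardrails(action: ProposedAction, limits: Limits) -> list:
    """Return a list of violations. An empty list means the action may proceed."""
    violations = []
    if action.destination.lower() not in limits.allowed_destinations:
        violations.append("destination not on allowlist")
    if action.amount_eth <= 0 or action.amount_eth > limits.max_amount_eth:
        violations.append("amount outside allowed range")
    if not 0 <= action.slippage_bps <= limits.max_slippage_bps:
        violations.append("slippage outside allowed range")
    return violations
```

Amounts are `Decimal` rather than `float` on purpose: the 4-versus-0.4 mistake is bad enough without binary rounding on top of it.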
Here's the safe path as a flow. The model proposes; the system disposes.
Notice where the model lives in that diagram. It's the first box. One box. Everything else is plumbing that exists specifically because you don't trust the first box, and shouldn't.
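In code, that shape looks roughly like this. It's a sketch under assumptions: `run_once` and its five callables are hypothetical stand-ins for whatever your model client, policy check, simulator, signer, and logger actually are. What matters is the order, and that the model only ever supplies the first call.

```python
from typing import Callable, Optional


def run_once(
    propose: Callable[[], dict],
    check: Callable[[dict], list],
    simulate: Callable[[dict], bool],
    send: Callable[[dict], str],
    log: Callable[[str, object], None],
) -> Optional[str]:
    """Run one proposal through every gate. Returns a tx hash, or None if blocked."""
    action = propose()            # the only step the model controls
    log("proposed", action)

    violations = check(action)    # hard limits, enforced in code
    if violations:
        log("rejected", violations)
        return None

    if not simulate(action):      # dry run against current state
        log("simulation_failed", action)
        return None

    tx_hash = send(action)        # reached only after both gates pass
    log("sent", tx_hash)
    return tx_hash
```

Every exit path writes a log entry before returning, so a blocked action leaves as much of a trail as a sent one.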
The diagram is the happy path. The point of the plumbing is what happens when the path isn't happy. Take one action, turn one safeguard off, and see which failure class slips through.
Each layer catches a distinct class. The oversized transfer dies at the guardrail clamp; the reverting call dies in simulation; the flaky network gets rerouted by retry. Turn the matching safeguard off and that one action walks straight to an irreversible outcome while the others still get caught. That's the whole argument for layering: no single check covers everything, and the cost of a gap is a different incident each time.
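If you want to poke at that mapping yourself, here's a toy model of it. Everything in it is made up for illustration (`Action`, `outcome`, the three flags); it doesn't touch a chain, it just encodes which safeguard catches which class of failure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    name: str
    oversized: bool = False      # exceeds the size clamp
    reverts: bool = False        # would revert on chain
    flaky_network: bool = False  # the RPC times out mid-run


def outcome(action: Action, guardrails: bool = True,
            simulate: bool = True, retry: bool = True) -> str:
    """Toy model: report where an action ends up given which safeguards are on."""
    if action.oversized:
        return "blocked by guardrail" if guardrails else "oversized transfer sent"
    if action.reverts:
        return "blocked in simulation" if simulate else "reverting call signed blind"
    if action.flaky_network:
        return "rerouted and completed" if retry else "automation stuck half-finished"
    return "sent"
```

Calling `outcome(Action("swap", reverts=True), guardrails=False)` still returns "blocked in simulation": switching off the wrong safeguard changes nothing for that action, which is the layering argument in one line.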
Why this keeps turning into a platform
The trap is thinking you'll wire this up per bot. The first one, sure. You hand-roll the simulation step, the slippage clamp, the retry logic, the logging. It takes a week and it works.
Then you build the second bot and copy most of it. The third needs a different chain and your retry logic assumed Base. The fourth needs portfolio rebalancing across four chains and now your guardrails have to reason about positions, not single transactions. Six months in, you realize the actual product you built isn't any of the bots. It's the execution layer underneath them, the part that turns a probabilistic proposer into something you'd let run unattended over real money.
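The "retry logic assumed Base" problem mostly goes away once the endpoints are data instead of constants. A minimal sketch, with hypothetical names throughout (`send_with_reroute`, the endpoint strings), and one loud assumption: `send` must be safe to call more than once, for example because it rebroadcasts the same already-signed transaction. Retrying something that re-signs with a fresh nonce is how you pay twice.

```python
import time
from typing import Callable, Sequence


def send_with_reroute(
    send: Callable[[str], str],
    endpoints: Sequence[str],
    attempts_per_endpoint: int = 2,
    backoff_seconds: float = 0.0,
) -> str:
    """Try each endpoint in order; raise only after every one has failed.

    Assumes `send` is idempotent (same signed tx every time).
    """
    last_error = None
    for url in endpoints:
        for attempt in range(attempts_per_endpoint):
            try:
                return send(url)
            except (TimeoutError, ConnectionError) as err:
                last_error = err
                # Exponential backoff before the next attempt.
                time.sleep(backoff_seconds * (2 ** attempt))
    raise RuntimeError("all endpoints failed") from last_error
```

Only network-shaped errors are retried. Anything else propagates immediately, because retrying a logic error just repeats it.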
That's why this ends up being a platform whether you meant to build one or not. For us it became B3OS, the onchain engine we run agents on: deterministic execution, self-healing retries, real-time logging of every command, and pre-audited templates so the agent assembles from reviewed pieces instead of improvising contract calls. We didn't build it to have a platform. We built it because we kept rewriting the same cage around every new bot and got tired of it.
The honest part
This is overkill for a hobby script moving $5. If you're poking at a testnet or running a toy that rebalances lunch money, the determinism-and-guardrail tax isn't worth it. Just write the script. The whole calculus only flips when a mistake is expensive or the thing runs unattended, because that's the exact combination that turns a bug into an incident. Cheap and watched, you'll catch it. Expensive and alone, you won't.
And no amount of this removes the hardest decision, which isn't technical at all: deciding what the agent is allowed to do. The cage is only as good as its bars. A platform can enforce "never send to an unknown address" flawlessly and you've still got to be the one who decided that's the rule, what counts as known, and what the agent does when it isn't sure. I wrote more about that proposer/approver split in "propose and dispose", and about why letting a model author contract calls at runtime goes wrong in "why onchain vibe coding breaks". The determinism requirement above isn't unique to wallets either; it shows up the moment you need to verify a model ran the way it claimed, which is the determinism trap. And if you're metering what the agent spends per call, "x402 from scratch" is the rail I'd reach for.
The one-line version
If you remember one thing: the difference between a clever demo and something you'd trust with a funded wallet is entirely in the boxes after "agent proposes action." Build those first. The model is the easy part, and it's the part that's supposed to be wrong sometimes. Everything else is there to make sure that being wrong stays a bug and never becomes an incident.