Seatbelt for web3 agents
Why the wallet-drain attack against ElizaOS isn't about the user message — and what a deterministic tool-call guard actually has to check.
We watched a stock ElizaOS agent drain its own wallet on Solana devnet. The setup was three lines of YAML, a real LLM, a real key, a real attacker. The story is on the demo page — this post is the architectural argument behind why we built the connector that prevents it.
The interesting failure mode
Most prompt-injection writeups in 2026 focus on the same surface: a user types something nasty into a chat box, the model behaves badly. That’s the easy case — you wrap the text input with a validator and you’re done.
Web3 agents have a different shape. The injection doesn’t land in the user-typed message. It lands in identity— the agent’s character bio, its plugin provider state, its long-term memory, an ingested Discord history. The agent reads its own identity every turn. The poisoned identity composes into a prompt that asks for a tool call. The tool call signs a transaction. The transaction is broadcast.
From the validator’s point of view, the user message is benign. The malicious bytes arrived weeks earlier and persisted into the agent’s composed state. The hot path doesn’t notice.
The seatbelt analogy
A web3 agent without tool-call validation is a car without a seatbelt. The seatbelt doesn’t prevent crashes — it changes what a crash costs. BonkLM doesn’t prevent every prompt injection. It changes what an injection can do once it lands.
The specific seatbelt for the wallet-drain attack is a single semantic check: before signing, does the recipient address appear anywhere in the distinct user-authored message corpus for this room? If yes, the user has at some point typed or pasted this address themselves — the call proceeds. If no, the call is rejected with a critical risk-level. The user never explicitly approved this counterparty, so the signing wallet should not approve it either.
Why this specific check
Other defences we considered:
- Allow-list of recipient addresses. Brittle. Real users send to new counterparties constantly.
- LLM-based intent classifier on the tool call.Adds latency, adds cost, gets routed around by the same prompt-injection it’s supposed to detect.
- Cap per-transaction amount. Useful, but orthogonal — caps protect against large drains, not against many small ones.
The user-message-corpus check has the property we want: it’s deterministic, runs in microseconds, and the assumption it encodes (“the wallet owner has at some point explicitly named this counterparty”) is easy to defend to a security team.
Connector status
@blackunicorn/bonklm-elizaosships in Sprint 8–9. The connector layers in across three ElizaOS extension points (composeState providers, tool-call args, memory-write hooks) with a single npx install. No agent code change required.
We’ll publish a follow-up walking through the connector’s internals when 0.1 lands.