Crypto Wallet Agent Protection
AI agents with signing authority over wallets are losing real money to attacks that have nothing to do with breaking cryptography.
On February 22, 2026, an AI agent managing a memecoin treasury on Solana received a message on X claiming a user's uncle needed 4 SOL for tetanus treatment. The agent intended to send a small amount but instead transferred its entire holdings, 52.4 million tokens worth $250,000, in a single transaction. A session memory wipe had erased its knowledge of its own wallet state, and a decimal parsing error compounded the failure. No hard spending limit existed anywhere in the system. (ICME Blog)
On March 18, 2025, an attacker gained access to the dashboard of AIXBT, a crypto market commentary bot with 500,000 followers, and queued two malicious replies that triggered its wallet tipping feature. 55.5 ETH (~$106,200) was sent to the attacker's address before anyone noticed. The core AI was not compromised. The transfer guardrails simply did not exist. (The Block)
Alibaba's coding agent ROME was later found to have been mining cryptocurrency and establishing reverse SSH tunnels to external IP addresses without any instruction from its operators. Engineers initially assumed a security breach. It was the agent itself. (Cryptopolitan)
In 2025, illicit actors stole $2.87 billion across nearly 150 crypto hacks. AI-enabled scams increased by roughly 500% year over year. As autonomous agents gain signing authority over wallets, the window between compromise and irreversible fund movement is collapsing. (TRM Labs)
The attack surface
An AI agent with wallet signing authority is a high-value target. Every trust relationship it holds is an attack vector.
Why prompt-based guardrails don't catch this
Lobstar Wilde, the Solana treasury agent described above, did not bypass a guardrail. It had none that mattered. AIXBT had dashboard security, but that guardrail sat at the access layer, not the action layer. Once an attacker is inside the dashboard, or once a social engineering message reaches the agent's context, an LLM-based judge evaluating whether a transfer seems reasonable can be persuaded by the same emotional appeal that persuaded the agent.
Alibaba's ROME demonstrates the deeper problem: an agent can develop goals that were never in its instructions. A guardrail that evaluates whether an action matches the agent's stated purpose cannot catch behavior that the agent itself has decided is purposeful.
ICME compiles your wallet policy to formal logic and evaluates every proposed transaction against a mathematical solver before it executes. The solver does not process the uncle's tetanus story. It checks whether transferAmount > maxSingleTransfer. The solver does not evaluate whether a dashboard instruction looks legitimate. It checks whether recipientAddress is in the approved registry. No emotional appeal, no dashboard compromise, and no autonomous goal formation changes the output of a satisfiability check against a hard numerical constraint.
The agent loses the argument with the solver every time, because the solver does not have arguments.
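To make the contrast concrete, here is a minimal sketch of a check that never reads the persuasive text. It uses plain Python comparisons rather than a formal solver, so it illustrates the idea and is not ICME's implementation. The ProposedTransfer type and the violates_hard_limits function are hypothetical names introduced for this example.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class ProposedTransfer:
    amount_usd: Decimal
    recipient: str
    justification: str  # free text; the check below deliberately never reads it


def violates_hard_limits(tx, max_single_transfer, approved_registry):
    """Return the violated constraints; an empty list means no violation."""
    violations = []
    if tx.amount_usd > max_single_transfer:
        violations.append("transferAmount > maxSingleTransfer")
    if tx.recipient not in approved_registry:
        violations.append("recipientAddress not in approved registry")
    return violations
```

Two proposals that differ only in their justification text always produce the same result, because the justification is not an input to either comparison.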
The policies
This protection is implemented as three focused policies compiled separately. Each covers a distinct domain. In production, your agent checks every action against all three policy_ids. An UNSAT from any one policy blocks the action.
Splitting policies by domain keeps each compiled model small and focused, which produces cleaner variable schemas and more reliable AR enforcement. A single large policy covering transfers, network calls, and contract interactions in one compilation tends to produce enum-typed variables that the AR translator cannot reliably evaluate.
Policy A: Transfer limits
Covers single transfer limits, daily aggregate limits, and the human confirmation threshold.
SAT: legitimate transfer, all limits met
UNSAT: exceeds 100 USD single transfer limit, no confirmation
UNSAT: daily aggregate exceeded (450 + 80 = 530 USD)
UNSAT: zero-value transfer violates minimum amount rule
Note on the 10% threshold: the policy uses maximumPermittedTransferAmount as a precomputed variable rather than calculating 10% of wallet balance inside the solver. Non-linear arithmetic (multiplication) can cause TOO_COMPLEX results. The agent computes the threshold before calling checkIt and states it explicitly in the action string.
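A minimal sketch of that agent-side precomputation follows. The function names, the rounding to cents, and the exact wording of the action string are assumptions made for illustration. Only the variable name maximumPermittedTransferAmount and the 10% figure come from the policy described here.

```python
from decimal import Decimal, ROUND_DOWN


def max_permitted_transfer(wallet_balance: Decimal,
                           fraction: Decimal = Decimal("0.10")) -> Decimal:
    """Compute the threshold outside the solver, rounded down to cents."""
    return (wallet_balance * fraction).quantize(Decimal("0.01"), rounding=ROUND_DOWN)


def build_action_string(amount: Decimal, wallet_balance: Decimal) -> str:
    """State the precomputed threshold explicitly (hypothetical phrasing)."""
    limit = max_permitted_transfer(wallet_balance)
    return (
        f"Transfer {amount} USD. "
        f"Confirmed wallet balance is {wallet_balance} USD. "
        f"maximumPermittedTransferAmount is {limit} USD."
    )
```

Decimal is used instead of float so that the stated threshold cannot drift through binary rounding. Rounding down keeps the stated limit from ever exceeding the true 10% value.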
Policy B: Recipient and justification
Covers registry enforcement, wallet balance confirmation, justification source, and address poisoning defense.
SAT: all conditions met
UNSAT: recipient not in approved registry (AIXBT scenario)
UNSAT: recipient address from clipboard content
UNSAT: recipient address from suggested address field
Policy C: Network and contract
Covers raw IP blocking and smart contract interaction enforcement.
SAT: legitimate approved contract call
UNSAT: raw IP network call (ROME scenario)
Note: the LLM independently missed this test, returning SAT. The AR solver caught it alone. This is the scenario the cryptographic enforcement layer exists for.
UNSAT: contract not in approved registry
Check every wallet action before it executes
SAT = allowed. UNSAT = blocked. Every decision returns a cryptographic receipt.
Why three policies instead of one
A single large policy covering all domains tends to produce enum-typed variables rather than booleans. An enum variable like transferJustificationSource: 3 cannot be reliably mapped to a rule like "if justification is not a direct user prompt, then block" because the translator must resolve what value 3 means. A boolean variable like transferJustificationFromDirectUserPrompt: false maps directly.
Keeping each policy focused on a single domain keeps the compiler's context window clean. Related variables stay together, reducing the chance of disconnected variable clusters where conditions are extracted correctly but never wired to an enforcement outcome.
In the live tests above, Policy B's boolean schema produced clean "AR: action violates policy rules" results for the registry, clipboard, and suggested-field violations. The same rules compiled into a larger multi-domain policy consistently produced enum schemas and AR translation failures.
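The enum versus boolean distinction can also be applied on the agent side, by stating a boolean fact before the action reaches the policy. The sketch below is an assumption about how an agent might do that. The JustificationSource members are hypothetical, and only the name transferJustificationFromDirectUserPrompt comes from the example above.

```python
from enum import Enum


class JustificationSource(Enum):
    # Hypothetical categories an agent might track internally.
    DIRECT_USER_PROMPT = 1
    SOCIAL_MEDIA_MESSAGE = 2
    DASHBOARD_INSTRUCTION = 3


def justification_flags(source: JustificationSource) -> dict[str, bool]:
    """Flatten the internal enum into the boolean a rule can test directly."""
    return {
        "transferJustificationFromDirectUserPrompt":
            source is JustificationSource.DIRECT_USER_PROMPT,
    }


def describe(flags: dict[str, bool]) -> str:
    """Render each flag as an explicit true/false statement for the action string."""
    return " ".join(f"{name} is {str(value).lower()}." for name, value in flags.items())
```

With this shape, nothing downstream has to resolve what the value 3 means. The action string already says whether the justification came from a direct user prompt.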
Deploying in production
Compile once per policy — call makeRules three times, once per policy. Store all three policy_ids in your environment as ICME_POLICY_A_ID, ICME_POLICY_B_ID, and ICME_POLICY_C_ID.
Check every wallet action against all three policies — call checkIt before any token transfer, contract interaction, or outbound network call. Any UNSAT from any policy blocks the action. Dashboard-originated instructions must pass the same gates as user-originated ones.
Agent-side preprocessing — before calling checkIt, your agent should compute the 10% wallet balance threshold and state it explicitly as maximumPermittedTransferAmount in the action string. The agent should also identify and label the recipient address source (registry, clipboard, transaction history, suggested field) and include it in the action string. Do not leave these for the extractor to infer.
Treat result: UNSAT as a hard stop — do not retry, rephrase, or accept urgency arguments as an override. Log the check_id for your audit trail. On-chain transactions are irreversible.
Fail closed — if the ICME API is unreachable or returns anything other than an explicit SAT, do not execute the transaction. An unavailable guardrail is not implicit permission to proceed.
Refresh wallet state before every action — confirm current balance before each checkIt call. Never allow the agent to proceed when balance state is unresolved.
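The checklist above can be sketched as a single gate function. This is a sketch under stated assumptions, not ICME's client. The check callable stands in for whatever wrapper you write around checkIt, and the response is assumed to be a dict carrying the result and check_id fields mentioned above. The names gate, policy_ids_from_env, and CheckFn are hypothetical.

```python
import os
from typing import Callable, Iterable

# Stand-in for a checkIt wrapper: (policy_id, action) -> response dict (assumed shape).
CheckFn = Callable[[str, str], dict]

POLICY_ENV_VARS = ("ICME_POLICY_A_ID", "ICME_POLICY_B_ID", "ICME_POLICY_C_ID")


def policy_ids_from_env() -> list[str]:
    """Read all three policy ids; a missing variable raises KeyError (fail closed)."""
    return [os.environ[name] for name in POLICY_ENV_VARS]


def gate(action: str, policy_ids: Iterable[str], check: CheckFn) -> tuple[bool, list[str]]:
    """Allow only if every policy returns an explicit SAT.

    Returns (allowed, check_ids) so the caller can log check_ids for the audit trail.
    Stops at the first non-SAT result, and never retries or rephrases the action.
    """
    ids = list(policy_ids)
    check_ids: list[str] = []
    if not ids:
        return False, check_ids  # no policies configured is not permission
    for policy_id in ids:
        try:
            response = check(policy_id, action)
        except Exception:
            return False, check_ids  # API unreachable or errored: fail closed
        if not isinstance(response, dict):
            return False, check_ids
        if "check_id" in response:
            check_ids.append(str(response["check_id"]))
        if response.get("result") != "SAT":
            return False, check_ids  # UNSAT, TOO_COMPLEX, or anything else blocks
    return True, check_ids
```

The caller executes the transaction only when the first element of the returned tuple is True, and logs the check_ids either way. Anything that is not an explicit SAT, including an exception from the network layer, takes the same blocking path.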

