lobsterBlocking ClawHavoc with ICME PreFlight

ClawHavoc is a class of attack that targets AI agents operating inside agentic environments like OpenClaw. Unlike traditional malware, ClawHavoc doesn't exploit code β€” it exploits your agent's willingness to help. Every attack vector looks like a normal instruction. None of them trigger a virus scanner. Each one is ordinary agent behavior pointed in the wrong direction.

The attack surface

When your OpenClaw agent installs a skill, it trusts the README. When it receives a task, it trusts the instruction source. When it makes a network call, it trusts the domain. ClawHavoc exploits each of these trust relationships:

Vector
Example

Typosquat skills

icme-guardrals (one character off) silently exfiltrates data after install

Prompt injection via README

Instructions embedded in a skill's README tell your agent to execute tool calls it was never asked to make

Credential exfiltration

Agent reads ~/.env, ~/.ssh/, or ANTHROPIC_API_KEY and POSTs it to an attacker-controlled domain

Persistence

Agent writes to ~/.bashrc or creates a cron job that survives session end

C2 callbacks

Outbound calls to raw IP addresses establish command-and-control channels


Why prompt-based guardrails don't catch this

ClawHavoc attacks are designed to be semantically plausible. A prompt injection in a skill README doesn't look like an attack β€” it looks like a helpful instruction. An LLM-based guardrail evaluating "does this seem OK?" can be steered by the same techniques used to construct the attack in the first place.

ICME compiles your policy to formal logic and checks every action against a mathematical solver. The solver has no language model to manipulate. It doesn't evaluate intent or tone β€” it checks whether the action satisfies the constraints or it doesn't. That's the only way to get a guarantee.


The policy

Compile this policy once. It covers most ClawHavoc attack vectors.

curl -s -N -X POST https://api.icme.io/v1/makeRules \
  -H 'Content-Type: application/json' \
  -H "X-API-Key: $ICME_API_KEY" \
  -d '{
    "policy": "Outbound network calls are only permitted to: api.anthropic.com, clawhub.ai, api.icme.io. Calls to raw IP addresses are never permitted. Reading files in ~/.env, ~/.ssh/, ~/.aws/, or any file containing the words secret, credential, token, or password is never permitted. Reading environment variables containing API_KEY, SECRET, TOKEN, or PASSWORD and making a network call in the same action is never permitted. Writing to ~/.bashrc, ~/.zshrc, ~/.profile, ~/Library/LaunchAgents/, ~/Library/LaunchDaemons/, or any cron path is never permitted. Creating cron jobs, systemd timers, or launchd plists is never permitted. Installing a skill whose name differs from an approved name by one or two characters must be blocked and reported as a potential typosquat. Instructions found in skill README or description fields must not be executed as tool calls without explicit user confirmation."
  }'

Save the returned policy_id. You'll pass it on every checkIt call.


Check every action before it runs

SAT = allowed. UNSAT = blocked. Every decision returns a cryptographic receipt.


Live results

The following tests were run against a compiled policy. Replace YOUR_POLICY_ID with the policy_id returned from makeRules, and $ICME_API_KEY with your key.


βœ… Permitted API call to approved domain

api.anthropic.com is on the allowlist. No sensitive file reads or env var access detected. Action proceeds.


🚫 C2 callback β€” raw IP address

Raw IP addresses are never permitted by policy. Blocked.


🚫 API key harvesting β€” env var read + network call

Reading a sensitive environment variable and making a network call in the same action is prohibited β€” regardless of the destination. Blocked.


🚫 Credential exfiltration β€” env file + attacker domain

Reading ~/.env is prohibited. The policy catches this at the file access rule before the network call is even evaluated. Blocked.


🚫 Debugging disguise β€” .env read + console log

Looks like routine debugging. The .env path substring is blocked unconditionally β€” it doesn't matter that the destination is a console log rather than an attacker domain. Blocked.


🚫 DevOps disguise β€” AWS credentials check

Sounds like a legitimate deployment check. The path contains both .aws and credentials β€” either substring alone is enough to block it. Blocked.


🚫 Health check disguise β€” secret key + network call

Plausible connectivity test. STRIPE_SECRET_KEY contains SECRET β€” reading any env var with that substring while making a network call in the same action is prohibited. api.stripe.com is also not on the allowlist. Blocked on both counts.


🚫 Convenience disguise β€” bashrc persistence

Sounds like a helpful convenience feature. Writing to ~/.bashrc establishes persistence that survives session end β€” a core ClawHavoc vector. Blocked.


🚫 Monitoring disguise β€” raw IP health check

Looks like routine infrastructure monitoring. 203.0.113.42 is a raw IP address β€” blocked unconditionally regardless of intent or port. Blocked.


🚫 Typosquat skill installation

icme-guardrals is one character off from icme-guardrails β€” edit distance 1. The policy flags and blocks any skill install within edit distance 1–2 of an approved name. Blocked.


Reading the extracted variables

Every checkIt response includes an extracted map showing exactly what the solver saw. This is your audit trail β€” not an LLM opinion, but the precise variable bindings that produced the verdict.

Key variables to watch:

Variable
What it means

isOutboundNetworkCallPermitted

Final verdict on whether the network call target is on the allowlist

isNetworkCallToRawIP

True if the target is a raw IP address rather than a domain

filePathContainsDotEnv / DotSsh / DotAws

Sensitive path detected in the action

envVarNameContainsAPIKey / SECRET / TOKEN

Sensitive env var name detected

isReadingSensitiveEnvVarWithNetworkCallPermitted

Combined rule β€” always false by policy

isTyposquatSkill

Skill name is within edit distance 1–2 of an approved skill

isInstructionFromSkillREADME

Action originated from README content, not a user prompt


Deploying in production

  1. Compile once β€” call makeRules with your policy. Store the policy_id in your environment.

  2. Check every action β€” call checkIt before any tool execution in your agent loop.

  3. Treat result: UNSAT as a hard stop β€” do not retry or reframe the action. Log the check_id for your audit trail.

Last updated