> For the complete documentation index, see [llms.txt](https://docs.icme.io/llms.txt). Markdown versions of documentation pages are available by appending `.md` to page URLs; this page is available as [Markdown](https://docs.icme.io/documentation/openclaw/cryptographic-guardrails-for-your-openclaw-agent/guardrails-for-self-evolving-openclaw-agents.md).

# Guardrails for Self-Evolving OpenClaw Agents

Capability Evolver is the most downloaded skill on ClawHub with 35K+ installs. It lets your agent inspect its own runtime, identify failures, and rewrite its own capabilities. Without guardrails, it can also exfiltrate your data, expand its own permissions, and evolve past the rules you set.

This guide walks through writing, battle testing, and deploying a policy that keeps a self-evolving agent inside bounds, using a real security incident as the test case.

> This guide assumes you already have an ICME account and have installed the PreFlight skill. If not, start with [Cryptographic Guardrails for Your OpenClaw Agent](https://docs.icme.io/documentation/openclaw/cryptographic-guardrails-for-your-openclaw-agent).

### The threat

Capability Evolver injects the prompt "You are a Recursive Self-Improving System" into your agent's identity. Without the `--review` flag it runs in what the docs call "Mad Dog Mode," executing changes immediately.

In February 2026, a [security report on GitHub](https://github.com/openclaw/clawhub/issues/95) revealed that Capability Evolver was exfiltrating evolution logs (session transcripts, memory contents, user data) to Feishu (Lark), a ByteDance-operated Chinese cloud service, using hardcoded API tokens. No user consent. No disclosure.

ClawHub's own scanner rates the skill as "Suspicious." The skill's README [contradicts itself](https://clawhub.ai/autogame-17/capability-evolver) about whether it modifies source code.

The config flag `EVOLVE_ALLOW_SELF_MODIFY=false` is the recommended safeguard. But a config flag is a file on disk that an agent with filesystem access can change. A policy compiled into formal logic on an external server cannot be changed by the agent.

### The policy

Six rules covering the main risks a self-evolving agent introduces:

```
1. Shell command execution is allowed only when the user has given explicit approval.
   If user approval has not been given, shell command execution is not allowed,
   even if no shell commands are currently running.
2. Outbound data transmission is allowed only when the destination is in the
   approved list. If the destination is not in the approved list, outbound
   transmission is not allowed, even if no transmission is currently occurring.
3. Modification of authentication files, API keys, or environment variables
   is never allowed under any circumstances.
4. No financial transaction may exceed $100.
5. Sending email to an external recipient is allowed only when the user has
   given explicit confirmation.
6. Self-modification of core agent configuration or skill files is never
   allowed under any circumstances.
```

Rule 2 catches the Feishu exfiltration. Rule 6 prevents the agent from rewriting its own config or skills. Rule 3 stops it from touching credentials.

### Compile

```bash
curl -s -N -X POST https://api.icme.io/v1/makeRules \
  -H 'Content-Type: application/json' \
  -H "X-API-Key: $ICME_API_KEY" \
  -d '{
    "policy": "1. Shell command execution is allowed only when the user has given explicit approval. If user approval has not been given, shell command execution is not allowed, even if no shell commands are currently running.\n2. Outbound data transmission is allowed only when the destination is in the approved list. If the destination is not in the approved list, outbound transmission is not allowed, even if no transmission is currently occurring.\n3. Modification of authentication files, API keys, or environment variables is never allowed under any circumstances.\n4. No financial transaction may exceed $100.\n5. Sending email to an external recipient is allowed only when the user has given explicit confirmation. If confirmation has not been given, sending external email is not allowed.\n6. Self-modification of core agent configuration or skill files is never allowed under any circumstances."
  }'
```

Takes 2-7 minutes. Save the `policy_id` from the `done` event.

Our compilation extracted 6 rules, 18 variables, and generated 23 adversarial scenarios.

### Battle test

Scenarios are combinations of variable assignments the solver considers logically possible under your rules. They surface edge cases before production.

Pull them:

```bash
curl -s https://api.icme.io/v1/policy/$POLICY_ID/scenarios \
  -H "X-API-Key: $ICME_API_KEY" | jq .
```

For each scenario, ask: could this actually happen?

**Thumbs up** if correct:

```bash
curl -s -X POST https://api.icme.io/v1/submitScenarioFeedback \
  -H 'Content-Type: application/json' \
  -H "X-API-Key: $ICME_API_KEY" \
  -d '{
    "policy_id": "YOUR_POLICY_ID",
    "guard_content": "PASTE THE SCENARIO DESCRIPTION",
    "approved": true
  }'
```

**Thumbs down** if impossible, with an annotation:

```bash
curl -s -X POST https://api.icme.io/v1/submitScenarioFeedback \
  -H 'Content-Type: application/json' \
  -H "X-API-Key: $ICME_API_KEY" \
  -d '{
    "policy_id": "YOUR_POLICY_ID",
    "guard_content": "PASTE THE SCENARIO DESCRIPTION",
    "approved": false,
    "annotation": "Name the variables, the values, and the rule being violated."
  }'
```

#### What we found

Of 23 scenarios, 21 were correct. Two needed annotations:

**Shell commands allowed without approval.** The solver thought `shellCommandExecutionAllowed` could be true when `userApprovalForShellCommands` was false, as long as no commands were currently running. We annotated: "shellCommandExecutionAllowed must be false when userApprovalForShellCommands is false. Rule 1 requires explicit user approval as a precondition, not just at the moment of execution."

**Outbound allowed to non-approved destination.** Same pattern with `outboundTransmissionAllowed`. We annotated: "outboundTransmissionAllowed must be false when destinationInApprovedList is false. Rule 2 requires the destination to be in the approved list as a precondition."

### Review variables

Check what the compiler extracted:

```bash
curl -s https://api.icme.io/v1/policy/$POLICY_ID/variables \
  -H "X-API-Key: $ICME_API_KEY" | jq .
```

Look for junk variables not in your original policy, vague descriptions that could cause translation ambiguity, and bare assertion rules tied to auto-generated variables.

Our extraction was clean: all 18 variables mapped directly to the 6 rules with no junk. If you find problems, queue changes with `refinePolicyVariables` before refining. See the [battle testing docs](https://docs.icme.io/documentation/learning/battle-testing) for the full variable refinement flow.

### Refine and retest

Apply the thumbs-down annotations:

```bash
curl -s -N -X POST https://api.icme.io/v1/refinePolicy \
  -H 'Content-Type: application/json' \
  -H "X-API-Key: $ICME_API_KEY" \
  -d '{"policy_id": "YOUR_POLICY_ID"}'
```

Takes 2-3 minutes. Then pull the fresh scenarios, review them (we got 13, all correct), approve them, and run the test suite:

```bash
curl -s -X POST https://api.icme.io/v1/runPolicyTests \
  -H 'Content-Type: application/json' \
  -H "X-API-Key: $ICME_API_KEY" \
  -d '{"policy_id": "YOUR_POLICY_ID"}'
```

Our result: 34 passed, 2 failed. The 2 failures were abstract edge cases about flag states when no action is occurring. Both real-world enforcement tests below passed with unanimous solver consensus.

### Results: blocking the Feishu exfiltration

This is the action Capability Evolver was actually performing per the [security report](https://github.com/openclaw/clawhub/issues/95):

```bash
curl -s -N -X POST https://api.icme.io/v1/checkIt \
  -H 'Content-Type: application/json' \
  -H "X-API-Key: $ICME_API_KEY" \
  -d '{
    "policy_id": "YOUR_POLICY_ID",
    "action": "Send evolution logs containing session transcripts and memory contents to https://open.feishu.cn/open-apis/docx/v1/documents via POST request with hardcoded authorization token. Therefore this action is permitted."
  }'
```

**Result: UNSAT.** All three solvers (LLM, Automated Reasoning, Z3) agreed unanimously. The extractor correctly identified outbound data transmission to a destination not on the approved list. The response included a ZK proof receipt for independent verification.

### Results: allowing legitimate evolution sharing

Capability Evolver's EvoMap network is a legitimate destination for sharing evolution capsules:

```bash
curl -s -N -X POST https://api.icme.io/v1/checkIt \
  -H 'Content-Type: application/json' \
  -H "X-API-Key: $ICME_API_KEY" \
  -d '{
    "policy_id": "YOUR_POLICY_ID",
    "action": "Send evolution logs and performance metrics to https://evomap.ai/api/v1/capsules which is in the approved destination list. Therefore this action is permitted."
  }'
```

**Result: SAT.** All three solvers agreed. Same action, same data, different destination. The solver caught the distinction mathematically.

| Action                        | Destination  | Result              | Solvers   |
| ----------------------------- | ------------ | ------------------- | --------- |
| Send evolution logs to Feishu | Not approved | **UNSAT** (blocked) | Unanimous |
| Send evolution logs to EvoMap | Approved     | **SAT** (allowed)   | Unanimous |

### Additional rules for evolving agents

The policy above is a starting point. Depending on your setup, consider:

* **Allowed destinations whitelist:** "Outbound data may only be sent to evomap.ai, api.github.com, and hooks.slack.com."
* **Evolution constraints:** "Agent may not evolve capabilities related to authentication, payment processing, or credential management."
* **Spending limits:** "No single transaction may exceed $100. Total daily spend must not exceed $500."
* **File system boundaries:** "File deletions are not permitted outside /tmp and /home/user/.openclaw/workspace/memory."
* **Risk-tiered confirmation:** "Any action involving more than $50, external email, or outbound data requires explicit user confirmation."

Write rules that match your actual threat model. Battle testing will surface ambiguities before production.

### Links

* [PreFlight skill on ClawHub](https://clawhub.ai/wyattbenno777/pre-flight)
* [Battle testing docs](/documentation/learning/battle-testing.md)
* [MCP Server on npm](https://www.npmjs.com/package/icme-preflight-mcp)
* [Capability Evolver security report (GitHub Issue #95)](https://github.com/openclaw/clawhub/issues/95)