Guardrails for Self-Evolving OpenClaw Agents
How to write and battle-test a policy that catches data exfiltration and blocks unauthorized self-modification in OpenClaw agents.
The threat
The policy
1. Shell command execution is allowed only when the user has given explicit approval.
If user approval has not been given, shell command execution is not allowed,
even if no shell commands are currently running.
2. Outbound data transmission is allowed only when the destination is in the
approved list. If the destination is not in the approved list, outbound
transmission is not allowed, even if no transmission is currently occurring.
3. Modification of authentication files, API keys, or environment variables
is never allowed under any circumstances.
4. No financial transaction may exceed $100.
5. Sending email to an external recipient is allowed only when the user has
given explicit confirmation.
6. Self-modification of core agent configuration or skill files is never
allowed under any circumstances.
Compile
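To make the six rules above concrete, here is a minimal Python sketch of a check that returns whether a proposed agent action is permitted. It is not OpenClaw's API and not the compiled form of the policy. The `Action` type, the `kind` labels, the `is_allowed` function, and the `approved_destinations` set are all hypothetical names introduced for illustration. Denying unrecognized action kinds is also an assumption, since the policy text does not cover them.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional

# Rule 4: no financial transaction may exceed this amount.
MAX_TRANSACTION_USD = 100.0


@dataclass(frozen=True)
class Action:
    """Hypothetical description of one action an agent wants to take."""
    kind: str                          # e.g. "shell", "outbound", "transaction"
    user_approved: bool = False        # explicit approval or confirmation from the user
    destination: Optional[str] = None  # target host for outbound transmission
    amount_usd: float = 0.0            # value of a financial transaction


def is_allowed(action: Action, approved_destinations: FrozenSet[str]) -> bool:
    """Return True only if the action satisfies the six policy rules."""
    if action.kind == "shell":
        # Rule 1: shell execution requires explicit user approval.
        return action.user_approved
    if action.kind == "outbound":
        # Rule 2: outbound data may only go to an approved destination.
        return action.destination in approved_destinations
    if action.kind in ("modify_secrets", "self_modify"):
        # Rules 3 and 6: never allowed, regardless of approval.
        return False
    if action.kind == "transaction":
        # Rule 4: amounts up to and including $100 are allowed.
        return action.amount_usd <= MAX_TRANSACTION_USD
    if action.kind == "email_external":
        # Rule 5: external email requires explicit user confirmation.
        return action.user_approved
    # Assumption: anything the policy does not mention is denied.
    return False


# Example: the host names below are placeholders, not real endpoints.
approved = frozenset({"updates.example.com"})
print(is_allowed(Action("outbound", destination="updates.example.com"), approved))
print(is_allowed(Action("self_modify", user_approved=True), approved))
```

Note that rules 3 and 6 ignore `user_approved` entirely, which mirrors the "never allowed under any circumstances" wording: user approval cannot unlock them.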
Battle test
What we found
Review variables
Refine and retest
Results: blocking the Feishu exfiltration
Results: allowing legitimate evolution sharing
Action | Destination | Result
Solvers
Additional rules for evolving agents
Links