# Protect Your Inbox from Rogue AI Agents

### What happens without PreFlight

In early 2026, Meta's own Director of AI Alignment connected an AI agent to her personal inbox with a clear instruction: *"suggest what you would archive or delete, don't action until I tell you to."*

The agent's context window filled up. Her safety instruction was silently dropped. The agent started mass-deleting emails. She sent three stop commands from her phone - the agent acknowledged each one and kept deleting.

She had to physically run to her computer and kill the process. ([Source](https://www.businessinsider.com/meta-ai-alignment-director-openclaw-email-deletion-2026-2))

The instruction lived inside the agent's memory. When memory ran out, the instruction disappeared. **A prompt is a suggestion. A PreFlight policy is a constraint.**

**`Make sure PreFlight runs before any important action!`**

### The policy

```
1. Do not delete any email without an explicit confirmation token from the user.
2. Do not archive any email without an explicit confirmation token from the user.
3. Do not move any email to trash without an explicit confirmation token from the user.
4. No more than 5 emails may be modified in a single session without re-authorization.
5. Do not perform any action on emails older than 7 days without explicit approval.
```

Rules 1–3 enforce the core safety behavior: look but don't touch. Rule 4 adds a hard batch limit - even if something slips through, the damage is capped at 5 emails before the agent must stop and ask again. Rule 5 catches runaway behavior targeting your archive.
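To build intuition for how the five rules combine, they can be restated as plain local checks. This is only an illustrative sketch: it is not how PreFlight compiles or evaluates a policy, and the function name `policy_allows` and its arguments are hypothetical.

```bash
# Illustrative only: a local restatement of the five rules as plain checks.
# NOT PreFlight's evaluation logic; the function and argument names are made up.
# Args: action batch_size age_days token reauth approval
#   action    delete | archive | trash | any other action name
#   token, reauth, approval   "yes" or "no"
policy_allows() {
  local action="$1" batch_size="$2" age_days="$3"
  local token="$4" reauth="$5" approval="$6"

  # Rules 1-3: delete, archive, and trash each need a confirmation token.
  case "$action" in
    delete|archive|trash) [ "$token" = "yes" ] || return 1 ;;
  esac

  # Rule 4: more than 5 modified emails needs re-authorization.
  if [ "$batch_size" -gt 5 ] && [ "$reauth" != "yes" ]; then
    return 1
  fi

  # Rule 5: any action on email older than 7 days needs explicit approval.
  if [ "$age_days" -gt 7 ] && [ "$approval" != "yes" ]; then
    return 1
  fi

  return 0
}
```

An action passes only if every rule passes, so a confirmed deletion of a three-month-old email is still rejected by rule 5 unless approval is also given.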

### Set it up in minutes

#### 1. Compile the policy

```bash
curl -s -N -X POST https://api.icme.io/v1/makeRules \
  -H 'Content-Type: application/json' \
  -H "X-API-Key: $ICME_API_KEY" \
  -d '{
    "policy": "1. Do not delete any email without an explicit confirmation token from the user.\n2. Do not archive any email without an explicit confirmation token from the user.\n3. Do not move any email to trash without an explicit confirmation token from the user.\n4. No more than 5 emails may be modified in a single session without re-authorization.\n5. Do not perform any action on emails older than 7 days without explicit approval."
  }'
```

Save the `policy_id` from the response.
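One way to capture the ID in a script is sketched below. It assumes the response text contains a `"policy_id": "<value>"` pair somewhere in its output; the exact response shape is not shown on this page, so treat the pattern and the helper name `extract_policy_id` as assumptions and adjust them to what the API actually returns.

```bash
# Hypothetical helper: read response text on stdin and print the first
# "policy_id" string value found. Assumes the value is a plain JSON string
# with no escaped quotes inside it.
extract_policy_id() {
  grep -o '"policy_id"[[:space:]]*:[[:space:]]*"[^"]*"' \
    | head -n 1 \
    | sed -e 's/^"policy_id"[[:space:]]*:[[:space:]]*"//' -e 's/"$//'
}

# Usage sketch (commented out so this block makes no network call):
# POLICY_ID=$(curl -s -N -X POST https://api.icme.io/v1/makeRules ... | extract_policy_id)
# [ -n "$POLICY_ID" ] || { echo "no policy_id in response" >&2; exit 1; }
```

Checking that the extracted value is non-empty before continuing avoids sending later checks with a blank policy ID.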

#### 2. Check before every relevant action

Before your agent touches any email, it calls `checkIt`:

```bash
curl -s -N -X POST https://api.icme.io/v1/checkIt \
  -H 'Content-Type: application/json' \
  -H "X-API-Key: $ICME_API_KEY" \
  -d '{
    "policy_id": "YOUR_POLICY_ID",
    "action": "YOUR AGENT ACTION"
  }'
```

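A `SAT` result means the action satisfies the policy and may proceed. **`UNSAT`** means the action is blocked, for example a deletion that carries no confirmation token.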
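The code that wraps your agent's email tools then has to act on the verdict. The sketch below shows one way to do that in a shell wrapper. It assumes the verdict appears in the response text as the literal token `SAT` or `UNSAT`; where exactly it appears is not documented on this page, and the helper name `is_sat` is hypothetical.

```bash
# Hypothetical helper: succeed (exit 0) only when the response reports SAT.
# "UNSAT" contains "SAT" as a substring, so match whole words with grep -w,
# and fail closed if UNSAT appears anywhere or no verdict is found at all.
is_sat() {
  local response="$1"
  if printf '%s' "$response" | grep -qw 'UNSAT'; then
    return 1
  fi
  printf '%s' "$response" | grep -qw 'SAT'
}

# Usage sketch (commented out so this block makes no network call):
# response=$(curl -s -N -X POST https://api.icme.io/v1/checkIt ...)
# if is_sat "$response"; then
#   run_the_email_action      # hypothetical: whatever your agent was about to do
# else
#   echo "Blocked by PreFlight" >&2
# fi
```

Failing closed matters here: an empty response, a timeout, or an error message is treated the same as `UNSAT`, so the action does not run.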

#### 3. See it work

**❌ Bulk action with no confirmation**

```bash
curl -s -N -X POST https://api.icme.io/v1/checkIt \
  -H 'Content-Type: application/json' \
  -H "X-API-Key: $ICME_API_KEY" \
  -d '{
    "policy_id": "YOUR_POLICY_ID",
    "action": "Archive 12 emails from the promotions tab in a single batch. User confirmation token for archive: none. Re-authorization: none."
  }'
```

Expected result: `UNSAT` - exceeds 5-email session limit, no confirmation token.

**❌ Deleting an old email without approval**

```bash
curl -s -N -X POST https://api.icme.io/v1/checkIt \
  -H 'Content-Type: application/json' \
  -H "X-API-Key: $ICME_API_KEY" \
  -d '{
    "policy_id": "YOUR_POLICY_ID",
    "action": "Delete email from john@example.com with subject Re: Invoice, received 3 months ago. No confirmation token provided."
  }'
```

Expected result: `UNSAT` - email is older than 7 days, no explicit approval.

**✅ Single deletion with user confirmation**

```bash
curl -s -N -X POST https://api.icme.io/v1/checkIt \
  -H 'Content-Type: application/json' \
  -H "X-API-Key: $ICME_API_KEY" \
  -d '{
    "policy_id": "YOUR_POLICY_ID",
    "action": "Delete email from newsletter@spam.com with subject Weekly Deals, received today. User confirmation token: usr_ack_9f3a provided."
  }'
```

Expected result: `SAT` - confirmed by user, recent email, single action.
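The requests above hard-code the action text inside the JSON body. When an agent generates that text at runtime, quotes or line breaks in a subject line can break the payload. A minimal sketch of building the body safely in shell follows; `json_escape` and `build_check_payload` are hypothetical helper names, and the escaping covers only backslashes, double quotes, and whitespace control characters (a tool such as `jq` is more thorough if it is available).

```bash
# Hypothetical helper: make a string safe to embed in a JSON string literal.
# Newlines, carriage returns, and tabs are flattened to spaces; backslashes
# and double quotes are escaped. Other control characters are not handled.
json_escape() {
  printf '%s' "$1" \
    | tr '\n\r\t' '   ' \
    | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Hypothetical helper: print a checkIt request body for a policy ID and action.
build_check_payload() {
  local policy_id="$1" action="$2"
  printf '{"policy_id": "%s", "action": "%s"}' \
    "$(json_escape "$policy_id")" "$(json_escape "$action")"
}

# Usage sketch (commented out so this block makes no network call):
# curl -s -N -X POST https://api.icme.io/v1/checkIt \
#   -H 'Content-Type: application/json' \
#   -H "X-API-Key: $ICME_API_KEY" \
#   -d "$(build_check_payload "$POLICY_ID" "$ACTION_TEXT")"
```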

### Why this works when prompts don't

|                                     | Prompt instruction                               | PreFlight policy                                                                              |
| ----------------------------------- | ------------------------------------------------ | --------------------------------------------------------------------------------------------- |
| **Where it lives**                  | Inside the agent's context window                | External API, outside the agent                                                               |
| **What happens when context fills** | Silently dropped                                 | Still enforced - the agent can't reach it                                                     |
| **Can the agent override it?**      | Yes - by forgetting, reinterpreting, or ignoring | No - an SMT solver doesn't take suggestions. If the check returns `UNSAT`, the action does not pass. |
| **Enforcement**                     | Best-effort, probabilistic                       | Mathematical, deterministic                                                                   |
| **Audit trail**                     | None                                             | Cryptographic proof per decision                                                              |

### Adapt it to your needs

This policy is a starting point. You might add rules like:

* *"Do not forward any email to an address outside my contacts list."*
* *"Do not send any email containing financial figures to external recipients."*
* *"Do not access emails in the Confidential label without explicit approval."*

Write one constraint per rule and keep each rule atomic. Then test each rule with [battle testing](/documentation/learning/battle-testing.md) to catch edge cases before deploying.
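Alongside battle testing, it can help to keep a short list of actions with the verdict you expect for each, and re-run it whenever the policy changes. The sketch below is a generic local harness, not a PreFlight feature: `run_cases` is a hypothetical helper that takes the name of any command printing `SAT` or `UNSAT` for a given action, so it can be pointed at your own `checkIt` wrapper.

```bash
# Hypothetical harness: read "EXPECTED|ACTION" lines on stdin, call the given
# checker command with ACTION, and report every verdict that differs.
# The checker must print exactly SAT or UNSAT.
run_cases() {
  local check_cmd="$1" expected action verdict failures=0
  while IFS='|' read -r expected action; do
    [ -n "$expected" ] || continue            # skip blank lines
    verdict=$("$check_cmd" "$action" </dev/null)
    if [ "$verdict" != "$expected" ]; then
      printf 'MISMATCH: expected %s, got %s: %s\n' "$expected" "$verdict" "$action"
      failures=$((failures + 1))
    fi
  done
  [ "$failures" -eq 0 ]
}

# Usage sketch (my_check is a hypothetical wrapper around the checkIt call):
# run_cases my_check <<'EOF'
# UNSAT|Forward the Q3 report to an address outside my contacts list.
# SAT|Forward the Q3 report to a colleague in my contacts list.
# EOF
```

Pairing each new rule with at least one case that should pass and one that should fail makes it easier to spot a rule that is too broad or too narrow.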

