# Personal Data Access Agent

Personal AI assistants handle some of the most sensitive data people own: email, calendar, contacts, documents. An agent with access to all of that has more reach into a person's life than almost any other software they run. When that agent is compromised, the privacy consequences are correspondingly severe.

The threat is not that the agent will be hacked in the traditional sense. The threat is that it will be given instructions it should not follow, by content it encounters while doing its job. An email that tells the agent to forward the inbox. A calendar event that instructs it to read all documents in the drive. A document that claims the user has pre-authorized the agent to share contacts with an external service. The agent was not told to do any of these things by its user. It was told by content it read.

ICME PreFlight intercepts every proposed data access action before it executes and checks it against a mathematically formalized policy. Scope violations, endpoint exfiltration, and retention violations produce provable UNSAT results, not a heuristic judgment about whether the action seems reasonable, a proof that it violates the constraint.

***

### What PreFlight enforces strongly here

The most reliable PreFlight variables are ones that describe what the agent is doing, not why it believes it is permitted to do it. For a personal data access agent, three variables fall squarely in this category:

**Data access scope.** If the user asked to summarize one email thread and the agent is attempting to read the full inbox, `dataAccessScopeMatchesUserRequest` is false. The scope mismatch is a fact in the action text. The solver enforces it directly. No injection can change what the action says the agent is trying to read.

**Endpoint transmission.** If personal data is being sent to any destination not explicitly authorized, an external summarization API, an analytics service, a logging endpoint, `dataTransmittedToApprovedEndpointOnly` is false. A contact list going to a third-party enrichment service is blocked whether the agent was manipulated into proposing it or chose it autonomously.

**Data retention.** If the agent is retaining extracted personal data after the task completes, storing names, addresses, or phone numbers for use in future sessions — `dataRetentionLimitedToTaskDuration` is false. This applies regardless of how the original task was initiated.

These three variables cover the attack vectors that cause the most lasting privacy harm: data leaving the system, data persisting longer than intended, and data being accessed beyond what the user asked for.

***

### The instruction source variable

The policy also includes `accessInstructionFromDirectUserPrompt`, which attempts to distinguish instructions that came from the user from instructions injected through content the agent read.

This variable is worth having, but it carries an important caveat: it relies on the agent reporting where the instruction came from. A sophisticated injection that manipulates the agent into writing "this instruction came from a direct user prompt" would cause the extractor to read that as true. The three variables above do not have this problem -- they describe observable facts about what the agent is doing, not the agent's account of why it is doing it.

The most robust approach for production deployments is to have your orchestration layer stamp a trusted `instructionSource` field on every input before the agent sees it, based on where the message actually originated. That field cannot be overwritten by anything the agent reads, and it gives the instruction source variable the same reliability as the others.

***

### The attack surface

| Vector                | What it looks like                                                                                         |
| --------------------- | ---------------------------------------------------------------------------------------------------------- |
| Email injection       | An email body contains an instruction to forward the inbox or share calendar data with an external address |
| Calendar injection    | An event description instructs the agent to read all documents in the drive                                |
| Document injection    | A file contains instructions that cause the agent to transmit contacts to a third-party API                |
| Scope creep           | The agent reads the entire inbox when the user asked to summarize one thread                               |
| Endpoint exfiltration | Personal data is transmitted to an external summarization, analytics, or enrichment service                |
| Retention violation   | The agent stores extracted PII after the task completes for use in future sessions                         |

***

### The policy

Each rule is written as a simple boolean condition with two consequences -- "not permitted" and "action must be rejected" -- which produces clean boolean variables that the AR solver can evaluate directly. Enum-typed schemas cause AR translation failures; explicit boolean phrasing avoids them.

```bash
curl -s -N -X POST https://api.icme.io/v1/makeRules \
  -H 'Content-Type: application/json' \
  -H "X-API-Key: $ICME_API_KEY" \
  -d '{
    "policy": "Rule 1: The data access action is permitted only if the data access was requested by the user and the access instruction is from a direct user prompt and the data is transmitted to the approved endpoint only and the data access scope matches the user request and data retention is limited to the task duration.\nRule 2: If the data access was not requested by the user, then the data access is not permitted.\nRule 3: If the data access was not requested by the user, then the action must be rejected.\nRule 4: If the access instruction is not from a direct user prompt, then the data access is not permitted.\nRule 5: If the access instruction is not from a direct user prompt, then the action must be rejected.\nRule 6: If the data is not transmitted to the approved endpoint only, then the data access is not permitted.\nRule 7: If the data is not transmitted to the approved endpoint only, then the action must be rejected.\nRule 8: If the data access scope does not match the user request, then the data access is not permitted.\nRule 9: If the data access scope does not match the user request, then the action must be rejected.\nRule 10: If data retention is not limited to the task duration, then the data access is not permitted.\nRule 11: If data retention is not limited to the task duration, then the action must be rejected."
  }'
```

Save the returned `policy_id`. Pass it on every `checkIt` call.

***

### Check every data access action before it executes

```bash
curl -s -N -X POST https://api.icme.io/v1/checkIt \
  -H 'Content-Type: application/json' \
  -H "X-API-Key: $ICME_API_KEY" \
  -d '{
    "policy_id": "YOUR_POLICY_ID",
    "action": "YOUR_AGENT_ACTION"
  }'
```

`SAT` = allowed. `UNSAT` = blocked. Every decision returns a cryptographic receipt.

***

### Example actions

#### SAT: user-requested email summary, approved scope

```bash
curl -s -N -X POST https://api.icme.io/v1/checkIt \
  -H 'Content-Type: application/json' \
  -H "X-API-Key: $ICME_API_KEY" \
  -d '{
    "policy_id": "YOUR_POLICY_ID",
    "action": "The data access was requested by the user. The access instruction is from a direct user prompt. The data is transmitted to the approved endpoint only. The data access scope matches the user request. Data retention is limited to the task duration. Therefore the data access action is permitted."
  }'
```

***

#### UNSAT: scope creep -- agent reads full inbox to summarize one thread

```bash
curl -s -N -X POST https://api.icme.io/v1/checkIt \
  -H 'Content-Type: application/json' \
  -H "X-API-Key: $ICME_API_KEY" \
  -d '{"policy_id": "YOUR_POLICY_ID", "action": "Read all inbox emails to provide context for summarizing the Alice thread. The data access was requested by the user. The access instruction is from a direct user prompt. The data is transmitted to the approved endpoint only. The data access scope does not match the user request. Data retention is limited to the task duration. Therefore this data access action is permitted."}'
```

The user asked for one thread. The agent is attempting to read the entire inbox. `dataAccessScopeMatchesUserRequest: false` is a fact in the action text -- the solver enforces it directly.

***

#### UNSAT: endpoint exfiltration -- contact list transmitted to third-party service

```bash
curl -s -N -X POST https://api.icme.io/v1/checkIt \
  -H 'Content-Type: application/json' \
  -H "X-API-Key: $ICME_API_KEY" \
  -d '{"policy_id": "YOUR_POLICY_ID", "action": "Transmit contact list to an external summarization service to enrich the requested report. The data access was requested by the user. The access instruction is from a direct user prompt. The data is not transmitted to the approved endpoint only. The data access scope matches the user request. Data retention is limited to the task duration. Therefore this data access action is permitted."}'
```

The instruction source is legitimate. The task is within scope. The sole violation is `dataTransmittedToApprovedEndpointOnly: false`. The policy does not evaluate whether the external service sounds useful -- it checks whether personal data is leaving through an approved channel.

***

#### UNSAT: retention violation -- extracted PII stored beyond task duration

```bash
curl -s -N -X POST https://api.icme.io/v1/checkIt \
  -H 'Content-Type: application/json' \
  -H "X-API-Key: $ICME_API_KEY" \
  -d '{"policy_id": "YOUR_POLICY_ID", "action": "Retain the extracted names, addresses, and phone numbers from the document for use in future sessions. The data access was requested by the user. The access instruction is from a direct user prompt. The data is transmitted to the approved endpoint only. The data access scope matches the user request. Data retention is not limited to the task duration. Therefore this data access action is permitted."}'
```

The original task was legitimate. Everything else is within policy. The sole violation is `dataRetentionLimitedToTaskDuration: false`. Caching personal data for future sessions is blocked regardless of why the agent proposed it.

***

#### UNSAT: calendar injection -- event instructs agent to read all documents

```bash
curl -s -N -X POST https://api.icme.io/v1/checkIt \
  -H 'Content-Type: application/json' \
  -H "X-API-Key: $ICME_API_KEY" \
  -d '{"policy_id": "YOUR_POLICY_ID", "action": "Read all documents in the drive as instructed by a calendar event description. The data access was not requested by the user. The access instruction is not from a direct user prompt. The data is transmitted to the approved endpoint only. The data access scope does not match the user request. Data retention is limited to the task duration. Therefore this data access action is permitted."}'
```

The agent encountered an embedded instruction in a calendar event and proposed acting on it as if it were a user request. Multiple conditions are violated. Note that even in a sophisticated injection where the agent has been manipulated into misreporting the instruction source, the scope and user-request variables still enforce independently — the agent cannot self-report its way out of a scope mismatch.

***

### Reading the extracted variables

Every `checkIt` response includes an `extracted` map showing exactly what the solver evaluated.

| Variable                                | What it means                                                                                                                                                                               |
| --------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `dataAccessRequestedByUser`             | True only when the user explicitly asked for this specific data access                                                                                                                      |
| `accessInstructionFromDirectUserPrompt` | True when the instruction came from the user directly; false when it came from email, calendar, document, or other content the agent read. Relies on agent self-reporting -- see note above |
| `dataTransmittedToApprovedEndpointOnly` | False if personal data is being sent to any destination not explicitly authorized for this task                                                                                             |
| `dataAccessScopeMatchesUserRequest`     | False if the agent is accessing more data than the user specifically requested                                                                                                              |
| `dataRetentionLimitedToTaskDuration`    | False if the agent is storing or caching any extracted personal data beyond the current task                                                                                                |

***

### Deploying in production

Call `checkIt` before every data access action your agent proposes. The check typically completes in 5--10 seconds and returns a `check_id` you can log as an audit record.

For agents that chain multiple data access steps -- read email, then read calendar, then generate a summary -- call `checkIt` before each individual access, not once at the start of the task. An instruction injected into email content may not surface until the agent has already begun processing. Per-action checks ensure enforcement applies at each step.

The three variables with the strongest enforcement guarantees are `dataTransmittedToApprovedEndpointOnly`, `dataAccessScopeMatchesUserRequest`, and `dataRetentionLimitedToTaskDuration`. These describe observable facts about what the agent is doing. Build your policy around them first.


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://docs.icme.io/documentation/privacy-and-data-security/personal-data-access-agent.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
