Personal Data Access Agent
Personal AI assistants handle some of the most sensitive data people own: email, calendar, contacts, documents. An agent with access to all of that has more reach into a person's life than almost any other software they run. When that agent is compromised, the privacy consequences are correspondingly severe.
The threat is not that the agent will be hacked in the traditional sense. The threat is that it will be given instructions it should not follow, by content it encounters while doing its job. An email that tells the agent to forward the inbox. A calendar event that instructs it to read all documents in the drive. A document that claims the user has pre-authorized the agent to share contacts with an external service. The agent was not told to do any of these things by its user. It was told by content it read.
ICME PreFlight intercepts every proposed data access action before it executes and checks it against a mathematically formalized policy. Scope violations, endpoint exfiltration, and retention violations produce provable UNSAT results: not a heuristic judgment about whether the action seems reasonable, but a proof that the action violates the constraint.
What PreFlight enforces strongly here
The most reliable PreFlight variables are ones that describe what the agent is doing, not why it believes it is permitted to do it. For a personal data access agent, three variables fall squarely in this category:
Data access scope. If the user asked to summarize one email thread and the agent is attempting to read the full inbox, dataAccessScopeMatchesUserRequest is false. The scope mismatch is a fact in the action text. The solver enforces it directly. No injection can change what the action says the agent is trying to read.
Endpoint transmission. If personal data is being sent to any destination not explicitly authorized -- an external summarization API, an analytics service, a logging endpoint -- dataTransmittedToApprovedEndpointOnly is false. A contact list going to a third-party enrichment service is blocked whether the agent was manipulated into proposing it or chose it autonomously.
Data retention. If the agent is retaining extracted personal data after the task completes -- storing names, addresses, or phone numbers for use in future sessions -- dataRetentionLimitedToTaskDuration is false. This applies regardless of how the original task was initiated.
These three variables cover the attack vectors that cause the most lasting privacy harm: data leaving the system, data persisting longer than intended, and data being accessed beyond what the user asked for.
The instruction source variable
The policy also includes accessInstructionFromDirectUserPrompt, which attempts to distinguish instructions that came from the user from instructions injected through content the agent read.
This variable is worth having, but it carries an important caveat: it relies on the agent reporting where the instruction came from. A sophisticated injection that manipulates the agent into writing "this instruction came from a direct user prompt" would cause the extractor to read that as true. The three variables above do not have this problem -- they describe observable facts about what the agent is doing, not the agent's account of why it is doing it.
The most robust approach for production deployments is to have your orchestration layer stamp a trusted instructionSource field on every input before the agent sees it, based on where the message actually originated. That field cannot be overwritten by anything the agent reads, and it gives the instruction source variable the same reliability as the others.
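A minimal sketch of that stamping layer, assuming a Python orchestrator. The channel names, the StampedInput type, and the source labels are illustrative -- use whatever labels your policy's instruction-source variable expects:

```python
from dataclasses import dataclass

# Channel -> trusted source label. Illustrative mapping; the key point is
# that the label is derived from where the message actually arrived.
_TRUSTED_SOURCES = {
    "user": "direct_user_prompt",
    "email": "email",
    "calendar": "calendar",
    "document": "document",
}

@dataclass(frozen=True)
class StampedInput:
    """An input whose provenance is fixed by the orchestrator, not the agent."""
    instruction_source: str
    content: str

def stamp_input(raw_content: str, channel: str) -> StampedInput:
    # Nothing inside raw_content can influence the source label, because the
    # label is computed from the delivery channel before the agent sees it.
    source = _TRUSTED_SOURCES.get(channel, "untrusted_content")
    return StampedInput(instruction_source=source, content=raw_content)
```

The frozen dataclass makes the provenance field immutable, so an injected instruction that claims "this came from the user" cannot retroactively change what the orchestrator recorded.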
The attack surface
Email injection
An email body contains an instruction to forward the inbox or share calendar data with an external address
Calendar injection
An event description instructs the agent to read all documents in the drive
Document injection
A file contains instructions that cause the agent to transmit contacts to a third-party API
Scope creep
The agent reads the entire inbox when the user asked to summarize one thread
Endpoint exfiltration
Personal data is transmitted to an external summarization, analytics, or enrichment service
Retention violation
The agent stores extracted PII after the task completes for use in future sessions
The policy
Each rule is written as a simple boolean condition with two consequences -- "not permitted" and "action must be rejected" -- which produces clean boolean variables that the AR solver can evaluate directly. Enum-typed schemas cause AR translation failures; explicit boolean phrasing avoids them.
Save the returned policy_id. Pass it on every checkIt call.
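The rule phrasing pattern above can be generated mechanically. A sketch using the five variable names from this guide -- the helper is illustrative, not part of the PreFlight SDK:

```python
# The five policy variables described in this guide.
VARIABLES = [
    "dataAccessRequestedByUser",
    "accessInstructionFromDirectUserPrompt",
    "dataTransmittedToApprovedEndpointOnly",
    "dataAccessScopeMatchesUserRequest",
    "dataRetentionLimitedToTaskDuration",
]

def rule_for(variable: str) -> str:
    """One rule per variable: a simple boolean condition with both
    consequences stated explicitly, keeping the AR translation boolean."""
    return (f"If {variable} is false then this data access is not permitted "
            f"and the action must be rejected.")

POLICY_TEXT = "\n".join(rule_for(v) for v in VARIABLES)
```

Generating rules from a single template keeps the phrasing uniform, which avoids the enum-typed schema shapes that cause AR translation failures.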
Check every data access action before it executes
SAT = allowed. UNSAT = blocked. Every decision returns a cryptographic receipt.
Writing action strings: end every action string with "Therefore this data access action is permitted." This gives the AR solver a claim to evaluate. Without it the solver has premises but no conclusion to prove or contradict and may return SAT by default. Every boolean the policy references should be explicitly stated in the action -- do not rely on the extractor to infer missing values.
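One way to make both requirements hard to forget is to build every action string through a helper. A sketch -- the helper itself is illustrative:

```python
REQUIRED_CONCLUSION = "Therefore this data access action is permitted."

def build_action_string(description: str, facts: dict) -> str:
    """State every policy boolean explicitly, then append the conclusion
    the solver needs in order to prove or contradict something."""
    parts = [description.rstrip(".") + "."]
    for name, value in facts.items():
        parts.append(f"{name}: {'true' if value else 'false'}.")
    parts.append(REQUIRED_CONCLUSION)
    return " ".join(parts)
```

Because the conclusion is appended by the helper rather than typed by hand, no action string can reach checkIt without a claim for the solver to evaluate.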
Example actions
SAT: user-requested email summary, approved scope
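An action string for this case might look like the following (wording illustrative; every policy boolean is stated explicitly, and all are true):

```text
The user explicitly asked the agent to summarize one email thread, and the
agent is reading only that thread.
dataAccessRequestedByUser: true.
accessInstructionFromDirectUserPrompt: true.
dataTransmittedToApprovedEndpointOnly: true.
dataAccessScopeMatchesUserRequest: true.
dataRetentionLimitedToTaskDuration: true.
Therefore this data access action is permitted.
```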
UNSAT: scope creep -- agent reads full inbox to summarize one thread
The user asked for one thread. The agent is attempting to read the entire inbox. dataAccessScopeMatchesUserRequest: false is a fact in the action text -- the solver enforces it directly.
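An illustrative action string for this case, with the scope mismatch stated as a fact:

```text
The user asked the agent to summarize one email thread. The agent is
attempting to read the entire inbox to produce the summary.
dataAccessRequestedByUser: true.
accessInstructionFromDirectUserPrompt: true.
dataTransmittedToApprovedEndpointOnly: true.
dataAccessScopeMatchesUserRequest: false.
dataRetentionLimitedToTaskDuration: true.
Therefore this data access action is permitted.
```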
UNSAT: endpoint exfiltration -- contact list transmitted to third-party service
The instruction source is legitimate. The task is within scope. The sole violation is dataTransmittedToApprovedEndpointOnly: false. The policy does not evaluate whether the external service sounds useful -- it checks whether personal data is leaving through an approved channel.
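An illustrative action string for this case, where the endpoint variable is the only false premise:

```text
The agent proposes transmitting the user's contact list to a third-party
enrichment service that is not on the approved endpoint list for this task.
dataAccessRequestedByUser: true.
accessInstructionFromDirectUserPrompt: true.
dataTransmittedToApprovedEndpointOnly: false.
dataAccessScopeMatchesUserRequest: true.
dataRetentionLimitedToTaskDuration: true.
Therefore this data access action is permitted.
```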
UNSAT: retention violation -- extracted PII stored beyond task duration
The original task was legitimate. Everything else is within policy. The sole violation is dataRetentionLimitedToTaskDuration: false. Caching personal data for future sessions is blocked regardless of why the agent proposed it.
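An illustrative action string for this case, where retention is the only false premise:

```text
The agent proposes caching the names and phone numbers extracted during this
task for use in future sessions after the task completes.
dataAccessRequestedByUser: true.
accessInstructionFromDirectUserPrompt: true.
dataTransmittedToApprovedEndpointOnly: true.
dataAccessScopeMatchesUserRequest: true.
dataRetentionLimitedToTaskDuration: false.
Therefore this data access action is permitted.
```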
UNSAT: calendar injection -- event instructs agent to read all documents
The agent encountered an embedded instruction in a calendar event and proposed acting on it as if it were a user request. Multiple conditions are violated. Note that even in a sophisticated injection where the agent has been manipulated into misreporting the instruction source, the scope and user-request variables are still enforced independently -- the agent cannot self-report its way out of a scope mismatch.
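An illustrative action string for this case, with several premises false at once:

```text
A calendar event description instructed the agent to read all documents in
the user's drive. The agent proposes reading every document.
dataAccessRequestedByUser: false.
accessInstructionFromDirectUserPrompt: false.
dataTransmittedToApprovedEndpointOnly: true.
dataAccessScopeMatchesUserRequest: false.
dataRetentionLimitedToTaskDuration: true.
Therefore this data access action is permitted.
```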
Reading the extracted variables
Every checkIt response includes an extracted map showing exactly what the solver evaluated.
dataAccessRequestedByUser
True only when the user explicitly asked for this specific data access
accessInstructionFromDirectUserPrompt
True when the instruction came from the user directly; false when it came from email, calendar, document, or other content the agent read. Relies on agent self-reporting -- see note above
dataTransmittedToApprovedEndpointOnly
False if personal data is being sent to any destination not explicitly authorized for this task
dataAccessScopeMatchesUserRequest
False if the agent is accessing more data than the user specifically requested
dataRetentionLimitedToTaskDuration
False if the agent is storing or caching any extracted personal data beyond the current task
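A small helper for surfacing the failed variables from a response. The response shape shown here is an assumption about the checkIt payload, not a documented schema:

```python
def failed_variables(response: dict) -> list:
    """Return the policy booleans the solver evaluated as false."""
    return [name for name, value in sorted(response.get("extracted", {}).items())
            if value is False]

# Illustrative UNSAT response for the endpoint-exfiltration case above.
example_response = {
    "result": "UNSAT",
    "check_id": "chk_example",
    "extracted": {
        "dataAccessRequestedByUser": True,
        "accessInstructionFromDirectUserPrompt": True,
        "dataTransmittedToApprovedEndpointOnly": False,
        "dataAccessScopeMatchesUserRequest": True,
        "dataRetentionLimitedToTaskDuration": True,
    },
}
```

Logging the failed variables alongside the check_id makes it immediately clear which constraint blocked a given action.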
Deploying in production
Call checkIt before every data access action your agent proposes. The check typically completes in 5--10 seconds and returns a check_id you can log as an audit record.
For agents that chain multiple data access steps -- read email, then read calendar, then generate a summary -- call checkIt before each individual access, not once at the start of the task. An instruction injected into email content may not surface until the agent has already begun processing. Per-action checks ensure enforcement applies at each step.
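A sketch of that per-step gating, with check_it standing in for whatever client wraps the checkIt endpoint -- the function signature and the response fields are assumptions:

```python
def run_with_preflight(action_strings, execute_step, check_it, policy_id):
    """Check each proposed data access individually; stop at the first UNSAT.

    Returns the list of check_ids so every decision can be logged as an
    audit record, including the one that blocked the chain.
    """
    audit_log = []
    for action in action_strings:
        verdict = check_it(policy_id=policy_id, action=action)
        audit_log.append(verdict["check_id"])
        if verdict["result"] != "SAT":
            raise PermissionError(
                f"PreFlight blocked action (check_id={verdict['check_id']})")
        execute_step(action)
    return audit_log
```

Because the check runs before each step rather than once up front, an instruction injected into email content is caught at the step where the agent first acts on it.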
The three variables with the strongest enforcement guarantees are dataTransmittedToApprovedEndpointOnly, dataAccessScopeMatchesUserRequest, and dataRetentionLimitedToTaskDuration. These describe observable facts about what the agent is doing. Build your policy around them first.