> For the complete documentation index, see [llms.txt](https://docs.icme.io/llms.txt). Markdown versions of documentation pages are available by appending `.md` to page URLs; this page is available as [Markdown](https://docs.icme.io/documentation/learning/how-icme-preflight-works.md).

# How ICME PreFlight Works

ICME PreFlight is an **AI agent guardrail** system built on **formal verification**. It uses **automated reasoning** to decide whether an agent action is allowed. It uses **zero-knowledge proofs** to prove that decision happened correctly.

If you are evaluating AI agent security, this is the core model:

1. Translate a plain-English policy into formal logic.
2. Check that the policy is internally consistent.
3. Evaluate each agent action with an SMT solver.
4. Return `SAT` or `UNSAT`.
5. Generate a cryptographic proof of the result.

For a deeper comparison with LLM judges, see [Formal Verification vs Prompt-Based Guardrails](/documentation/learning/how-icme-preflight-works/formal-verification-vs-prompt-based-guardrails.md). For the proof layer, see [Zero-Knowledge Proofs](/documentation/learning/how-icme-preflight-works/zero-knowledge-proofs.md).

***

### Why AI agent guardrails fail when they rely on model judgment

Most AI agent guardrails work by asking a model to evaluate whether an action is allowed. The model reads the policy, reads the action, and returns a judgment.

This has a fundamental weakness. The model that enforces the policy is the same kind of system as the agent trying to bypass it. Any input that can influence the agent can potentially influence the guardrail. Researchers have demonstrated repeatedly that sufficiently creative prompts can cause LLM judges to approve actions they should block. The attack surface is the model's judgment itself.

The solution is to remove judgment from the enforcement step entirely.

***

### How automated reasoning works in ICME PreFlight

Automated reasoning is the field of computer science concerned with making decisions by formal logical deduction rather than statistical inference. Instead of asking "does this seem allowed?", it asks "can it be proven that this is allowed?"

We implement the approach described in the ARc paper (Automated Reasoning Checks). The process has three stages.

#### 1. Formalization

Your natural language policy is translated into SMT-LIB, a standardized formal logic language used by mathematical solvers. This is the only step that uses a language model — and it is a narrow, well-defined task that LLMs perform reliably. The output is a set of logical constraints that precisely encode your rules.

#### 2. Consistency checking

Before your policy is saved, the compiler checks it for internal contradictions. Conflicting rules, unreachable states, and logical impossibilities are caught at compile time. A policy that passes this step is guaranteed to be internally consistent.

#### 3. Solver evaluation

When an agent action arrives, the relevant values are extracted and passed to an SMT solver alongside your compiled policy. The solver determines with mathematical certainty whether the action satisfies the constraints. It returns `SAT` (satisfiable — the action is permitted) or `UNSAT` (unsatisfiable — the action is blocked).

The solver has no language understanding, no context window, and no judgment. It cannot be social engineered. Soundness exceeds 99% on unseen adversarial datasets and does not degrade when someone is actively trying to bypass it.

If you are new to solver outputs, read [Understanding SAT / UNSAT](/documentation/learning/how-icme-preflight-works/understanding-sat-unsat.md).

***

### What formal verification can enforce for AI agents

The solver is mathematically sound on anything that can be expressed as a fact in the action text. This covers the majority of real-world threats.

**What PreFlight enforces with mathematical certainty:**

*Crypto and DeFi*

* Transfer and spending limits — amounts are in the action text and checked exactly
* Recipient and contract registry rules — addresses are facts, not assertions
* Daily aggregate limits — arithmetic constraints the solver evaluates precisely
* Blocked contract interaction types — flash loans, multi-step sequences, unlisted contracts
* Social engineering and manipulation tactics — urgency appeals, emotional framing, and false authority claims are named as explicit boolean variables and evaluated against the action text

*Ecommerce*

* Order value caps and approval thresholds
* Approved vendor and supplier registry enforcement
* Refund and return policy rules — amounts, timeframes, eligibility conditions
* Spending limits per category, per session, or per day

*Personal agents*

* Data access scope — what the agent is permitted to read, summarize, or act on
* Network call destinations — raw IP patterns and unlisted endpoints are blocked
* Retention limits — data used only for the duration of the task

*Privacy compliance*

* Enforcing rules from HR documents, employment contracts, or data protection policies against agent actions
* Data transmission rules — approved endpoints only
* Access controls derived directly from your policy document, formalized automatically

**The one thing to understand about prompt injection:**

The key distinction is **what** versus **why**. Policy variables that describe *what the agent is doing* are injection-resistant. Variables that describe *why the agent believes it is permitted to act* are not.

PreFlight evaluates an agent's proposed action before it executes. For any variable that describes what the agent is doing — amounts, addresses, destinations, data scope — injected content in an email or document cannot change what the action says. An email telling the agent to send 10,000 USDC to an unlisted address produces an action PreFlight evaluates like any other. The amount and recipient are checked against the policy. If they violate it, the action is blocked.

The narrow exception is variables that ask the agent to self-report why it believes it is permitted to act — for example, whether an instruction came from a direct user prompt. These rely on the agent's own characterization, which injected content can influence.

The defense is straightforward: write policies around what the agent is doing. Amount, recipient, destination, data scope, retention — these are verifiable facts in the action text that injected content cannot overwrite. They are the right foundation for a PreFlight policy.

For a full breakdown of threat categories and policy design, see [What ICME PreFlight defends against](/documentation/learning/how-icme-preflight-works/what-icme-preflight-defends-against.md).

***

### Dual verification before an action is cleared

PreFlight does not rely on a single evaluation path. Every action check runs two independent verifications and reconciles them before returning a result.

A custom LLM check on our end evaluates the action against the formal SMT model of your policy. The cloud reasoning engine independently evaluates the same action against the same policy. A `SAT` result requires both to agree. If either returns `UNSAT`, the action is blocked.

This means a single extraction error or a single reasoning gap does not produce a false clearance. Both paths have to agree on `SAT` before your agent is allowed to proceed.

***

### How zero-knowledge proofs verify the guardrail result

The ARc approach tells you whether an action was allowed or blocked. zkML tells you that the decision was made correctly, by the specific model and policy you specified, without revealing either.

Zero knowledge proofs are cryptographic constructions that allow one party to prove a statement is true to another party without revealing any information beyond the truth of the statement itself. In the context of machine learning, this means proving that a specific model produced a specific output on a specific input, without exposing the model weights, the input data, or the intermediate computation.

Applied to guardrails, this gives you three properties that prompt-based systems cannot provide:

**Succinctness.** A ZK proof of a guardrail decision verifies in under one second on any device, regardless of how long the original computation took. You do not need to re-run the model to trust the result.

**Privacy.** Your policy is never exposed to the verifier. A third party can confirm that a decision was made correctly against your rules without ever seeing what your rules are.

**Tamper-evident audit trails.** Every decision produces a cryptographic receipt. These receipts are permanent, verifiable by anyone, and impossible to forge or retroactively alter. For regulated industries or high-stakes agent deployments, this is the difference between a log you hope no one modified and a proof that cannot be modified.

***

### How ICME PreFlight works end to end

```
Plain English Policy
        ↓
   LLM (translation only)
        ↓
   SMT-LIB Formal Logic
        ↓
Agent Action → SMT Solver → SAT / UNSAT
                                ↓
                          ZK Proof
                                ↓
                    Cryptographic Receipt
```

The language model handles translation. The solver handles enforcement. The ZK proof handles verification. Each component does only what it is suited for.

The result is a guardrail that cannot be argued out of a decision, produces verifiable proofs of every outcome, and keeps your policy private while remaining fully auditable.

The combination of ARc-based enforcement with zkML proofs is the novel contribution described in our paper *Succinctly Verifiable Agentic Guardrails With ZKP Over Automated Reasoning*. Where ARc solves enforcement, zkML solves verification at scale — enabling any machine to confirm a guardrail decision was made correctly without re-executing the proof, without trusting the provider, and without seeing the policy.

***

### FAQ

#### What makes ICME PreFlight different from prompt-based AI guardrails?

Prompt-based guardrails ask a model to interpret whether an action is allowed. ICME PreFlight compiles policy into formal logic and checks it with a solver. One is judgment. The other is proof.

#### Does ICME PreFlight stop prompt injection?

It is resistant to prompt injection when the policy is written around verifiable facts in the action. Examples include amounts, recipients, destinations, time windows, and data scope. Those facts are checked directly against the policy constraints.

#### What do `SAT` and `UNSAT` mean?

`SAT` means the action satisfies the policy constraints. `UNSAT` means it violates them and is blocked. For more detail, read [Understanding SAT / UNSAT](/documentation/learning/how-icme-preflight-works/understanding-sat-unsat.md).

#### Why add zero-knowledge proofs to AI agent security?

They let any third party verify a guardrail decision without seeing the private policy or rerunning the full computation. That gives you public verifiability, privacy, and tamper-evident audit trails.