How It Works
The ICME API is built on two areas of computer science: automated reasoning and zero knowledge machine learning. This page explains both and how they fit together.
The problem with existing guardrails
Most AI agent guardrails work by asking a model to evaluate whether an action is allowed. The model reads the policy, reads the action, and returns a judgment.
This has a fundamental weakness. The model that enforces the policy is the same kind of system as the agent trying to bypass it. Any input that can influence the agent can potentially influence the guardrail. Researchers have demonstrated repeatedly that sufficiently creative prompts can cause LLM judges to approve actions they should block. The attack surface is the model's judgment itself.
The solution is to remove judgment from the enforcement step entirely.
Automated Reasoning (ARc)
Automated reasoning is the field of computer science concerned with making decisions by formal logical deduction rather than statistical inference. Instead of asking "does this seem allowed?", it asks "can it be proven that this is allowed?"
We implement the approach described in the ARc paper (Automated Reasoning Checks). The process has three stages.
1. Formalization
Your natural language policy is translated into SMT-LIB, a standardized formal logic language used by mathematical solvers. This is the only step that uses a language model — and it is a narrow, well-defined task that LLMs perform reliably. The output is a set of logical constraints that precisely encode your rules.
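As a sketch of what that output looks like (the rule and the variable names here are invented for illustration, not drawn from a real policy), a rule such as "transfers over 1,000 require a verified destination" might compile to SMT-LIB constraints like:

```smtlib
; Hypothetical encoding of "transfers over 1000 require a verified destination"
(declare-const amount Int)
(declare-const destination_verified Bool)
(assert (=> (> amount 1000) destination_verified))
```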
2. Consistency checking
Before your policy is saved, the compiler checks it for internal contradictions. Conflicting rules, unreachable states, and logical impossibilities are caught at compile time. A policy that passes this step is guaranteed to be internally consistent.
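To make the compile-time check concrete, here is a hypothetical pair of contradictory rules in SMT-LIB (names invented for illustration). Asked whether both constraints can hold at once, a solver reports unsat, so the policy is rejected before it is ever saved:

```smtlib
; Two invented rules that can never both hold
(declare-const amount Int)
(assert (<= amount 1000))   ; rule A: cap every transfer at 1000
(assert (>= amount 5000))   ; rule B: require every transfer to be at least 5000
(check-sat)                 ; solver output: unsat, so the contradiction is caught at compile time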
3. Solver evaluation
When an agent action arrives, the relevant values are extracted and passed to an SMT solver alongside your compiled policy. The solver determines with mathematical certainty whether the action satisfies the constraints. It returns SAT (satisfiable — the action is permitted) or UNSAT (unsatisfiable — the action is blocked).
The solver has no language understanding, no context window, and no judgment. It cannot be socially engineered. Soundness exceeds 99% on unseen adversarial datasets and does not degrade when someone is actively trying to bypass it.
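A minimal sketch of this evaluation step in Python, as a toy stand-in only: a real deployment calls an SMT solver such as Z3 on the compiled constraints, and the field names and rules below are invented for illustration.

```python
# Toy stand-in for the solver evaluation step. The decision is pure
# constraint checking: no language understanding, no judgment.
def evaluate(policy, action):
    """Return 'sat' if every compiled constraint holds, else 'unsat'."""
    return "sat" if all(rule(action) for rule in policy) else "unsat"

# Hypothetical compiled policy: two constraints over the action's fields.
policy = [
    lambda a: a["amount"] <= 1000,        # transfers capped at 1000
    lambda a: a["destination_verified"],  # destination must be verified
]

print(evaluate(policy, {"amount": 250, "destination_verified": True}))   # sat
print(evaluate(policy, {"amount": 9000, "destination_verified": True}))  # unsat
```

Because the check is a deterministic function of the extracted values, no prompt crafted by an attacker can change its answer without changing the values themselves.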
Zero Knowledge Machine Learning (zkML)
The ARc approach tells you whether an action was allowed or blocked. zkML tells you that the decision was made correctly, by the specific model and policy you specified, without revealing either.
Zero knowledge proofs are cryptographic constructions that allow one party to prove a statement is true to another party without revealing any information beyond the truth of the statement itself. In the context of machine learning, this means proving that a specific model produced a specific output on a specific input, without exposing the model weights, the input data, or the intermediate computation.
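The core idea can be illustrated with the classic Schnorr identification protocol, sketched below in Python. This is a textbook toy with deliberately tiny, insecure parameters, and it is not the zkML construction the ICME API uses; it only shows the trick itself: the verifier becomes convinced the prover knows the secret x without learning anything about x.

```python
import random

# Toy Schnorr zero-knowledge identification (insecure parameters, for
# illustration only). g generates a subgroup of prime order q mod p.
p, q, g = 23, 11, 2
x = 7                 # prover's secret
y = pow(g, x, p)      # public value: y = g^x mod p

def prove_and_verify():
    r = random.randrange(q)
    t = pow(g, r, p)          # prover commits without revealing x
    c = random.randrange(q)   # verifier issues a random challenge
    s = (r + c * x) % q       # prover's response reveals nothing on its own
    # Verifier checks g^s == t * y^c (mod p), which holds iff the
    # prover's response is consistent with knowing x
    return pow(g, s, p) == (t * pow(y, c, p)) % p

assert prove_and_verify()
```

The verifier only ever sees t, c, and s, each of which is statistically independent of x; yet a prover who does not know x can pass the check only by guessing the challenge in advance.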
Applied to guardrails, this gives you three properties that prompt-based systems cannot provide:
Succinctness. A ZK proof of a guardrail decision verifies in under one second on any device, regardless of how long the original computation took. You do not need to re-run the model to trust the result.
Privacy. Your policy is never exposed to the verifier. A third party can confirm that a decision was made correctly against your rules without ever seeing what your rules are.
Tamper-evident audit trails. Every decision produces a cryptographic receipt. These receipts are permanent, verifiable by anyone, and impossible to forge or retroactively alter. For regulated industries or high-stakes agent deployments, this is the difference between a log you hope no one modified and a proof that cannot be modified.
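The tamper-evidence property can be sketched with a simple hash chain in Python. This is an illustrative toy, not the actual receipt format: ICME receipts are ZK proofs, which additionally prove the decision was computed correctly, whereas a bare hash chain only detects after-the-fact edits.

```python
import hashlib
import json

# Toy tamper-evident receipt chain: each receipt's digest commits to the
# decision and to the previous receipt's digest.
def append_receipt(chain, decision):
    prev = chain[-1]["digest"] if chain else "0" * 64
    body = json.dumps({"decision": decision, "prev": prev}, sort_keys=True)
    digest = hashlib.sha256(body.encode()).hexdigest()
    chain.append({"decision": decision, "prev": prev, "digest": digest})

def verify_chain(chain):
    prev = "0" * 64
    for r in chain:
        body = json.dumps({"decision": r["decision"], "prev": prev}, sort_keys=True)
        if r["prev"] != prev or r["digest"] != hashlib.sha256(body.encode()).hexdigest():
            return False
        prev = r["digest"]
    return True

chain = []
append_receipt(chain, "sat")
append_receipt(chain, "unsat")
assert verify_chain(chain)
chain[0]["decision"] = "sat (edited)"  # retroactive tampering...
assert not verify_chain(chain)         # ...breaks the chain and is detected
```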
How they fit together
The language model handles translation. The solver handles enforcement. The ZK proof handles verification. Each component does only what it is suited for.
The result is a guardrail that cannot be argued out of a decision, produces verifiable proofs of every outcome, and keeps your policy private while remaining fully auditable.
The combination of ARc-based enforcement with zkML proofs is the novel contribution described in our paper Succinctly Verifiable Agentic Guardrails With ZKP Over Automated Reasoning. Where ARc solves enforcement, zkML solves verification at scale — enabling any machine to confirm a guardrail decision was made correctly without re-executing the guardrail computation, without trusting the provider, and without seeing the policy.