Verifiable API Discovery: How Buyer Agents Find the Right Tool

AI agents are choosing their own tools at runtime, and tool selection accuracy collapses as the menu grows. Researchers behind the RAG-MCP project measured a drop from 43% to under 14% when agents faced bloated tool sets. The Speakeasy team found that reducing Playwright's MCP server from 26 tools to 8 dramatically improved agent accuracy. GitHub cut its Copilot MCP server from 40 tools to 13 after observing measurable benchmark regressions. Block rebuilt its Linear MCP server three times, going from 30+ tools down to 2.

MCP registries now list thousands of tools across hundreds of servers. When an agent needs a task management tool, it discovers dozens of candidates with overlapping names and near-identical descriptions: create_issue, create_ticket, create_card, add_task. Research on MCP tool selection shows agents start failing at 30+ tools when descriptions overlap, and virtually guarantee wrong picks at 100+. Today's solution is semantic search: an LLM reads descriptions and picks the best match. This is the same model-judging-model pattern that fails for every other guardrail.

The problem gets worse when you add incentives. In an agent marketplace, seller agents advertise tools to buyer agents. Sellers are optimizing for selection. A Trello agent might emphasize "project management" when the buyer needs "engineering task tracking." A Jira agent might bury the fact that it uses API key auth instead of OAuth. The descriptions aren't wrong; they are just optimized. Semantic search cannot tell the difference between a tool that fits and a tool that sounds like it fits.

ICME PreFlight replaces semantic matching with formal verification. The buyer compiles capability requirements into a private policy. Each seller advertises its tool's features. The SMT solver checks whether the advertised capabilities satisfy the buyer's constraints. SAT means the tool has everything the buyer needs. UNSAT means it doesn't. The seller never sees which rules exist or which one it failed.


Attack surface

Unlike traditional API selection where a developer evaluates documentation, agentic tool selection happens autonomously at machine speed. The agent reads tool descriptions, evaluates fit, and commits, all without human review. The failure modes are different from prompt injection. They are procurement failures.

Description overlap: Five tools all say "create tasks with labels and assignees." The agent picks the one with the highest semantic similarity score, which happens to be the wrong platform.

Feature inflation: A seller describes its tool as supporting "project scoping" when it only supports flat lists. The description is technically not false, but it does not match what the buyer needs.

Auth mismatch: The buyer requires OAuth 2.0. The seller's tool uses API keys. The description says "authenticated access" without specifying the method. Semantic search treats both as equivalent.

Capability gap: A tool supports reading issues but not creating them. The seller's description says "issue management." The agent selects it for a write operation and fails at runtime.

Unverified seller: An unknown agent advertises a tool with perfect capabilities. There is no identity verification. The tool may not exist, may not work, or may be a front for data collection.


Why semantic search does not catch this

Semantic search ranks tools by how similar their descriptions are to the query. It does not verify whether the tool actually has the required capabilities. A tool described as "create and manage engineering tasks with full project support" scores higher than "create issues in Linear with label and assignee support via OAuth 2.0," even though the second one is the better match for a buyer that needs OAuth, labels, and Linear-compatible workflows.

The fundamental issue: similarity is not satisfiability. Two descriptions can be semantically close while describing tools with completely different capabilities. And when sellers know how the ranking works, they optimize their descriptions for the ranking, not for accuracy.

PreFlight compiles the buyer's requirements to formal logic and checks each seller's advertised capabilities against a mathematical solver. The solver does not evaluate how similar a description sounds. It checks whether the stated capabilities satisfy the constraints. A tool that says "authentication method is API key" returns UNSAT against a policy requiring OAuth 2.0, regardless of how the rest of the description is worded.
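The contrast can be sketched in a few lines. This is a toy model, not the PreFlight implementation: the capability fields, policy shape, and word-overlap "similarity" below are all illustrative stand-ins, but they show why a ranking signal and a satisfiability check disagree.

```python
# Toy contrast: similarity ranking vs. constraint satisfaction.
# All field names and the policy shape below are illustrative.

def word_overlap(query: str, description: str) -> float:
    """Naive semantic proxy: fraction of query words found in the description."""
    q = set(query.lower().split())
    d = set(description.lower().split())
    return len(q & d) / len(q)

def satisfies(policy: dict, capabilities: dict) -> bool:
    """SAT iff every constraint is met by the advertised capabilities."""
    return all(capabilities.get(var) == want for var, want in policy.items())

policy = {"auth": "oauth2", "labels": True, "creates_items": True}

vague = {
    "description": "create and manage engineering tasks with full project support",
    "caps": {"auth": "api_key", "labels": True, "creates_items": True},
}
precise = {
    "description": "create issues with label and assignee support via oauth 2.0",
    "caps": {"auth": "oauth2", "labels": True, "creates_items": True},
}

query = "create and manage engineering tasks"
# The vague description wins the similarity contest...
assert word_overlap(query, vague["description"]) > word_overlap(query, precise["description"])
# ...but the solver-style check rejects it and accepts the precise tool.
assert not satisfies(policy, vague["caps"])
assert satisfies(policy, precise["caps"])
```

The ranking function rewards wording; the satisfiability check ignores wording entirely and evaluates only the stated capabilities.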


This use case has three properties that make it different from the standard guardrail pattern:

No product names in the policy. The buyer does not say "must be Linear" or "must be Jira." It says "must support OAuth 2.0, must support labels, must support assigning tasks, must support creating new items, must support project scoping." If Linear satisfies those requirements, Linear wins. If Jira also satisfies them, Jira also wins. Best capabilities win, not brand names.

Sellers do not know the policy. They advertise their tool's features without knowing which features the buyer cares about. They cannot game the policy because they cannot see it. ZK proofs guarantee this privacy: the seller learns SAT or UNSAT, but never which rules exist or which one it failed on.

SAT receipts are contracts. When a seller's bid returns SAT, the receipt is a cryptographic record of exactly what the seller claimed. If the tool does not deliver the advertised capabilities, the receipt is proof of what was promised. The seller cannot forge the receipt. The buyer cannot forge a rejection. Honesty becomes the dominant strategy because lying creates provable liability with no information advantage.

Prerequisites

Before compiling a policy or checking actions, create an account and top up credits. Account creation costs $5 and includes 325 credits to get started. Compiling a policy costs 300 credits ($3.00, one-time). Each checkIt call costs 1 credit ($0.01).

Open the returned checkout_url in your browser to complete payment. Save the api_key from the response. For full setup options, including USDC payment via x402 for crypto, see the quickstart: https://docs.icme.io/documentation/getting-started/quickstart
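The pricing above reduces to simple arithmetic. A small budget helper, using only the figures stated in this guide, shows that one policy compilation plus 25 bid checks fits exactly in the signup allotment:

```python
# Credit budget from the pricing above: signup includes 325 credits,
# makeRules costs a one-time 300 credits, each checkIt costs 1 credit.
SIGNUP_CREDITS = 325
MAKE_RULES_CREDITS = 300
CHECK_IT_CREDITS = 1

def credits_needed(policies: int, checks: int) -> int:
    """Total credits for compiling `policies` policies and running `checks` checks."""
    return policies * MAKE_RULES_CREDITS + checks * CHECK_IT_CREDITS

# One compiled policy plus 25 seller-bid checks uses the full 325 credits.
assert credits_needed(policies=1, checks=25) == SIGNUP_CREDITS
```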

Buyer policy

The buyer's requirements are pure capability constraints. No vendor names. No brand preferences.

Save the returned policy_id. Pass it on every checkIt call.
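A sketch of assembling the makeRules request follows. The capability wording comes from this guide; the payload field name and the surrounding call details are assumptions for illustration, so check the API reference before relying on them:

```python
# Hypothetical makeRules payload. The "rules" field name is an assumption;
# only the capability requirements themselves come from this guide.
requirements = [
    "must support OAuth 2.0",
    "must support labels",
    "must support assigning tasks",
    "must support creating new items",
    "must support project scoping",
]

payload = {"rules": requirements}
# POST this to the makeRules endpoint with your api_key, then persist the
# policy_id from the response and pass it on every subsequent checkIt call.
```

Note what is absent: no "Linear", no "Jira", no vendor names. The policy is a pure capability spec.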

Note on compilation time: makeRules compiles your natural language policy into SMT-LIB2 formal logic, generates an Automated Reasoning policy, and fetches adversarial scenarios for battle testing. This involves an LLM translation pass, an SMT compilation step, and a build workflow that polls until the reasoning model is ready. The full pipeline typically completes in 2 to 7 minutes depending on rule count and queue depth. This is a one-time cost: you compile once and check against the same policy_id for every subsequent seller bid. Recompile only when your requirements change. makeRules gets you to roughly 99% coverage; battle testing takes you the rest of the way.

A policy that compiles may still contain ambiguous terms or uncertain definitions. Battle testing raises coverage to 100% for all terms. The makeRules response includes generated scenarios: variable assignments that are logically possible given your rules, sorted to surface the combinations most likely to reveal a wrong or ambiguous rule first.

Review each scenario and variable and ask: could this situation actually happen in my domain? If the scenario looks correct, submit thumbs-up feedback. If the scenario should be impossible given your policy, submit thumbs-down feedback with an annotation explaining why. The annotation is what PreFlight uses to deduce the rule or variable change needed. Name the variables, name the values, name the rule being violated.

If you submitted any thumbs-down feedback, call refinePolicy to rebuild the policy in place with your annotations applied. This is an SSE stream that typically takes a few minutes. It batches all pending annotations into a single rebuild and returns fresh scenarios for the next review round. Your policy_id stays the same throughout.
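Since refinePolicy streams over SSE, the client consumes "data:" lines until the rebuild completes. A minimal parser sketch, assuming JSON event payloads (the event fields shown are illustrative; only the "data:" prefix is standard SSE framing):

```python
import json

def parse_sse_events(lines):
    """Yield decoded JSON payloads from the 'data:' lines of an SSE stream."""
    for line in lines:
        if line.startswith("data:"):
            yield json.loads(line[len("data:"):].strip())

# Simulated stream; in production these lines come from the HTTP response.
# The "status" and "scenarios" fields are assumed for illustration.
stream = [
    'data: {"status": "rebuilding"}',
    'data: {"status": "done", "scenarios": 12}',
]
events = list(parse_sse_events(stream))
assert events[-1]["status"] == "done"
```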

Full documentation: https://docs.icme.io/documentation/learning/battle-testing

Check every seller bid before the agent commits

SAT = tool has all required capabilities. UNSAT = it is missing something. Every decision returns a cryptographic receipt.
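The per-bid loop looks roughly like this. The check_tool function below is a stub standing in for a real checkIt HTTP call, and the response fields (result, check_id) mirror this guide's description rather than a confirmed schema:

```python
# Shortlisting loop: check every seller bid, keep only explicit SAT,
# and store the receipt id alongside each accepted tool.

def check_tool(policy_id: str, bid: dict) -> dict:
    """Stub for the checkIt API; replace with a real HTTP call."""
    sat = bid["auth"] == "oauth2" and bid["creates_items"]
    return {"result": "SAT" if sat else "UNSAT", "check_id": f"rcpt-{bid['tool']}"}

bids = [
    {"tool": "linear.create_issue", "auth": "oauth2", "creates_items": True},
    {"tool": "trello.create_card", "auth": "api_key", "creates_items": True},
    {"tool": "notion.query_database", "auth": "oauth2", "creates_items": False},
]

shortlist = []
for bid in bids:
    verdict = check_tool("policy-123", bid)
    if verdict["result"] == "SAT":  # anything else fails closed
        shortlist.append((bid["tool"], verdict["check_id"]))

assert [tool for tool, _ in shortlist] == ["linear.create_issue"]
```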

Live results

SAT: Linear create_issue -- all capabilities match

All six capability constraints are satisfied. Three independent solvers (the local OxiZ extraction model, Automated Reasoning, and Z3) all agree SAT. The buyer can proceed with this tool. The receipt records exactly what the seller claimed.

Understanding "AR uncertain" results

The Automated Reasoning layer may return "AR uncertain" on some checks. This means the AR engine found a valid interpretation of the action that satisfies the policy, but could not prove it is the only possible interpretation. This happens when policy variables use integer-encoded enums (e.g. toolAuthenticationMethod = 0 for OAuth 2.0) and the AR translator is not fully confident in the mapping from natural language to the integer value.

It is not an error. The system requires unanimous agreement from all three solvers before returning SAT. When AR is uncertain, the local OxiZ model and Z3 must both independently confirm SAT for the result to pass. The three-solver consensus provides the same effective confidence as a clean AR verdict. In practice, if all three agree, the result is reliable.
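The consensus rule above can be sketched as a small function: SAT requires agreement from every solver that returned a verdict, and an uncertain AR is tolerated only when the other two solvers both independently confirm. This is an illustrative reconstruction of the logic described in this section, not the actual implementation:

```python
# Three-solver consensus sketch: unanimous SAT among non-uncertain
# verdicts, with at least two confirming solvers.
def final_verdict(llm: str, ar: str, z3: str) -> str:
    votes = [v for v in (llm, ar, z3) if v != "uncertain"]
    if len(votes) >= 2 and all(v == "SAT" for v in votes):
        return "SAT"
    return "UNSAT"

assert final_verdict("SAT", "SAT", "SAT") == "SAT"        # clean unanimity
assert final_verdict("SAT", "uncertain", "SAT") == "SAT"  # AR uncertain, others confirm
assert final_verdict("SAT", "UNSAT", "SAT") == "UNSAT"    # any dissent fails
```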

The battle testing cycle described above can reduce AR uncertainty. Submitting annotations that clarify variable descriptions and enum encodings (e.g. "toolAuthenticationMethod 0 = OAuth 2.0, 1 = API key") and then calling refinePolicy gives the AR translator more precise mappings to work with.

SAT: Jira create_ticket -- also has all capabilities

Jira also satisfies all capability constraints. The policy contains no brand preference, so both Linear and Jira are valid matches. The buyer now has a verified shortlist of two tools and can choose between them on other criteria: price, latency, existing integrations, team preference.

This is the point. No brand favoritism. Capabilities decide.

UNSAT: Trello create_card -- API key auth, not OAuth 2.0

Trello has five of six required capabilities but uses API key authentication. The solver identifies the single failing constraint. The seller learns UNSAT but not which rule it failed on. If Trello later adds OAuth 2.0 support, the same bid with updated auth would return SAT.

UNSAT: GitHub Issues -- no project scoping

GitHub Issues supports OAuth, labels, assignees, and write operations. It fails on project scoping. A semantic search system would rank this tool highly because it matches five of six criteria. The solver does not rank. It returns UNSAT because the constraints are not fully satisfied.

UNSAT: Notion query_database -- read-only, cannot create items

Notion's query tool is read-only. The seller honestly advertises this. The solver catches the capability gap immediately. This prevents the common failure where an agent selects a query tool for a write operation and fails at runtime.

UNSAT: Unknown agent -- no verified identity

Every required capability is present. It fails on seller identity alone. This is the sybil defense: even a perfect tool from an unverified seller is rejected. The buyer's policy can require any trust signal (verified identity, registry listing, reputation score) as a boolean constraint alongside capability requirements.

Reading the extracted variables

Every checkIt response includes an extracted map showing exactly what the solver evaluated. Boolean variables (toolSupportsAssigningTasksToTeamMembers, toolSupportsCreatingNewItems, etc.) are true or false. Enum variables use integer encoding: toolAuthenticationMethod is 0 for OAuth 2.0 and 1 for API key. sellerVerificationStatus is 0 for verified and 1 for unverified.
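Decoding the extracted map is mechanical. The enum encodings below come directly from this section; the sample map itself is illustrative:

```python
# Decode the integer-encoded enums in a checkIt "extracted" map,
# using the encodings stated above.
AUTH = {0: "OAuth 2.0", 1: "API key"}
SELLER = {0: "verified", 1: "unverified"}

extracted = {  # sample values for illustration
    "toolAuthenticationMethod": 1,
    "sellerVerificationStatus": 0,
    "toolSupportsCreatingNewItems": True,
}

decoded = dict(extracted)
decoded["toolAuthenticationMethod"] = AUTH[extracted["toolAuthenticationMethod"]]
decoded["sellerVerificationStatus"] = SELLER[extracted["sellerVerificationStatus"]]

assert decoded["toolAuthenticationMethod"] == "API key"
assert decoded["sellerVerificationStatus"] == "verified"
```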

The response also includes three independent solver results. llm_result is from the local OxiZ extraction model. ar_result is from Automated Reasoning. z3_result is from the Z3 SMT solver. The system requires consensus across all available solvers before returning a final verdict. A zk_proof_id and zk_proof_url are included for cryptographic verification of the result.

Why two SAT results are the correct outcome

A guardrail that returns only one winner is a recommendation engine, not a verifier. PreFlight returns SAT for every tool that satisfies all constraints. In this example, both Linear and Jira pass. This is correct.

Now the buyer has a verified shortlist. Every tool on it provably meets the capability requirements. The buyer can select between them on secondary criteria that do not belong in a formal policy: team familiarity, pricing, existing integrations, response latency, or simple preference. The policy handles the hard constraints. The buyer handles the soft ones.

If the buyer wants to narrow the shortlist further, they add more rules and recompile. Each additional constraint costs nothing at check time. The policy grows more specific. The shortlist gets shorter. This is iterative procurement, not one-shot guessing.
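Selection over a verified shortlist is then an ordinary ranking problem, entirely outside the policy. A sketch with illustrative secondary criteria (the price and latency fields are invented for this example):

```python
# Hard constraints are already settled by the solver; rank the
# survivors on soft criteria. Fields here are illustrative.
shortlist = [
    {"tool": "linear.create_issue", "price_per_call": 0.02, "latency_ms": 180},
    {"tool": "jira.create_ticket", "price_per_call": 0.01, "latency_ms": 450},
]

# Prefer lower price, breaking ties on latency.
chosen = min(shortlist, key=lambda t: (t["price_per_call"], t["latency_ms"]))
assert chosen["tool"] == "jira.create_ticket"
```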

Trust model

PreFlight does not make sellers honest. It makes dishonesty provable and enforceable.

Sellers describe what they have. They do not know the buyer's requirements. They cannot optimize their description to hit specific criteria because they cannot see the criteria. The best strategy is to describe the tool's actual capabilities accurately.

Buyer policy stays private. ZK proofs guarantee this. The seller learns the result (SAT or UNSAT) but never which rules exist, how many there are, or which one it failed on.

SAT receipts are cryptographic contracts. The receipt records the seller's exact claims under a tamper-proof proof. If the tool does not deliver the advertised capabilities, the receipt is evidence of what was promised. The seller cannot later claim they never said OAuth 2.0 was supported.

If a seller lies, the lie is recorded. A seller that claims "authentication method is OAuth 2.0" when the tool actually uses API keys will receive SAT. But the receipt records that specific claim. When the buyer connects and the OAuth handshake fails, the receipt proves the seller misrepresented its tool. This is not a limitation. It is the enforcement mechanism. The solver does not detect lies at bid time. It creates the conditions under which lies have consequences.

Defense in depth. For high-stakes selections, buyers can add a verification layer between bid and commitment: pull the tool's schema directly from the MCP server, construct structured values from the schema, and check those values via /v1/verify (which takes key-value pairs, no LLM extraction). This catches misrepresentation before the buyer commits, without trusting the seller's description at all.
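A sketch of that verification layer: derive key-value pairs from a schema pulled straight from the MCP server and check them via /v1/verify. The schema shape and the request field names are assumptions for illustration; the variable names and enum encodings match those used earlier in this guide:

```python
# Defense-in-depth sketch: build structured values from the tool's own
# schema instead of trusting the seller's description. The schema shape
# here is illustrative.
schema = {  # as if fetched directly from the seller's MCP server
    "name": "create_issue",
    "auth": {"type": "oauth2"},
    "operations": ["create", "read"],
    "fields": ["labels", "assignee", "project"],
}

values = {
    "toolAuthenticationMethod": 0 if schema["auth"]["type"] == "oauth2" else 1,
    "toolSupportsCreatingNewItems": "create" in schema["operations"],
    "toolSupportsLabels": "labels" in schema["fields"],
}
# POST {"policy_id": ..., "values": values} to /v1/verify (payload field
# names assumed) and commit only on an explicit SAT.
```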

Deploying in production

Compile once. Call makeRules with your capability requirements. Store the policy_id in your environment. The compilation takes 2 to 7 minutes but is a one-time cost. Recompile only when your requirements change.

Battle test before going live. Review the generated scenarios, submit feedback, refine if needed, and run tests until all pass. This catches ambiguous variable mappings and rule conflicts before any real action is evaluated. Fixing problems at compile time costs a rebuild. Finding them in production costs a failed guardrail.

Check every bid. Call checkIt for each candidate tool the agent discovers. This is 1 credit ($0.01) per check and completes in under 12 seconds.

Accept all SAT results as a shortlist. Do not assume only one tool should pass. Multiple tools may satisfy your requirements. Let your agent select among verified candidates on secondary criteria.

Store receipts. Every check_id is a cryptographic audit receipt. Store it alongside the tool selection decision. If the tool underperforms, the receipt documents what the seller claimed at selection time.

Treat UNSAT as final. Do not retry the same bid with rephrased descriptions. UNSAT means the advertised capabilities do not satisfy the requirements. If the seller updates its tool, a new bid with updated capabilities can be checked.

Fail closed. If the ICME API is unreachable or returns anything other than an explicit SAT, do not commit to the tool. An unavailable guardrail is not implicit permission.
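A fail-closed wrapper makes this concrete: UNSAT, an error response, or an unreachable API all map to rejection. The check_fn parameter stands in for a real checkIt call; the response shape is assumed:

```python
# Fail-closed wrapper: only an explicit SAT approves the tool.

def is_approved(check_fn, policy_id: str, bid: dict) -> bool:
    try:
        verdict = check_fn(policy_id, bid)
    except Exception:
        return False  # unreachable guardrail is not implicit permission
    return verdict.get("result") == "SAT"

def unreachable(policy_id, bid):
    raise ConnectionError("ICME API unreachable")

assert is_approved(lambda p, b: {"result": "SAT"}, "policy-123", {}) is True
assert is_approved(lambda p, b: {"result": "UNSAT"}, "policy-123", {}) is False
assert is_approved(unreachable, "policy-123", {}) is False
```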

Iterate the policy. If too many tools return SAT and the shortlist is too broad, add more constraints and recompile. If no tools return SAT, relax a constraint. The policy is your procurement spec. Treat it like one.
