The fundamental problem
Why system prompts are not policies
Enterprise AI teams spend a lot of time writing system prompts. "You are a helpful assistant. Do not share competitor information. Never reveal customer data. If asked to do something harmful, refuse." These instructions are reasonable. They are also not security controls.
A system prompt is part of the model's context. It influences the model's behaviour. It does not enforce it. A skilled attacker with a jailbreak, an indirect prompt injection through a retrieved document, or a multi-step manipulation sequence can override a system prompt instruction without the model flagging it as a violation. The model genuinely believes it is being helpful.
AgentIQ policies run outside the model's context. They evaluate inputs before the model sees them and outputs before they reach the caller. The model cannot override a policy by being convinced to. A prompt injection that makes the model want to call a restricted tool is stopped at the tool-call layer before the call executes.
System prompts operate inside the trust boundary of the model. AgentIQ policies operate outside it. An attacker who compromises the model's reasoning cannot bypass the policy engine. The policy engine does not ask the model whether a call should be allowed. It evaluates independently.
Architecture
How the AgentIQ Policy Engine works
The Policy Engine sits between your application and the model. Every interaction passes through it. It has four evaluation points:
The engine also collects telemetry on every evaluation. When telemetry is enabled in MirrorConfig, every policy decision is recorded with timestamp, input hash, rules that fired, and outcome. This is the audit trail.
Language reference
Mirror Policy DSL basics
Every policy file has the same structure. Version declaration first, then optional metadata headers, then one or more policy blocks.
@version "1.0.0";
@author "Security Team";
@last_modified "2026-04-10";
metadata {
description: "Policy for the enterprise internal assistant";
security_level: HIGH;
tags: ["production", "enterprise"];
}
policy my_policy {
// deny rules block the interaction when condition is true
deny message input where check_pii() == true;
deny message output where check_pii() == true;
// allow rules permit after a deny (allowlist pattern)
allow tool_call where function.name == "safe_search";
deny tool_call where true; // deny everything else
// check statements evaluate output quality
check_output hallucination with { threshold: 0.85 };
}
Three things matter about the DSL syntax. Operators are C-style: use &&, ||, ! not Python's and, or, not. Strings use double quotes only. Every statement ends with a semicolon. The policy will not compile if you get any of these wrong.
| Resource | Evaluates | Use for |
|---|---|---|
message input | User's message before the model sees it | PII detection, injection detection, jailbreak, length limits |
message output | Model's response before it reaches the caller | PII leakage, hallucination, toxicity, bias |
tool_call | Function call initiated by the model | Restrict file operations, block dangerous SQL, SSRF prevention |
tool_output | Result returned by a tool | PII in tool results, content from untrusted sources |
rag | Retrieved context before injection | Source verification, context manipulation, data poisoning |
embedding | Embedding vectors | Tamper detection, semantic drift |
Layer 1
Input protection layer
Input protection runs before the model. It is the cheapest layer to enforce and the one that stops the most attacks. Every production agent needs at minimum: PII detection, prompt injection detection, jailbreak detection, and length limits.
@version "1.0.0";
policy input_protection {
// Reject empty inputs
deny message input where length(content) == 0;
// Reject inputs over 10,000 characters
// Prevents token flooding and context stuffing attacks
deny message input where length(content) > 10000;
// Block PII: SSNs, credit cards, phone numbers in input
// Prevents accidental data ingestion into AI context
deny message input where check_pii() == true;
// Block prompt injection attempts
deny message input where check_prompt_injection() == true;
// Block jailbreak attempts
deny message input where detect_jailbreak() == true;
// Higher confidence injection blocking for production
check_prompt injection with { threshold: 0.9, enabled: true };
}
The default prompt injection threshold is 0.5. At that level you will see false positives on legitimate complex queries. For a general enterprise assistant, 0.7 to 0.8 is the right starting point. If your agent handles a specific domain with predictable inputs, you can push to 0.9. Start conservative and tune based on blocked legitimate queries surfaced in telemetry.
Layer 2
Output protection layer
Output protection runs after the model generates a response but before it reaches the caller. This is where you catch PII the model might have included from retrieved context, hallucinated facts, and toxic content.
@version "1.0.0";
policy output_protection {
// Block PII in output
// Customer data from retrieved context must not appear in responses
deny message output where check_pii() == true;
// Detect hallucination
// 0.85 threshold: blocks responses with significant ungrounded claims
check_output hallucination with { threshold: 0.85 };
// Verify factual consistency with retrieved context
check_output factual_consistency;
// Block toxic and biased content in customer-facing responses
check_output toxicity;
check_output bias;
// Block code injection patterns in output
// Prevents the model from generating malicious scripts
check_output code_injection;
// Monitor for prompt reflection
// Catches cases where injected instructions appear in output
check_output prompt_reflection;
}
Layer 3
Tool-call security policies
Tool-call policies are the most powerful layer for agentic systems. They intercept the model's function calls before execution. This is where you enforce least privilege on the agent: it can only call what you explicitly allow, and only within the parameters you define.
File operation controls
@version "1.0.0";
policy file_security {
// Block access to system files and credentials
deny tool_call where
function.name == "read_file" &&
starts_with(function.arguments, "/etc/");
deny tool_call where
function.name == "read_file" &&
contains(function.arguments, ".ssh/");
deny tool_call where
function.name == "read_file" &&
contains(function.arguments, ".env");
// Allow reads from the designated safe directory only
allow tool_call where
function.name == "read_file" &&
starts_with(function.arguments, "/workspace/data/");
// Deny any read_file call not in the allowlist above
deny tool_call where function.name == "read_file";
// Block all write operations outright
deny tool_call where function.name == "write_file";
deny tool_call where function.name == "delete_file";
}
Database and network controls
@version "1.0.0";
policy db_network_security {
// Block SQL injection patterns
deny tool_call where
function.name == "execute_sql" &&
(icontains(function.arguments, "OR 1=1") ||
icontains(function.arguments, "UNION SELECT") ||
icontains(function.arguments, "DROP TABLE") ||
icontains(function.arguments, "DELETE FROM") ||
icontains(function.arguments, "--"));
// Block SSRF: prevent access to internal services
deny tool_call where
function.name == "http_request" &&
contains(function.arguments, "localhost");
deny tool_call where
function.name == "http_request" &&
contains(function.arguments, "127.0.0.1");
deny tool_call where
function.name == "http_request" &&
(contains(function.arguments, "192.168.") ||
contains(function.arguments, "10.0.") ||
contains(function.arguments, "172.16."));
// Allow HTTP only to external HTTPS endpoints
allow tool_call where
function.name == "http_request" &&
starts_with(function.arguments, "https://");
deny tool_call where function.name == "http_request";
// Check tool output for PII from external sources
deny tool_output where detect_pii(tool_output.content) == true;
}
In the file_security policy above, reads that do not match the allowlist are explicitly denied. This is the correct pattern. An agent that can call read_file on any path it reasons is appropriate is an agent that a prompt injection attack can direct to read your secrets. Always enumerate what is allowed and deny everything else.
Layer 4
RAG source verification
If your agent uses retrieval, the retrieved content is an attack surface. Context manipulation attacks embed adversarial instructions in documents that get retrieved into the model's context. Data poisoning plants malicious content in your vector store. These policies run on the RAG context before it is injected.
@version "1.0.0";
policy rag_security {
// Verify retrieved documents are from trusted sources
check_rag source_verification;
// Detect if context has been manipulated to contain instructions
check_rag context_manipulation;
// Detect data poisoning in the retrieval pipeline
check_rag data_poisoning;
// Validate retrieval quality and integrity
validate_retrieval {
source_authenticity;
content_relevance;
chunk_integrity;
};
}
policy embedding_security {
// Detect embedding tampering and poisoning attacks
check_rag embedding_attack;
validate_retrieval {
embedding_similarity;
vector_consistency;
semantic_drift;
};
}
Composition
Production security chain
In production you do not apply policies one at a time. You compose them into a chain. A chain groups related policies and applies them in order. If any policy in the chain denies, the call is blocked.
@version "1.0.0";
@author "Security Team";
@last_modified "2026-04-10";
metadata {
description: "Production security chain for the enterprise internal AI assistant";
security_level: CRITICAL;
tags: ["production", "enterprise", "internal-assistant"];
}
chain enterprise_agent_security {
policy input_layer {
deny message input where length(content) == 0;
deny message input where length(content) > 10000;
deny message input where check_pii() == true;
deny message input where check_prompt_injection() == true;
deny message input where detect_jailbreak() == true;
}
policy tool_layer {
// File access: safe directory only
allow tool_call where
function.name == "read_file" &&
starts_with(function.arguments, "/workspace/data/");
deny tool_call where function.name == "read_file";
// SQL: block destructive patterns
deny tool_call where
function.name == "execute_sql" &&
(icontains(function.arguments, "DROP") ||
icontains(function.arguments, "DELETE FROM") ||
icontains(function.arguments, "UNION SELECT"));
// HTTP: external HTTPS only
allow tool_call where
function.name == "http_request" &&
starts_with(function.arguments, "https://");
deny tool_call where function.name == "http_request";
// Tool output: no PII from external sources
deny tool_output where detect_pii(tool_output.content) == true;
}
policy rag_layer {
check_rag source_verification;
check_rag context_manipulation;
check_rag data_poisoning;
}
policy output_layer {
deny message output where check_pii() == true;
check_output hallucination with { threshold: 0.85 };
check_output toxicity;
check_output bias;
check_output code_injection;
}
}
Deployment option 1
Deploy with the policy_monitor decorator
The decorator is the fastest path to enforcement. One import, one decorator above your function, done. The named policy is evaluated on every call.
from mirror_sdk.ops.mirror_decorators import policy_monitor
from mirror_sdk.core.mirror_core import MirrorConfig
import openai
config = MirrorConfig.from_env()
# The policy named here must exist in AgentIQ
# Upload enterprise_agent.policy via PolicyAPIService first
@policy_monitor(name="enterprise_agent_security", mirror_config=config)
async def run_agent(user_query: str) -> str:
"""
This function is wrapped by the policy engine.
The policy evaluates user_query before this code runs.
The return value is checked by the output layer before it reaches the caller.
If any policy rule fires, this function is not called (input deny)
or the return value is blocked (output deny).
"""
response = openai.chat.completions.create(
model="gpt-4o",
messages=[
{"role": "system", "content": "You are an internal enterprise assistant."},
{"role": "user", "content": user_query}
]
)
return response.choices[0].message.content
# Usage
import asyncio
async def main():
result = await run_agent("Summarise last quarter's performance metrics")
print(result)
asyncio.run(main())
If an input deny rule fires, run_agent is never called. The caller receives a policy violation response. If an output deny rule fires, the return value from run_agent is blocked before it reaches the caller. In both cases, the event is recorded in the AgentIQ telemetry log with the rule that fired.
Deployment option 2
Deploy with PolicyAPIService
The programmatic API is for teams that need to manage policies dynamically: updating rules without code changes, building policy management tooling, or pushing emergency updates to a deployed agent.
import asyncio
from mirror_sdk.ops.mirror_agentiq_policy_api import PolicyAPIService, PolicyCreate
from mirror_sdk.core.mirror_core import MirrorConfig
config = MirrorConfig.from_env()
policy_service = PolicyAPIService(config)
async def deploy_production_policy():
# Read the policy DSL file
with open("enterprise_agent.policy") as f:
policy_text = f.read()
# Create and save the policy
new_policy = PolicyCreate(
policy_name="enterprise_agent_security",
policy_text=policy_text
)
saved = await policy_service.save_policy(new_policy)
print(f"Policy saved: {saved['_id']}")
# Deploy it (makes it active immediately)
await policy_service.deploy_policy(saved["_id"])
print("Policy deployed. All @policy_monitor calls referencing this name now use the new version.")
async def list_active_policies():
policies = policy_service.get_all_deployed_policies()
print(f"Active policies: {len(policies)}")
for p in policies:
print(f" {p['policy_name']} - deployed at {p.get('deployed_at', 'unknown')}")
async def emergency_policy_update(new_rule: str):
"""
Push an emergency rule update without redeploying the agent.
Useful when a new attack pattern is discovered in production.
"""
# Save updated policy and deploy immediately
updated_policy = PolicyCreate(
policy_name="enterprise_agent_security",
policy_text=new_rule
)
saved = await policy_service.save_policy(updated_policy)
await policy_service.deploy_policy(saved["_id"])
print("Emergency policy update deployed without agent restart.")
asyncio.run(deploy_production_policy())
Before going live
Testing and telemetry
Before deploying a policy to production, test it in the Policy Workbench: Portal then AgentIQ then Policy Manager then Policy Playground. The Playground lets you send test inputs and see exactly which rules fire and why, without affecting your live system.
In production, enable telemetry to get a structured audit trail of every policy decision.
from mirror_sdk.core.mirror_core import MirrorSDK, MirrorConfig
# Telemetry must be enabled in config for audit trail collection
config = MirrorConfig(
api_key="your-api-key",
server_url="https://mirrorapi.azure-api.net/v1",
telemetry_enabled=True,
policy_eval_enabled=True,
# Max retries on transient failures
max_retries=3,
# How often the SDK polls for policy updates (seconds)
polling_interval=300
)
sdk = MirrorSDK(config)
# With telemetry enabled, every policy evaluation is recorded:
# - timestamp
# - input hash (not the input itself)
# - policy name and version evaluated
# - rules that fired
# - outcome: allowed or blocked
# - latency
print("SDK configured with telemetry and policy eval enabled")
| Testing stage | Tool | Purpose |
|---|---|---|
| Policy authoring | Policy Workbench (natural language to DSL) | Generate policy draft from plain English; faster than writing DSL manually |
| Policy validation | Policy Playground | Send test inputs, verify rules fire correctly, catch syntax errors |
| Pre-production | DiscoveR quickScan on staging | Adversarial test of the full agent including policy enforcement |
| Production monitoring | AgentIQ telemetry | Audit trail of every policy decision; surface false positives for threshold tuning |
| Ongoing | DiscoveR quarterly scan | Verify policies still hold under evolving attack patterns |
Common questions
FAQ
Next: Sovereign AI deployment with FIPS 140-3 and air-gapped FHE
Module H2 covers deployments where data must never leave the organisation's physical control. Air-gapped FHE, FIPS 140-3 key management, and the architecture for running VectaX in environments with no external connectivity.