Module E1 of 5 · Track 3E: Security Operations for AI

Never trust. Always verify. And verify again at inference.

Zero Trust Architecture for AI Systems

Traditional zero trust was designed for humans and stable services. AI systems break both assumptions. This module covers the inference gap that standard zero trust misses, two paths to verifiable inference, the agent identity crisis, and how all four Mirror Security products map to the four trust planes of an AI deployment.

38 min read
Track 3E
Intermediate
Security Operations

Section 01

Why traditional zero trust falls short for AI

Zero trust is the right security model for modern infrastructure. It replaced the assumption that anything inside the network perimeter is safe with continuous verification of every request. For human users and stable microservices, it works well.

AI systems break the two key assumptions zero trust was built on. First, traditional ZT was designed for human identities: users who have fixed roles, stable attributes, and audit trails. AI agents are short-lived, task-scoped, spawn sub-agents dynamically, and operate at machine speed. The identity model does not fit. Second, traditional ZT treats computation as a black box: it verifies who can access data, but it does not verify what happens to data during computation. With AI, computation is where the security risk lives.

The result: organisations can have a fully compliant zero trust posture for their user access layer while simultaneously running AI inference that exposes sensitive data in plaintext to any privileged insider, and deploying AI agents that operate under shared credentials with far more access than any individual task requires.

Traditional zero trust: what it covers
Verify user identity before granting network access
Least-privilege access to databases and APIs
Encrypt data at rest and in transit
Micro-segmentation of network traffic
Does NOT verify what happens during computation
Does NOT handle short-lived, task-scoped agent identities
Does NOT track delegation across agent hierarchies
Does NOT enforce capability constraints per AI operation
AI-aware zero trust: what you also need
Verifiable inference: proof that data was not exposed during computation
Short-lived capability-scoped tokens per agent task, not shared credentials
Delegation lineage: which agent, under which principal, for which tenant
Runtime enforcement of policy at every request, not just at token issuance
Continuous adversarial testing of model behaviour between deployments
Inline guardrails on model outputs at inference time
Fast revocation: seconds, not hours, when an agent or token is compromised
Audit trails that link machine actions to agent instance, principal, and policy

Section 02

The inference gap

Most security-conscious AI deployments correctly implement encryption at rest and encryption in transit. Data on disk is encrypted. Data on the wire is encrypted. This is necessary. It is not sufficient.

At inference time, data must be decrypted. The model reads plaintext inputs. The context window contains plaintext documents. The embeddings are computed from plaintext text. For the duration of inference, this data exists as plaintext in memory. That plaintext is reachable by anyone or anything with access to the compute environment.

The inference gap in a standard AI pipeline

Data at rest: AES-256
Transit to model: TLS 1.3
Inference (plaintext in RAM): ⚠ Exposed
Response to client: TLS 1.3
Stored output: AES-256
During inference: privileged insiders, compromised hosts, memory-scraping malware, crash dumps, misconfigured telemetry, and side-channel attacks can all reach plaintext data even in a fully compliant deployment.

This is not a hypothetical risk. Memory-scraping malware specifically targets AI inference servers because they process high-value plaintext at predictable intervals. Crash dumps from inference servers have been found to contain sensitive patient data and financial records in healthcare and banking deployments. Misconfigured logging tools have written entire context windows to plaintext log files readable by developers with no data access clearance.

The inference gap exists in every standard AI deployment. It is not a bug in any specific product. It is a structural consequence of how inference works: the model needs to see the data to process it. Closing the gap requires either encrypting what the model sees, or verifying that the environment is trustworthy even if the data is plaintext.

Compliance does not close the inference gap. A deployment can pass SOC 2, HIPAA, and GDPR audits while still exposing sensitive data in plaintext during inference. Access controls, encryption at rest, and TLS in transit are all audit-checkable. Plaintext exposure during inference is not directly audited by any of these frameworks. This is an emerging gap in AI-specific compliance requirements.

Section 03

Verifiable inference

Verifiable inference means the inference process provides a technical guarantee that data was not exposed in plaintext during computation. Two practical approaches exist. They differ in their trust assumptions and performance characteristics.

Confidential computing (enclaves)
Hardware-based, near-plaintext performance
Inference runs inside a hardware-isolated memory region (Intel SGX, AMD SEV, Arm CCA)
Remote attestation proves to the data owner that inference ran in the expected measured environment
Operator cannot see workload data even with physical access to the server
Near-plaintext performance: overhead is typically 5 to 20 percent
Trust assumption: trusts the hardware boundary (CPU vendor) but not the infrastructure operator
Hardware vulnerabilities (Spectre, Meltdown variants) can undermine guarantees
Best for: high-throughput inference where FHE overhead is unacceptable
Fully homomorphic encryption (FHE)
Cryptographic, no hardware trust needed
Data stays encrypted even during computation: model operates on ciphertext
Plaintext never appears in memory, even under host compromise
No trust in the infrastructure operator required at all
Strongest cryptographic control: guaranteed by mathematics not hardware
Performance overhead: VectaX achieves 150 to 240 tokens/s on 7B models, versus 200 to 300 tokens/s in plaintext
No hardware-level side-channel attack surface
Best for: regulated inference on highly sensitive data where no operator trust is acceptable
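
For the confidential computing path, the role of remote attestation can be pictured with a small sketch: the data owner releases a decryption key only when the environment reports a measurement it already trusts. Everything here is illustrative. The allowlist, the function names, and the idea of passing the measurement as a hex string are assumptions, and a real verifier would first validate the hardware vendor's signature over the attestation report, which this sketch omits.

```python
import hashlib
import hmac
from typing import Optional

# Hypothetical allowlist: hashes of inference images the data owner has approved.
EXPECTED_MEASUREMENTS = {
    hashlib.sha256(b"inference-image-v1").hexdigest(),
}


def measurement_is_trusted(reported: str) -> bool:
    """Compare a reported measurement against the allowlist in constant time."""
    return any(
        hmac.compare_digest(reported, expected)
        for expected in EXPECTED_MEASUREMENTS
    )


def release_data_key(reported_measurement: str, data_key: bytes) -> Optional[bytes]:
    """Hand over the data key only to an environment with a trusted measurement.

    Assumption: the report's signature has already been verified elsewhere.
    """
    if measurement_is_trusted(reported_measurement):
        return data_key
    return None
```

The design point is that the key release decision depends on what code is running, not on who operates the server.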

VectaX implements the FHE path for vector database operations. Embeddings are encrypted using similarity-preserving FHE before reaching the vector store. Similarity search runs on encrypted vectors without decryption. Retrieved results are decrypted only at the authorised client. An attacker with access to the vector database cannot reconstruct document content from the stored ciphertexts. The data plane gap is closed at the retrieval layer.

Section 04

Sovereignty and the inference gap

In regulated industries, AI sovereignty is often treated as a geography question: keep data inside national borders, use domestic providers, host in local data centers. That approach helps. It does not produce real sovereignty.

The moment inference begins, the question stops being where the data sits and becomes who can see it while computation happens. A hospital can keep data inside EU borders and meet GDPR residency requirements. But if patient scans appear as plaintext during inference, a breach exposes the crown jewels regardless of which country the server is in. Geography does not protect plaintext.

Real sovereignty requires technical guarantees at compute time: controls that hold even when systems fail, operators are compromised, or infrastructure is breached. Residency requirements, access controls, and audit trails are all necessary. None of them change what happens when inference requires plaintext in memory.

Access controls reduce who can reach a system. They do not eliminate the fact that the system processes plaintext. Audit trails tell you what happened after an incident. They do not prevent exposure during inference. If your sovereignty story ends at "where the server is," it ends right before the part that matters.

Mirror Security on sovereignty: "Sovereignty isn't a location claim. It's control, verifiable control, over what happens to your data when intelligence runs." Two technical approaches close the gap: confidential computing (enclaves and remote attestation for near-plaintext performance) and FHE (computation on ciphertext, no trust in environment needed). Same goal, different trust assumptions.

📋 Mirror Blog · Sovereignty Without Verifiable Inference Is a Mirage

Section 05

Four zero trust principles for AI

The four core zero trust principles apply to AI systems, but each one takes on an AI-specific meaning that goes beyond the traditional interpretation. Here is what each principle means when AI models and autonomous agents are in scope.

🔍
Verify explicitly
Classic: check identity and device health before granting access
For AI: verify which model version is running, which agent instance is acting, which delegation chain it is operating under, and whether the environment has been attested at every request. A valid user token does not mean the downstream agent is safe or behaving correctly.
🔓
Use least privilege
Classic: grant minimum permissions needed for the role
For AI: issue capability-scoped tokens per task, not broad service-account permissions. A customer support agent should have a token that can only issue refunds up to the original transaction value, not read all payment records. Every agent action gets the minimum capability for that specific operation, not a shared credential with broad access.
💥
Assume breach
Classic: design for a world where the perimeter is already breached
For AI: design for a world where any agent instance can be compromised. Short-lived tokens (minutes, not months) limit the blast radius of a compromise. Compartmentalised access means one compromised agent cannot pivot to other systems. Fast revocation (seconds) keeps the window in which a compromised credential is usable narrow. Every agent action is audited with lineage, so forensics is possible.
🔄
Verify continuously
Classic: re-authenticate periodically, check device posture
For AI: enforce policy at every model output and every agent action, not just at token issuance. AgentIQ runs inline guardrails on every response: PII in outputs, hallucination, prompt injection, toxicity. DiscoveR runs continuous adversarial testing between and during deployments. A model that was safe at deployment may not be safe after fine-tuning or after a new attack technique is discovered.

Section 06

The agent identity crisis

AI agents are now operating across payment systems, ticketing systems, internal data stores, and SaaS control planes. But in too many deployments, access control still relies on shared long-lived secrets designed for static services. This pattern is an active operational risk, not a future concern.

Traditional IAM works well for humans and stable services. AI agents have fundamentally different characteristics that break the IAM model.

Shared credential model vs capability-scoped model

⚠ Shared service account (current state for most)
Identity: One service account shared across all agent instances
Lifetime: Static secret, lasts months or years
Scope: Broad API access (read, write, modify, delete across the service)
On compromise: Attacker has full service access indefinitely
Revocation: Rotating the credential breaks all agents using it simultaneously
Attribution: "The payment service agent" did it. No further granularity.
Delegation: Not tracked. No lineage across agent hierarchies.
✓ Zero trust agent identity (target state)
Identity: Unique identity per agent instance, per task, per tenant
Lifetime: Short-lived token, expires in minutes
Scope: Scoped to one operation (payments:refund for customer X, amount limit $50)
On compromise: Attacker holds a token expiring in minutes, for one bounded operation
Revocation: Revoke one token without affecting other agents
Attribution: Agent-73-instance-4, delegated by user Alice, under Tenant-B policy
Delegation: Full lineage tracked across parent and child agent hierarchy

The failure mode from shared credentials is predictable. Broad access where narrow access is needed. Weak attribution when incidents happen. Slow revocation during active response. Difficult compliance evidence for delegated machine actions. One compromise creates a disproportionate blast radius when identity and access boundaries are not workload-specific.
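
Delegation lineage, the last row of the comparison above, can be illustrated with a minimal sketch in which each agent record points at the agent that spawned it. The record fields, the registry dictionary, and the agent names are all hypothetical and are not AgentID's data model.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class AgentRecord:
    agent_instance_id: str
    parent_id: Optional[str]  # None for the root agent started by the principal
    delegated_principal: str
    tenant_id: str


def lineage(agent_id: str, registry: Dict[str, AgentRecord]) -> List[str]:
    """Walk parent links up to the root and return the chain root-first."""
    chain: List[str] = []
    seen = set()
    current: Optional[str] = agent_id
    while current is not None:
        if current in seen:
            raise ValueError("cycle in delegation chain")
        seen.add(current)
        record = registry[current]
        chain.append(record.agent_instance_id)
        current = record.parent_id
    chain.reverse()
    return chain


# Hypothetical registry: user alice starts agent-A, which spawns agent-B.
registry = {
    "agent-A": AgentRecord("agent-A", None, "alice", "tenant-b"),
    "agent-B": AgentRecord("agent-B", "agent-A", "alice", "tenant-b"),
}
```

Calling lineage("agent-B", registry) returns the chain from the root agent down to the acting instance, which is the information a shared service account cannot provide.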

📋 Mirror Blog · Zero Trust for AI Agents: Solving Identity and Access with AgentID

Section 07

AgentID architecture

AgentID from Mirror Security applies zero trust principles to autonomous agent systems. It uses a two-plane architecture that separates token management from runtime enforcement. This separation means you can add runtime enforcement to existing infrastructure without replacing your authentication stack, and you can change enforcement policies without changing how tokens are issued.

AgentID two-plane architecture

Identity Broker
Identity verification: authenticate the requesting agent instance
Policy evaluation: evaluate tenant and role-based policy for the request
Token issuance: mint short-lived capability-scoped tokens
Token introspection: validate active tokens on demand
Revocation: invalidate specific tokens or all tokens for an agent
Audit: log all token issuance and revocation events with full context
Resource Gateway
Runtime token validation: check that the token is active and not revoked before every call
Scope enforcement: verify the requested operation is within token scope
Constraint checks: validate per-connector constraints (amount limits, field restrictions)
Rate limiting: enforce per-token and per-agent rate controls
Policy decisions: runtime policy re-evaluation using current request context
Connector integration: forward validated requests to downstream systems
This model turns authorization into continuous verification at request time, not one-time trust at token issuance. A token that was valid when issued can be revoked before the next request reaches the gateway.

The Resource Gateway enforces at request time what the Identity Broker authorized at token issuance. If an agent's capability token says it can issue a refund of up to $50, the gateway checks that constraint against the actual refund amount in the request before forwarding to the payments API. The agent cannot bypass this check by sending a request directly to the API, because the API is not accessible except through the gateway.
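
The request-time checks described above can be sketched as a single function. This is a rough illustration under stated assumptions, not the gateway's actual implementation: the token is modelled as a plain dictionary, the token_id field and the in-memory revocation set are hypothetical, and the only constraint evaluated is max_amount.

```python
import time
from decimal import Decimal
from typing import Any, Dict, Optional, Set

# Hypothetical in-memory revocation list; a real gateway would consult the broker.
REVOKED_TOKEN_IDS: Set[str] = set()


def gateway_allows(
    token: Dict[str, Any],
    operation: str,
    amount: Decimal,
    now: Optional[float] = None,
) -> bool:
    """Return True only if every request-time check passes."""
    now = time.time() if now is None else now
    if token["token_id"] in REVOKED_TOKEN_IDS:  # revocation status
        return False
    if now >= token["exp"]:  # token still active
        return False
    if operation != token["capability"]:  # scope enforcement
        return False
    # Constraint check against the actual request parameter.
    return amount <= token["constraints"]["max_amount"]
```

Because revocation is checked on every call, adding a token ID to the revocation set takes effect on the very next request, even though the token itself has not expired.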

Connector-based integrations support common enterprise systems: payments, ticketing, collaboration, source control, secrets management, cloud identity, and data access surfaces. Each connector defines the constraint schema that the gateway can evaluate for requests targeting that system.

Section 08

Capability token design

A capability token for an AI agent is not a standard OAuth access token. It carries more context than a scope string. The additional fields are what make zero trust enforcement possible at the gateway: without resource target, constraint, and delegation context, the gateway cannot make a meaningful policy decision.

Anatomy of an AgentID capability token
agent_instance_id
Unique identifier for this specific agent instance. Not a shared service ID. Links every action to one workload for audit and forensics.
delegated_principal
The human user or system whose context this agent is acting under. Supports multi-hop delegation: agent A delegated by user Alice, spawning agent B. Full lineage tracked.
tenant_id
Which tenant or organisation context the action is scoped to. Prevents cross-tenant access even when agents share infrastructure.
capability
Specific operation permitted. Example: "payments:refund" not "payments:*". One token per task, not one token per service.
resource_target
The specific resource this capability applies to. Example: customer_id "cust_8821", not all customers. Prevents lateral movement to other records.
constraints
Per-operation limits enforced at the gateway. Example: {"max_amount": 47.50, "currency": "EUR"}. Evaluated against the actual request parameters before forwarding.
policy_id
Which policy rule authorized this token. Links the token to the policy version at issuance time. Enables audit of which policy permitted which action.
exp (expiry)
Expiry time of the token. Lifetimes are typically 300 to 900 seconds (5 to 15 minutes). After expiry, the gateway rejects the token even if it passes all other checks.
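
As a rough picture of how these fields fit together, the sketch below models the token as an immutable record. The field names come from the list above; the class name, the policy_id value, the timestamps, and the choice to store exp as absolute Unix seconds are assumptions made for illustration and do not describe AgentID's wire format.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class CapabilityToken:
    agent_instance_id: str
    delegated_principal: str
    tenant_id: str
    capability: str
    resource_target: str
    constraints: Dict[str, Any]
    policy_id: str
    exp: int  # assumed here: absolute expiry in Unix seconds

    def is_expired(self, now: int) -> bool:
        """A token is unusable from its expiry second onwards."""
        return now >= self.exp


ISSUED_AT = 1_700_000_000  # hypothetical issuance time

token = CapabilityToken(
    agent_instance_id="agent-73-instance-4",
    delegated_principal="alice",
    tenant_id="tenant-b",
    capability="payments:refund",
    resource_target="customer_id:cust_8821",
    constraints={"max_amount": 47.50, "currency": "EUR"},
    policy_id="policy-support-refunds-v1",  # hypothetical identifier
    exp=ISSUED_AT + 300,  # 5-minute lifetime
)
```

Note how much narrower this is than a scope string: the capability, the single customer, and the amount limit all travel together, so the gateway has what it needs to decide.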

Recommended adoption path for AgentID. Start with capability token issuance and introspection. Add runtime enforcement through the gateway. Enable federated identity for delegated enterprise principals. Adopt spawn-time agent lifecycle and delegation lineage controls. Expand connector constraints for high-risk workflows. This phased path delivers immediate security improvement while reducing deployment risk at each step.

Section 09

SPIFFE, SPIRE, and AgentID

A common question when implementing agent zero trust: do we need AgentID if we already use SPIFFE/SPIRE? The answer is that they solve different problems. Both are necessary in a complete zero trust implementation for AI agents.

SPIFFE / SPIRE
Identity attestation layer
Answers: which workload is calling? (verifiable identity document)
Provides cryptographic proof of service identity via X.509 SVIDs
Works at the infrastructure level: services, pods, VMs
Does NOT answer: which capability is the agent allowed to use?
Does NOT constrain which customer's data can be accessed
Does NOT track delegation across agent parent-child hierarchies
Does NOT enforce per-operation constraints at a gateway
AgentID
Capability and policy enforcement layer
Sits above the attestation layer. Can consume SPIFFE SVIDs as identity inputs.
Adds capability scoping: which operation, on which resource, with which constraints
Adds delegation context: agent instance, delegated principal, tenant, policy
Runtime gateway enforcement at every request, not just identity verification
Revocation at the capability level, not just the identity level
Together: SPIFFE proves "who is calling", AgentID proves "what they are allowed to do"
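
The layering can be sketched in a few lines: the SPIFFE ID says which workload is calling, and a separate policy table decides what that workload may do. The policy table, the trust domain, and the workload path below are hypothetical. The sketch also assumes the SVID has already been cryptographically verified; it only parses the identifier, which follows the spiffe://trust-domain/path form.

```python
from typing import Dict, Set, Tuple
from urllib.parse import urlparse


def parse_spiffe_id(spiffe_id: str) -> Tuple[str, str]:
    """Split a SPIFFE ID into (trust domain, workload path)."""
    parsed = urlparse(spiffe_id)
    if parsed.scheme != "spiffe" or not parsed.netloc:
        raise ValueError(f"not a SPIFFE ID: {spiffe_id!r}")
    return parsed.netloc, parsed.path


# Hypothetical capability policy, keyed by attested workload identity.
ALLOWED_CAPABILITIES: Dict[Tuple[str, str], Set[str]] = {
    ("example.org", "/support/refund-agent"): {"payments:refund"},
}


def capability_permitted(spiffe_id: str, capability: str) -> bool:
    """Identity answers who is calling; this lookup answers what they may do."""
    workload = parse_spiffe_id(spiffe_id)
    return capability in ALLOWED_CAPABILITIES.get(workload, set())
```

A workload with a perfectly valid identity is still refused any capability that the policy table does not list for it.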

Cloud workload identity and AgentID. Cloud providers (AWS, GCP, Azure) offer workload identity federation that solves "which service is calling?" But none of them answer "which agent instance, acting under which delegated user, with which scoped capability, under which tenant policy?" Cloud workload identity is an input to AgentID, not a replacement for it. AgentID uses it as an attestation source, then layers capability scoping, policy evaluation, delegation tracking, and gateway enforcement on top.

Section 10

Worked example: the refund agent

From the Mirror Security blog on zero trust for AI agents: consider a customer support agent that needs to issue a refund. This is a common agentic workflow that illustrates exactly how the shared credential model fails and how zero trust capability tokens fix it.

Without AgentID: the agent authenticates with a shared service account that has broad access to the payments API: read transactions, issue refunds, modify subscriptions, update billing details. A compromised runtime can reuse that credential for any of those operations, indefinitely.

With AgentID: the flow looks different.

1
Agent requests a capability token from the Broker
The agent identifies itself (SPIFFE SVID or equivalent) and requests the specific capability it needs for this task.
capability: "payments:refund", resource: "customer_id:cust_7821", constraint: "max_amount:47.50"
Token request
2
Broker verifies identity and evaluates tenant policy
Policy check: "Support agents may issue refunds but may not modify subscriptions." Agent identity verified. Requested capability is within policy.
policy_result: PERMIT, token_ttl: 300s, constraint_sealed: true
Policy evaluation
3
Broker mints a short-lived capability token
Token expires in 5 minutes. Scoped to payments:refund for one customer only. Amount constraint sealed into the token so the gateway can enforce it.
token: eyJ..., exp: now+300s, scope: "payments:refund:cust_7821"
Token issued
4
Agent presents token to the Resource Gateway
Gateway validates token: active, not revoked. Checks scope matches requested operation. Checks refund amount against the max_amount constraint in the token.
gateway_check: token_valid=true, scope_match=true, amount=47.50 <= max_amount=47.50
Gateway enforces
5
Gateway forwards validated request to payments API
The payment is processed and the refund is issued. The token becomes invalid once used or after 5 minutes, whichever comes first.
Action executed
6
Audit trail created with full lineage
Audit record links the refund to: agent instance ID, delegated principal (customer support user), tenant policy ID, token ID, amount, customer, timestamp. Full forensic chain available.
audit: {agent_id, principal, policy_id, capability, resource, amount, timestamp}
Audited

If the agent runtime is compromised during step 4, the attacker holds a token that can only issue one bounded refund for one customer, expiring in minutes. Compare that with the shared credential that can modify any billing record, for any customer, indefinitely. The blast radius difference is the practical value of zero trust for agents.
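
Step 3 says the amount constraint is sealed into the token. One way to picture sealing is a keyed MAC over the claims, so that a compromised agent cannot raise its own limit without the gateway noticing. This is a minimal sketch under assumptions: the shared key, the token layout, and the function names are hypothetical, and the source does not say how AgentID signs tokens. A production system would more likely use a standard signed token format with managed keys.

```python
import base64
import hashlib
import hmac
import json
from typing import Optional

BROKER_KEY = b"demo-only-secret"  # hypothetical; never hard-code real keys


def _encode(claims: dict) -> bytes:
    """Serialise claims deterministically so the MAC is reproducible."""
    return json.dumps(claims, sort_keys=True, separators=(",", ":")).encode()


def seal(claims: dict) -> str:
    """Broker side: attach an HMAC-SHA256 tag to the claims."""
    payload = _encode(claims)
    tag = hmac.new(BROKER_KEY, payload, hashlib.sha256).digest()
    return (
        base64.urlsafe_b64encode(payload).decode()
        + "."
        + base64.urlsafe_b64encode(tag).decode()
    )


def unseal(token: str) -> Optional[dict]:
    """Gateway side: return the claims only if the tag verifies."""
    try:
        payload_b64, tag_b64 = token.split(".")
        payload = base64.urlsafe_b64decode(payload_b64)
        tag = base64.urlsafe_b64decode(tag_b64)
    except ValueError:  # wrong part count or invalid base64
        return None
    expected = hmac.new(BROKER_KEY, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        return None
    return json.loads(payload)
```

If an attacker swaps in a payload with a larger max_amount but keeps the original tag, unseal returns None and the gateway can reject the request before any scope or amount check runs.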

Section 11

The four AI trust planes

A complete zero trust architecture for AI covers four distinct planes. Each addresses a different category of risk. No single product covers all four. Mirror Security's platform is designed so that each product addresses one plane, and the four planes together cover the full attack surface of a production AI deployment.

🔒
Data plane
Encrypted inference and vector search
The data that flows through the AI pipeline must never be exposed in plaintext on the server side. Embeddings are encrypted before storage. Similarity search runs on encrypted vectors. Retrieved documents are decrypted only at the authorised client. Closes the inference gap at the retrieval layer.
VectaX · Similarity-preserving FHE for vector databases
🎯
Model plane
Continuous adversarial validation
The model itself must be continuously tested for known attack patterns. A model safe at deployment may not be safe after fine-tuning, after new prompt injection techniques are discovered, or after the threat landscape shifts. Continuous red teaming validates model behaviour against the current attack surface.
DiscoveR · Automated AI red teaming and attack surface testing
Runtime plane
Inline guardrails on every output
Every model output must be checked before it reaches the user or triggers downstream actions. PII in outputs, hallucinated facts, toxic content, and prompt injection attempts are all runtime events that continuous verification must catch. Guardrails run inline in the response path: they flag, filter, or block outputs based on policy.
AgentIQ · Runtime guardrails for PII, hallucination, injection, toxicity
👤
Identity plane
Capability-scoped agent access control
Every agent action must be authorized by a capability-scoped token with delegation lineage and runtime enforcement. Short-lived tokens, workload-specific identities, constraint validation at the gateway, and fast revocation are all required to prevent one compromised agent from becoming a broad breach.
AgentID · Zero trust identity and access for autonomous agents

From the Mirror Security blog: "Most organizations today address these concerns piecemeal: a guardrail vendor here, an encryption tool there, identity bolted on as an afterthought. The gaps between those point solutions are where breaches happen. Mirror's platform closes those gaps by design." A production AI system that closes only two of the four planes has an incomplete zero trust posture. Adversaries find and exploit the open planes.

Section 12

DiscoveR on the model plane

Zero trust applied to models means you never assume a model is safe because it passed evaluation at deployment. The threat landscape changes. New attack techniques for prompt injection, jailbreaks, and data extraction appear continuously. A model that was secure last month may be vulnerable today because an attacker has published a new technique that your static evaluation set did not cover.

DiscoveR continuously runs adversarial tests against your deployed models and AI applications. It registers your application endpoint, selects attack categories relevant to your system type and threat model, and runs structured probes. Results include per-category pass rates, severity scores, and a correlation ID that links scans across remediation cycles so you can measure whether a fix held.
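
To make per-category pass rates and fix verification concrete, here is a small sketch of the arithmetic. The input shape (a list of category and pass/fail pairs), the category names, and both function names are assumptions for illustration; they are not DiscoveR's report format or API.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def pass_rates(results: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Fraction of probes the system withstood, per attack category."""
    totals: Dict[str, int] = defaultdict(int)
    passed: Dict[str, int] = defaultdict(int)
    for category, ok in results:
        totals[category] += 1
        if ok:
            passed[category] += 1
    return {category: passed[category] / totals[category] for category in totals}


def regressions(before: Dict[str, float], after: Dict[str, float]) -> List[str]:
    """Categories whose pass rate dropped between two linked scans."""
    return sorted(c for c in before if c in after and after[c] < before[c])


# Hypothetical probe outcomes from two scans of the same endpoint.
scan_1 = [("prompt_injection", True), ("prompt_injection", False), ("jailbreak", True)]
scan_2 = [("prompt_injection", True), ("prompt_injection", True), ("jailbreak", False)]
```

Comparing the two scans shows whether a fix held in one category and whether anything slipped in another, which is the question the correlation ID is meant to help answer.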

In a zero trust context, DiscoveR operationalises the "verify continuously" principle at the model plane. It is not a one-time pentest. It runs on a schedule or trigger, covers the current attack surface, and produces structured findings that feed into your security monitoring workflow.

Section 13

AgentIQ on the runtime plane

Zero trust applied to the runtime plane means every model output is checked before it acts in the world. A customer support agent might output a hallucinated refund amount, leak PII from a retrieved document, or be redirected by a prompt injection in a customer message. All of these are runtime security failures. They cannot be caught by a static scan because they depend on what the model sees and produces at inference time.

AgentIQ provides runtime guardrails that run inline, applied to model outputs before they reach downstream systems. The guardrail categories map directly to the zero trust "verify continuously" principle. PII detection catches sensitive data that should not appear in outputs or agent actions. Hallucination scoring validates whether factual claims have support in the context. Prompt injection defence detects attempts to redirect the agent away from its intended behaviour. Chain security validation checks the integrity of multi-step agent workflows.
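
As a toy illustration of one inline guardrail, the sketch below checks a model output for email addresses and then flags, filters, or blocks it according to a policy action. It is deliberately simplistic: the regular expression catches only one kind of PII, and the function name, action names, and result shape are assumptions rather than AgentIQ's interface.

```python
import re
from typing import Any, Dict

# Simplified email pattern; real PII detection covers far more than this.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def apply_guardrail(output: str, action: str = "filter") -> Dict[str, Any]:
    """Check one model output before it reaches a user or downstream system."""
    redacted, hits = EMAIL.subn("[REDACTED_EMAIL]", output)
    if hits == 0:
        return {"allowed": True, "output": output, "findings": 0}
    if action == "block":  # withhold the response entirely
        return {"allowed": False, "output": "", "findings": hits}
    if action == "filter":  # pass the response on with PII removed
        return {"allowed": True, "output": redacted, "findings": hits}
    if action == "flag":  # pass unchanged but record the finding
        return {"allowed": True, "output": output, "findings": hits}
    raise ValueError(f"unknown action: {action!r}")
```

The decision is made per response, which is the property that distinguishes a runtime guardrail from a pre-deployment evaluation.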

In a zero trust context, AgentIQ is the enforcement point for the runtime plane. It does not inspect model weights or training data. It operates on the live stream of inputs and outputs at the moment of inference, making trust decisions per response rather than per deployment.

Section 14

Frequently asked questions

What is the inference gap and why does it matter for zero trust?

The inference gap is the security exposure that occurs during AI model inference. Most deployments correctly encrypt data at rest and in transit. But at inference time, data is typically decrypted into memory so the model can process it. Once data exists as plaintext in RAM, it is reachable by privileged insiders, compromised hosts, memory-scraping malware, crash dumps, misconfigured telemetry, and side-channel attacks. This gap exists even in architectures that pass all standard compliance audits. Verifiable inference closes this gap by ensuring plaintext is never exposed to the host or its operator: confidential computing confines it to an attested hardware enclave, and FHE keeps it encrypted throughout computation.

Why do AI agents need different identity management from traditional IAM?

AI agents are short-lived, task-scoped, may spawn child agents, and operate at machine speed. Mapping them to shared service accounts designed for static services loses two core security properties: precise identity at the workload instance level, and constrained capability at the operation level. One compromised shared credential creates a disproportionate blast radius when identity and access boundaries are not workload-specific. Zero trust for agents requires short-lived capability-scoped tokens, delegation lineage tracking, and runtime enforcement at every request.

What is AgentID and how does it work?

AgentID is Mirror Security's zero trust identity and access management layer for autonomous agents. It uses two planes: an Identity Broker that handles token issuance, policy evaluation, revocation, and audit; and a Resource Gateway that validates tokens, enforces capability scope, checks constraints, and applies rate limits at every request. Tokens are short-lived (minutes), capability-scoped (specific operation on specific resource with constraints), and carry delegation context (agent instance, delegated principal, tenant, policy). This turns authorization into continuous enforcement at request time rather than one-time trust at token issuance.

How do the four Mirror Security products map to zero trust planes?

VectaX addresses the data plane: similarity-preserving FHE keeps vector embeddings encrypted throughout storage and retrieval, closing the inference gap at the retrieval layer. DiscoveR addresses the model plane: continuous adversarial testing validates that model behaviour stays within expected boundaries as the threat landscape evolves. AgentIQ addresses the runtime plane: inline guardrails check every model output for PII, hallucination, prompt injection, and toxicity. AgentID addresses the identity plane: capability-scoped tokens, delegation lineage, and gateway enforcement control what each agent is allowed to do at every request.

Does geographic data residency provide sovereignty for AI inference?

No. Geographic residency requirements (keeping data within national borders) address jurisdiction and procurement compliance. They do not change what happens when inference requires plaintext in memory. A hospital that keeps patient data within EU borders but runs standard inference still exposes patient data as plaintext during computation. Real sovereignty requires technical guarantees at compute time: either confidential computing with remote attestation, or FHE which eliminates the need to trust the infrastructure operator at all. Residency is necessary but not sufficient for sovereign AI inference.

We already use SPIFFE/SPIRE. Do we still need AgentID?

SPIFFE and SPIRE solve a different problem. They answer "which workload is calling?" with a cryptographically verifiable identity document. They do not answer "which specific capability is the agent allowed to use, on which specific resource, with which constraints, under which delegated principal and tenant policy?" AgentID sits above the attestation layer. It can consume SPIFFE SVIDs as identity inputs, then layer capability scoping, policy evaluation, delegation context, and gateway enforcement on top. You need both: SPIFFE proves identity, AgentID proves authorized capability.

Next: Module E2 of 5

Security Monitoring and Anomaly Detection

What to monitor in an AI stack, anomaly detection for prompt injection and model drift, logging strategies that capture the right signals without capturing sensitive training data, and how to build observability into AI systems from the start.