What forensics artifacts exist in an AI incident?

AgentID audit logs record every token issuance event, delegation chain, gateway enforcement decision, and revocation event. VectaX audit logs record every retrieval event with timestamps, namespace, and document ID without exposing encrypted content. AgentIQ event logs record per-output classification signals: PII detected, injection detected, hallucination score. DiscoveR scan results provide pre- and post-incident model behaviour baselines. Query hash logs from the monitoring layer identify repeated or clustered queries without storing query content. Model weight checksums and version hashes confirm whether the deployed model matches the expected artifact.

What does the DiscoveR remediation cycle look like after an AI incident?

After resolving a prompt injection or model compromise incident, run a full DiscoveR scan using the same correlation_id as the baseline scan. Compare pass rates per attack category against the pre-incident baseline. If the category that was exploited now shows a higher pass rate than the baseline, the fix has worked. If it shows a lower pass rate than the baseline, the fix introduced a regression rather than closing the gap. Do not close the incident until the DiscoveR post-fix scan confirms the attack path is closed and no new regressions have been introduced by the fix.

AI Incident Response: Playbooks, Forensics and Containment | Track 3E

Q: How does AI incident response differ from traditional incident response?

Traditional incident response deals with systems that have clear compromised or not-compromised states. AI incidents are different in three ways. First, the blast radius can be invisible: an agent breach may have touched hundreds of API endpoints and customer records without any file system or network anomaly. Second, the model itself can be the compromised artifact: a backdoored model looks identical to a clean model at the file level. Third, remediation requires verification against the current attack surface, not just patching a CVE. After resolving an AI incident, you need to run a new adversarial scan against the affected system to confirm the attack path is closed, because the fix itself may have introduced new vulnerabilities through the model update process.

Q: What are the four most common AI security incidents?

Prompt injection attack: attacker embeds instructions in user input or retrieved content to redirect model behaviour or exfiltrate context window contents. Model compromise: a backdoor, adversarial fine-tuning, or supply chain compromise has changed the model's behaviour in ways not caught before deployment. Distillation campaign: systematic extraction of the model's reasoning capabilities through high-volume API queries across many accounts. Agent breach: an AI agent has been redirected by injection or credential compromise to take actions outside its authorized scope, potentially touching many downstream systems.

Q: How do you contain an agent breach?

Immediate containment: revoke all active AgentID tokens for the affected agent instance through the Identity Broker. This takes effect at the Resource Gateway within seconds for all pending requests. Do not wait to investigate before revoking. Run the AgentID audit log query to enumerate every tool call made in the affected session, every resource touched, and every delegation chain that originated from the compromised agent. Assess blast radius: list all distinct API endpoints called, customer records accessed, and files modified. For each downstream system, determine whether the agent had capability-scoped or broad access. Then escalate to the owners of affected downstream systems with the specific resource list from the audit log.

Section 01

Why AI IR differs

Traditional incident response works well when the compromised artifact is a file, a credential, or a network connection. You identify the file, patch the vulnerability, reset the credential, and block the connection. The compromised state is usually visible in the file system or the access log.

AI incidents break these assumptions in three specific ways that require a different response approach.

The blast radius is invisible without querying the right logs. An agent breach may have touched two hundred customer records, called six downstream APIs, and modified three data stores, all through capability-scoped tokens that looked normal at the time. There is no file system anomaly. There is no privilege escalation event. The entire damage is in the AgentID audit log, and it requires a specific query to enumerate. If you do not have that log, you cannot scope the incident.

The model itself can be the compromised artifact. A backdoored model and a clean model are indistinguishable by inspecting the file: both are opaque sets of weights, and a weight file carries no obvious malicious indicator the way a malware binary does. A checksum can show that an artifact was swapped, but it cannot show that a correctly registered artifact behaves maliciously. Detecting model compromise therefore requires behavioural testing, specifically running adversarial probes and comparing the results against a known-good baseline.

The fix can introduce new vulnerabilities. After a prompt injection incident, the typical remediation involves updating the system prompt or adding new filtering rules. That change alters the behaviour of the whole system, not just the exploited path. The updated system may now be more resistant to the specific injection that triggered the incident but may have regressed on a related attack category. Remediating an AI incident without running a full adversarial scan after the fix is operationally equivalent to patching a CVE and not testing whether the patch worked.

Do not close an AI incident until you have run a post-fix adversarial scan. A fix that removes the specific attack path that was exploited may simultaneously open a related one. DiscoveR's correlation_id feature links the post-fix scan to the pre-incident baseline, so you can see exactly which categories improved and which regressed as a result of the change.

Section 02

The four incident types

Most AI security incidents fall into four categories. They require different initial containment actions, different forensics approaches, and different remediation paths. Identifying the type early determines which playbook to follow and which team members to page immediately.

Prompt injection attack

Critical

Attacker embeds instructions in user input or retrieved content to redirect the model's behaviour, override system prompt constraints, exfiltrate context window contents, or trigger unauthorized agent actions.

Detection signals: AgentIQ injection detection rate spike, agent blast radius anomaly, out-of-scope tool calls in AgentID logs

Model compromise

Critical

A backdoor, adversarial fine-tuning, or supply chain compromise has changed the model's behaviour. The deployed model no longer matches the expected safety and capability profile. May have been introduced through a model update pipeline.

Detection signals: DiscoveR refusal rate regression after update, jailbreak success on previously-failing techniques, capability anomaly on held-out eval

Distillation campaign

High

Systematic extraction of the model's reasoning capabilities through high-volume coordinated querying across many accounts. Goal is to build a student model trained on harvested (prompt, response) pairs. Documented at scale against frontier models in 2026.

Detection signals: E2 population-level query clustering, inter-account semantic similarity above threshold, query rate anomaly normalised to account age

Agent breach

Critical

An AI agent has been redirected by prompt injection, token theft, or delegation chain manipulation to take actions outside its authorized scope. May have touched many downstream systems through valid but misused capability tokens.

Detection signals: AgentID delegation depth exceeded alert, cross-tenant access attempt, blast radius anomaly, failed authorization rate spike
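As a rough illustration of how detection signals could route an alert to one of the four playbooks, the sketch below counts matching signals per incident type. The signal identifiers are hypothetical labels paraphrased from the detection signals listed above; they are not real alert names from any product. Ambiguous signals, such as a blast radius anomaly that appears under two incident types, are left out on purpose.

```python
from collections import Counter

# Hypothetical signal names paraphrased from the detection signals above.
# Real alert identifiers will differ by deployment.
SIGNAL_TO_INCIDENT = {
    "injection_rate_spike": "prompt_injection",
    "out_of_scope_tool_call": "prompt_injection",
    "refusal_rate_regression": "model_compromise",
    "jailbreak_newly_succeeding": "model_compromise",
    "inter_account_similarity_high": "distillation_campaign",
    "query_rate_anomaly": "distillation_campaign",
    "delegation_depth_exceeded": "agent_breach",
    "cross_tenant_attempt": "agent_breach",
}


def classify_incident(signals):
    """Return the incident type with the most matching signals, or None."""
    votes = Counter(
        SIGNAL_TO_INCIDENT[s] for s in signals if s in SIGNAL_TO_INCIDENT
    )
    if not votes:
        return None
    # most_common(1) returns the single highest-count (type, count) pair.
    return votes.most_common(1)[0][0]
```

A classification like this only picks the starting playbook. An agent breach that began with an injection will still need the injection playbook's investigation steps.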

Section 03

IR lifecycle for AI

The standard incident response lifecycle (detect, contain, investigate, remediate, recover) applies to AI incidents, but the content of each phase is different. One phase has no traditional equivalent: verify. In AI IR, the verify phase is not optional. It is the step where you run a post-fix adversarial scan to confirm that the remediation actually closed the attack path.

AI incident response lifecycle

01 Detect: Alert fires from monitoring layer. Classify incident type. Page the right responder.

02 Contain: Stop ongoing harm. Revoke tokens, suspend endpoints, or roll back model before investigating.

03 Investigate: Query forensics artifacts. Scope blast radius. Identify root cause and attack path.

04 Remediate: Apply fix: update policy, roll back model, tighten scopes, activate hardening controls.

05 Verify: Run DiscoveR post-fix scan with same correlation_id. Confirm attack path closed. No new regressions.

Contain before investigate. In AI incidents, the instinct to understand the full scope before acting can delay containment while the incident is still ongoing. Revoke the implicated tokens or suspend the affected endpoint first. Investigation can happen against logs. You do not need the system running to investigate it.

Section 04

Playbook: Prompt injection attack

Prompt injection is the most common active AI incident type. The attacker embeds instructions in user input or retrieved content to redirect the model away from its intended behaviour. Direct injection targets the system prompt through the user turn. Indirect injection plants instructions in documents that the model retrieves through the RAG pipeline.

CRITICAL Prompt Injection Attack Playbook

DETECT

AgentIQ injection rate crosses alert threshold

Injection detection rate above 1% for a 15-minute window, or any output flagged as affecting agent scope. Check whether the injection succeeded or was blocked by the guardrail.

AgentIQ event log: injection_detected=true, injection_type

CONTAIN

Suspend the affected agent instance and revoke its tokens

Revoke all active AgentID tokens for the affected agent instance through the Identity Broker. If a RAG document was the source, quarantine the document from the retrieval index. Suspension takes effect within seconds at the Resource Gateway.

AgentID: revoke agent_instance_id, remove quarantined doc from vector index

INVESTIGATE

Find the first injection event and trace downstream actions

Query AgentIQ logs for the earliest timestamp where injection_detected=true in the affected session. Cross-reference with AgentID audit logs for all tool calls and resource accesses that occurred after that timestamp. This is the blast radius window.

AgentIQ: query injection events by session_id, timestamp range

INVESTIGATE

Determine injection type and source

Classify: direct injection (user turn), indirect injection (retrieved document), or chain injection (injected sub-agent spawned by the primary agent). For indirect injection, identify which document triggered it using the VectaX retrieval audit log and the AgentIQ injection_source field.

VectaX audit log: retrieval events in blast radius window, document IDs

REMEDIATE

Update AgentIQ policy rules for the injection type detected

For direct injection: update the AgentIQ prompt injection detection policy to cover the pattern used. For indirect injection: add the source document or document class to the quarantine list; review the document ingestion pipeline for the infection vector. For chain injection: add a delegation depth constraint to the relevant AgentID token policy.

AgentIQ: policy update, AgentID: delegation depth constraint

VERIFY

Run DiscoveR scan targeting injection categories with same correlation_id

Run a DiscoveR scan with security_categories including jailbreakAndInjection using the same correlation_id as the pre-incident baseline. Compare pass rates. The specific injection technique that succeeded during the incident should now fail. If any other injection category has regressed, do not close the incident.

DiscoveR: create_discover_scan, correlation_id=baseline_scan_id
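The blast radius window from the investigate steps above can be sketched as a simple filter over exported log records. This is a minimal sketch that assumes both logs have been exported as lists of dictionaries; the record layout (`session_id`, `timestamp`, `injection_detected`) is an assumption for illustration and not the actual AgentIQ or AgentID schema. The function name is hypothetical.

```python
def blast_radius_window(classification_events, tool_calls, session_id):
    """Return (first_injection_time, tool_calls_from_that_time) for a session.

    Both inputs are lists of dicts with hypothetical keys:
    session_id, timestamp, and (for classification events) injection_detected.
    """
    injection_times = [
        e["timestamp"]
        for e in classification_events
        if e["session_id"] == session_id and e["injection_detected"]
    ]
    if not injection_times:
        return None, []

    start = min(injection_times)
    # Inclusive comparison: a tool call logged at the same instant as the
    # first detection is treated as inside the window, to err on the safe side.
    affected = [
        c
        for c in tool_calls
        if c["session_id"] == session_id and c["timestamp"] >= start
    ]
    return start, sorted(affected, key=lambda c: c["timestamp"])
```

The inclusive comparison is a deliberate choice: over-scoping the window costs review time, while under-scoping it can hide an affected resource.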

Section 05

Playbook: Model compromise

Model compromise is the hardest AI incident to detect early because the model looks identical to a clean model at the file level. By the time behavioural testing reveals the compromise, the model may have been in production for days or weeks. The goal of this playbook is rapid rollback followed by thorough supply chain forensics to identify where the compromise was introduced.

CRITICAL Model Compromise Playbook

DETECT

DiscoveR scan shows refusal rate regression after a model update

Post-update DiscoveR scan shows a category where pass rate has dropped more than 10 percentage points compared to the pre-update baseline scan. Or: a jailbreak technique that previously failed is now succeeding. Either signal requires immediate investigation.

DiscoveR: compare scan results, category pass rate delta

CONTAIN

Roll back to the previous verified clean model checkpoint

Do not attempt to patch the compromised model in place. Roll back to the most recent checkpoint that passed a clean DiscoveR scan. Quarantine the compromised model artifact for forensics. The rollback takes effect immediately with no user impact if you are running blue/green deployments.

Model deployment: activate previous checkpoint, quarantine compromised artifact

INVESTIGATE

Verify model weight checksums against the expected artifact

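Compute SHA-256 of the compromised model weights and compare against the expected checksum from the model registry. A mismatch indicates the artifact was altered after it was registered, which points to a supply chain compromise in distribution or deployment. A match means the registered artifact was already compromised, so the compromise was introduced upstream through the training or fine-tuning pipeline, not the artifact distribution.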

Model registry: expected_checksum, sha256(compromised_model_weights)

INVESTIGATE

Run a full DiscoveR scan against the quarantined model to characterise the compromise

Run all attack categories against the quarantined model in an isolated environment. The pattern of category failures characterises the type of compromise: a backdoor shows specific trigger-sensitive failures; adversarial fine-tuning shows broad safety degradation; a capability attack shows regression in specific reasoning domains.

DiscoveR: full category scan in isolated environment, security_categories=all

INVESTIGATE

Trace the model update pipeline for the insertion point

Review the complete pipeline from training data ingestion through fine-tuning through model registry through deployment. Check access logs for the model registry for any unauthorized modifications. Review the fine-tuning dataset for the update that introduced the compromise.

Pipeline logs, model registry access log, fine-tuning dataset review

REMEDIATE

Harden the model update pipeline and verify the clean checkpoint

Add mandatory DiscoveR scan gate to the model deployment pipeline: no model update can reach production without passing a DiscoveR scan against the baseline. Add checksum verification at every pipeline stage. If supply chain compromise is confirmed, notify the model provider.

CI/CD pipeline: add DiscoveR scan gate, add checksum verification step

VERIFY

Run clean DiscoveR scan against the rolled-back checkpoint

Run the full DiscoveR scan against the restored model checkpoint before reopening production traffic. All category pass rates should be at or above the last known-good baseline. Do not reopen production traffic until this scan passes.

DiscoveR: full scan on restored checkpoint, all categories pass at baseline
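The checksum comparison in the investigate step can be done with Python's standard `hashlib`. The sketch below streams the file so that large weight artifacts do not need to fit in memory. The function names are illustrative, and it assumes the registry stores a hex-encoded SHA-256 digest.

```python
import hashlib


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Stream a file through SHA-256 and return the hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected_checksum):
    """Compare the computed digest with the registry value.

    Assumes the registry value is hex; surrounding whitespace and letter
    case are normalised before comparing.
    """
    return sha256_of_file(path) == expected_checksum.strip().lower()
```

A matching checksum only shows that the deployed file is the registered file. It says nothing about whether the registered file behaves as expected, which is why the behavioural scan remains necessary.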

Section 06

Playbook: Distillation campaign

A distillation campaign is a longer-duration incident. It may have been running for days or weeks before detection. Containment stops ongoing collection; forensics estimates how much was extracted; hardening makes future extraction hostile. The VectaX FHE stack is the primary technical hardening control, making harvested outputs toxic for training rather than simply trying to block collection.

HIGH Distillation Campaign Playbook

DETECT

Population-level monitoring alert: inter-account query similarity above threshold

E2 monitoring signals: accounts showing cosine similarity above 0.85, query rate anomaly normalised to account age, or systematic topic coverage pattern. Cross-reference with response clustering to confirm coordinated extraction pattern rather than organic similar usage.

Monitoring layer: inter_account_similarity, topic_entropy, query_rate_cohort

CONTAIN

Revoke API keys for all implicated accounts and enable enhanced rate controls

Revoke or suspend the API keys identified in the implicated account cluster. Enable temporary enhanced rate controls for the affected endpoint. For insider threats, suspend the specific API key and notify the account holder's organization through the appropriate channel.

API gateway: revoke keys for implicated_account_ids, enable rate_control_enhanced

INVESTIGATE

Estimate the coverage and duration of the extraction campaign

Query the response clustering logs to identify which regions of the model's capability space have been covered by the implicated accounts. Estimate the total number of (prompt, response) pairs extracted. Identify the start date of the campaign from the earliest account creation in the cluster.

Response clustering logs: topic_coverage_by_account_cluster, campaign_start_date

INVESTIGATE

Determine whether the campaign targeted reasoning traces specifically

Query types that maximize chain-of-thought extraction are distinct from queries that collect (prompt, answer) pairs only. Reasoning trace extraction produces a more capable student model from fewer examples. If the query patterns suggest CoT harvesting, escalate the severity assessment.

Query pattern analysis: step-by-step, explain-your-reasoning query types in implicated cluster

REMEDIATE

Activate VectaX FHE stack for affected endpoints to make future harvest toxic

Activate the VectaX FHE layer for the affected model endpoints. This injects training-hostile noise at the latent level that is invisible to legitimate users but degrades any student model trained on harvested outputs. The more a future attacker collects, the worse their student model performs. Monitoring alone cannot stop sophisticated distillers; making the harvest worthless does.

VectaX: activate FHE stack for endpoint_id, enable_latent_noise=true

VERIFY

Confirm extraction signal has ceased after account revocation

Monitor the inter-account similarity metric for 48 hours after revocation. A return to baseline confirms the implicated accounts drove the signal. If the signal persists from different accounts, the campaign has additional infrastructure. Escalate and repeat containment for the new cluster.

Monitoring: inter_account_similarity rolling window, topic_entropy after revocation
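To make the inter-account similarity signal concrete, the sketch below flags account pairs whose query-embedding vectors have cosine similarity above the 0.85 threshold quoted in the detect step. It assumes each account has already been reduced to a single representative vector (for example, a mean of its query embeddings). How that vector is produced is outside this sketch, and the function names are hypothetical.

```python
import math
from itertools import combinations

SIMILARITY_THRESHOLD = 0.85  # threshold quoted in the detect step above


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction; treat as dissimilar
    return dot / (norm_a * norm_b)


def flag_similar_accounts(account_vectors, threshold=SIMILARITY_THRESHOLD):
    """Return account-ID pairs whose similarity exceeds the threshold.

    account_vectors maps account_id -> representative vector (assumed input).
    """
    flagged = []
    pairs = combinations(sorted(account_vectors.items()), 2)
    for (id_a, vec_a), (id_b, vec_b) in pairs:
        if cosine_similarity(vec_a, vec_b) > threshold:
            flagged.append((id_a, id_b))
    return flagged
```

Pairwise comparison is quadratic in the number of accounts, so a production monitor would likely use approximate nearest-neighbour search instead. The pairwise form is used here only because it is easy to follow.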

📋 Mirror Blog · The Distillation Problem Has a New Answer: Make the Harvest Worthless

Section 07

Playbook: Agent breach

An agent breach is the highest-velocity AI incident. An agent operating at machine speed can touch hundreds of customer records, call dozens of APIs, and modify data across multiple systems in seconds. The window between breach and containment determines the blast radius. Fast token revocation through AgentID is the single most important containment action.

CRITICAL Agent Breach Playbook

DETECT

AgentID alert: delegation depth exceeded or cross-tenant access attempt

Any cross-tenant access attempt is an immediate incident. A delegation depth that exceeds the configured maximum indicates a spawned sub-agent hierarchy that the policy design did not anticipate, which is consistent with a prompt injection-driven agent hijack.

AgentID audit log: delegation_depth_exceeded=true or cross_tenant_attempt=true

CONTAIN

Revoke ALL active tokens for the compromised agent instance immediately

Use the AgentID Identity Broker to revoke all tokens issued to the affected agent_instance_id. This does not require knowing the scope of the breach. Revoke first, investigate second. Token revocation propagates to the Resource Gateway within seconds, stopping all in-progress agent actions.

AgentID: revoke_all_tokens(agent_instance_id), propagation < 10s

INVESTIGATE

Query AgentID audit log to enumerate every resource the agent touched

Run a query on the AgentID audit log for all token_id values issued to the compromised agent_instance_id, then enumerate every gateway enforcement event for each token. This gives the complete list: every API endpoint called, every customer record accessed, every file modified, and every downstream agent spawned.

AgentID: audit_log_query(agent_instance_id, time_range=breach_window)

INVESTIGATE

Assess blast radius: what was touched and by what scope

From the audit log enumeration: count distinct customer records, API endpoints, and data stores accessed. For each, determine whether the agent acted within its capability scope (legitimate but misdirected) or outside it (token was forged or the gateway was bypassed). Out-of-scope actions require immediate notification to affected system owners.

Blast radius report: distinct_resources, scope_violations, downstream_owners_to_notify

INVESTIGATE

Trace the breach origin: injection, token theft, or delegation manipulation

Cross-reference the breach start time with AgentIQ injection detection logs. If an injection event preceded the out-of-scope actions, the breach originated from prompt injection. If no injection event exists, investigate whether a token was replayed from another session (token_id_hash mismatch) or whether the delegation chain was manipulated by a malicious sub-agent spawn.

AgentIQ: injection events before breach_start_time, AgentID: delegation chain forensics

REMEDIATE

Tighten capability scopes and constraints on the affected workflow

Review the token policy for the affected agent workflow. Tighten the capability scope to the minimum required for the task. Add explicit resource_target constraints where previously absent. Reduce token lifetime to the minimum required for the task. Add a delegation depth maximum constraint.

AgentID: update token policy, tighten capability, add resource_target, reduce exp

VERIFY

Run a test session and confirm the gateway blocks the previously-exploited path

Run a controlled test session that attempts the same out-of-scope action that the breached agent took. The AgentID gateway should block it with a scope violation. If the gateway allows the action, the policy update did not correctly tighten the scope. Do not resume production traffic for this workflow until the test passes.

AgentID: test session, expect gateway_rejection on previously-exploited scope
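The blast radius assessment in the investigate steps can be sketched as an aggregation over exported audit records. The field names `agent_instance_id`, `capability`, and `resource_target` follow the AgentID log fields listed in this guide, but the dictionary layout, the capability strings, and the function itself are assumptions for illustration, not the product's query API.

```python
def summarise_blast_radius(audit_events, agent_instance_id, allowed_capabilities):
    """Summarise what one agent instance touched and where it left its scope.

    audit_events: list of dicts with agent_instance_id, capability,
    resource_target (assumed export format).
    allowed_capabilities: set of capabilities the workflow policy permits.
    """
    events = [
        e for e in audit_events if e["agent_instance_id"] == agent_instance_id
    ]
    distinct_resources = sorted({e["resource_target"] for e in events})
    # An event is a scope violation when its capability is not in the policy.
    scope_violations = [
        e for e in events if e["capability"] not in allowed_capabilities
    ]
    return {
        "distinct_resources": distinct_resources,
        "scope_violations": scope_violations,
        "event_count": len(events),
    }
```

The output mirrors the blast radius report fields named above. Owners of any resource that appears in `scope_violations` are the ones to notify first.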

📋 Mirror Blog · Zero Trust for AI Agents: Solving Identity and Access with AgentID

Section 08

Forensics artifacts

AI forensics uses different artifacts from traditional forensics. There is no memory dump of a compromised process. There is no malware binary. There is no file system modification timestamp. The evidence lives in audit logs from the AI-specific tools in the stack and in the model's behavioural profile captured by adversarial testing.

Mirror Security · AgentID

Identity and Access Audit Log

Every token issuance event, delegation chain record, gateway enforcement decision, scope violation, and revocation event. The primary blast radius artifact for agent breach incidents.

agent_instance_id delegated_principal capability resource_target gateway_result revocation_event

Mirror Security · VectaX

Retrieval Audit Log

Every document retrieval event with timestamp, namespace, document ID, and access policy result. Reveals which documents were retrieved in a blast radius window without exposing encrypted content.

timestamp namespace doc_id policy_result cross_namespace_flag

Mirror Security · AgentIQ

Output Classification Event Log

Per-output classification signals for every model response in a session. Primary injection forensics artifact. Shows the exact timestamp of the first injection detection and the classification for every subsequent output.

injection_detected injection_type pii_detected hallucination_score chain_security_status session_id

Mirror Security · DiscoveR

Adversarial Scan Result Set

Pre- and post-incident model behaviour baselines. Per-category pass rates with timestamps. Correlation ID chains linking scans across the remediation cycle. The only behavioural artifact that reveals model compromise.

scan_id correlation_id category_pass_rates baseline_delta timestamp

Monitoring layer

Query Hash and Cluster Log

SHA-256 hashes of normalized queries, nearest semantic cluster IDs, inter-account similarity scores. Primary distillation forensics artifact. Identifies campaign start date and coverage without storing query content.

query_hash cluster_id inter_account_similarity topic_entropy account_age

Model registry

Model Weight Checksums

SHA-256 checksums of model weight artifacts at every pipeline stage. Confirms whether the deployed model matches the expected artifact from the model registry. The only file-level artifact for model compromise investigation.

expected_checksum deployed_checksum pipeline_stage update_timestamp
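The query hash artifact relies on hashing a normalized form of each query so that trivially different spellings of the same query collide. The sketch below shows one possible normalization (Unicode NFKC, lowercasing, whitespace collapsing). The actual normalization used by a given monitoring layer is not specified in this guide, so treat these rules as an assumption.

```python
import hashlib
import unicodedata


def normalize_query(query):
    """Assumed normalization: Unicode NFKC, lowercase, collapsed whitespace."""
    text = unicodedata.normalize("NFKC", query)
    return " ".join(text.lower().split())


def query_hash(query):
    """SHA-256 hex digest of the normalized query; the raw text is not kept."""
    return hashlib.sha256(normalize_query(query).encode("utf-8")).hexdigest()
```

Exact hashes only catch repeated queries. Detecting paraphrased queries depends on the semantic cluster IDs and similarity scores stored alongside the hash.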

Log retention for AI incidents. The forensics artifacts above are only useful if they are retained long enough to cover the campaign duration. Distillation campaigns can run for weeks before detection. AgentID and VectaX audit logs should be retained for a minimum of 90 days. DiscoveR scan results should be retained indefinitely as model behaviour baselines. Query hash and cluster logs should be retained for 90 days at minimum.

Section 09

Containment strategies

AI containment uses different tools from traditional containment. You are not blocking a network connection or quarantining a file. You are revoking tokens, rolling back model artifacts, suspending endpoints, or activating encryption controls. Each containment action has a different speed, reversibility, and blast radius impact.

Traditional containment actions

Block a network connection or IP range

Quarantine a malware-infected file or process

Revoke a compromised user credential

Isolate a compromised host from the network

Cannot roll back a model to a previous behaviour state

Cannot assess blast radius from agent audit log

Cannot make harvested data toxic for training

AI-specific containment actions

Token revocation via AgentID: stops all in-progress agent actions within seconds

Model rollback: swap the deployed model to the last verified-clean checkpoint

Endpoint suspension: temporarily disable the affected API endpoint while preserving logs

Document quarantine: remove a poisoned document from the retrieval index

VectaX FHE activation: makes future harvested outputs toxic for training at the latent level

Rate control escalation: enforce enhanced rate limits on affected endpoints

API key cluster revocation: revoke all keys in the implicated account cluster

Token revocation is the fastest containment action in an AI stack. AgentID revocation propagates to the Resource Gateway in seconds. This is why short-lived capability-scoped tokens from E1 matter for incident response: a token that expires in 5 minutes has a maximum blast radius window of 5 minutes even with no active revocation. A shared credential that lasts months requires an active revocation that takes minutes to propagate and risks breaking other workflows that share the credential.
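The arithmetic behind that comparison is small enough to write down. The sketch below computes the worst-case exposure window for a credential: the remaining lifetime if nothing is revoked, otherwise whichever comes first, expiry or revocation taking effect. The function name is hypothetical and the example figures are illustrative values taken from the ranges quoted in this guide.

```python
def max_exposure_seconds(remaining_lifetime_s, revocation_propagation_s=None):
    """Worst-case time a stolen credential stays usable.

    remaining_lifetime_s: seconds until the credential expires on its own.
    revocation_propagation_s: seconds for an active revocation to take
    effect, or None if no revocation is issued.
    """
    if revocation_propagation_s is None:
        return remaining_lifetime_s
    return min(remaining_lifetime_s, revocation_propagation_s)


# A 5-minute token with no revocation is exposed for at most 300 seconds.
short_token_no_revoke = max_exposure_seconds(5 * 60)

# A shared credential valid for 90 more days, revoked with a 5-minute
# propagation delay, is exposed for those 5 minutes.
shared_cred_revoked = max_exposure_seconds(90 * 24 * 3600, 5 * 60)
```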

Section 10

Response timeline

The following timeline applies to a high-severity AI incident (critical classification). The T+0 to T+1 hour window is where containment decisions are made. After T+1 hour, the focus shifts to investigation and communication. The verify phase should close before T+72 hours for most incidents.

T+0

DETECT

Alert fires. Page the AI security responder on call. Identify incident type from alert source (AgentIQ, AgentID, DiscoveR, monitoring layer). Open the incident ticket and set severity.

On-call

T+5m

CONTAIN

Execute immediate containment. For agent breach: revoke tokens. For injection: suspend agent instance. For model compromise: initiate rollback to last clean checkpoint. Do not wait for investigation before containment.

On-call + Platform eng

T+15m

INVESTIGATE

Query forensics artifacts. AgentID audit log for blast radius. AgentIQ event log for first injection timestamp. VectaX audit log for retrieval events in the window. Initial blast radius estimate completed.

On-call + Security

T+30m

INVESTIGATE

Notify downstream system owners whose systems appear in the blast radius. Provide the specific resource list from the AgentID audit log. Do not estimate: use the exact list from the audit log.

Security + Legal

T+1h

INVESTIGATE

Root cause identified. Attack path mapped. Briefing delivered to security leadership. Regulatory notification assessment completed (is a GDPR 72-hour window triggered?).

Security lead + Legal

T+4h

REMEDIATE

Fix deployed. Policy update, model rollback, or VectaX activation in place. Affected systems restored to operation where safe to do so. Evidence preserved for forensics before any systems are cleaned.

Platform eng + Security

T+24h

VERIFY

DiscoveR post-fix scan completed. Per-category comparison against baseline confirms attack path is closed. No new regressions introduced by the fix. Incident can be moved to closed status if scan passes.

Security

T+72h

VERIFY

Post-incident review completed. PIR document captures root cause, timeline, blast radius, gaps in monitoring or policy, and hardening actions. Regulatory notification filed if required.

Security lead
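The timeline above can be turned into concrete checkpoint times once the alert timestamp is known. The sketch below is one way to do that with the standard `datetime` module. The offsets and phase labels come from the timeline; the function and constant names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Offsets and phase labels taken from the response timeline above.
TIMELINE = [
    (timedelta(0), "DETECT"),
    (timedelta(minutes=5), "CONTAIN"),
    (timedelta(minutes=15), "INVESTIGATE"),
    (timedelta(minutes=30), "INVESTIGATE"),
    (timedelta(hours=1), "INVESTIGATE"),
    (timedelta(hours=4), "REMEDIATE"),
    (timedelta(hours=24), "VERIFY"),
    (timedelta(hours=72), "VERIFY"),
]


def checkpoint_schedule(alert_time):
    """Return (due_time, phase) pairs for an incident detected at alert_time."""
    return [(alert_time + offset, phase) for offset, phase in TIMELINE]


def overdue_checkpoints(alert_time, now):
    """Return the checkpoints whose due time has already passed."""
    return [(due, phase) for due, phase in checkpoint_schedule(alert_time) if due <= now]
```

Using timezone-aware timestamps (for example, UTC) avoids ambiguity when responders in different regions read the same schedule.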

Section 11

The remediation cycle

The remediation cycle is the sequence of steps that closes an AI incident with confidence. It uses DiscoveR as the verification tool for each fix. The cycle is not complete until the post-fix scan confirms the attack path is closed and no new regressions exist.

1

Run a baseline DiscoveR scan before any model or policy change

Before deploying any model to production, run a full DiscoveR scan and store the results. This is the baseline that all future comparisons use. If you do not have a pre-incident baseline, run a scan against the last known-good checkpoint and use that as the reference.

DiscoveR baseline scan, all categories

2

Incident detected: identify which categories are failing and at what rate

When an incident is detected and investigated, run a DiscoveR scan against the affected system. The per-category results identify which attack categories are now succeeding that should not be. This scopes the remediation to the specific categories that need fixing.

DiscoveR incident scan, same correlation_id

3

Deploy the fix targeting the specific failing categories

A focused fix targets the failing categories without touching unrelated parts of the system. A broad fix (like a full model rollback) may change pass rates across all categories. Either way, the post-fix scan will show the full picture of what changed.

Policy update, model rollback, or guardrail change

4

Run post-fix DiscoveR scan with the same correlation_id

Run a new DiscoveR scan with the same correlation_id as the baseline and incident scans. The scan results now show: which categories improved (the fix worked for those), which stayed the same (unaffected), and which regressed (the fix introduced new issues in those categories).

DiscoveR post-fix scan, correlation_id=original

5

Close only if: previously failing categories now pass AND no new regressions

The incident is closed when: the categories that were failing at the time of the incident are now passing at or above baseline, AND no other category has regressed below baseline as a result of the fix. If any regression exists, return to step 3 with a more targeted fix.

Incident closure criteria: all categories at or above baseline
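The comparison in steps 4 and 5 can be sketched as a per-category diff of pass rates. This sketch assumes scan results have been exported as dictionaries mapping category name to pass rate in percent and that both scans cover the same categories. It does not call the DiscoveR API, and the function names are hypothetical. Apart from jailbreakAndInjection, the category names in the usage example are made up.

```python
def compare_to_baseline(baseline, post_fix):
    """Classify each category as improved, unchanged, or regressed."""
    result = {"improved": [], "unchanged": [], "regressed": []}
    for category in sorted(baseline):
        delta = post_fix[category] - baseline[category]
        if delta > 0:
            result["improved"].append(category)
        elif delta < 0:
            result["regressed"].append(category)
        else:
            result["unchanged"].append(category)
    return result


def can_close(baseline, post_fix):
    """Closure rule from step 5: every category at or above baseline."""
    return all(post_fix[c] >= baseline[c] for c in baseline)


baseline = {"jailbreakAndInjection": 92, "piiLeakage": 98, "toxicity": 95}
post_fix = {"jailbreakAndInjection": 96, "piiLeakage": 98, "toxicity": 90}
comparison = compare_to_baseline(baseline, post_fix)
closable = can_close(baseline, post_fix)
```

In this example the fix improved the exploited category but regressed another, so the incident stays open and the cycle returns to step 3.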

Section 12

Communication template

AI incidents require communication to multiple audiences: security leadership, legal, downstream system owners, and potentially regulators. Each audience needs different information. The template below covers the fields that each communication should contain for a critical AI incident.

Initial notification template: T+30 minutes (internal)

Incident type

One of: Prompt injection / Model compromise / Distillation campaign / Agent breach

Severity

Critical / High / Medium. Specify: is data at risk, is availability affected, is a regulatory notification window triggered?

Detection time

Exact timestamp of first alert. Estimated start time of incident if earlier than detection (relevant for distillation campaigns).

Containment status

What containment action was taken, at what time, and what systems are currently suspended or operating in reduced capacity.

Blast radius

Number of distinct customer records accessed, API endpoints called, data stores modified. Source: AgentID audit log. Do not estimate; use the exact count from the log.

Data exposure

Confirm whether PII, PHI, financial data, or regulated data classes appear in the blast radius. Source: VectaX audit log for retrieval events + AgentIQ PII detection flags in the blast radius window.

Regulatory flag

State whether GDPR Article 33 (72-hour notification window), HIPAA breach notification, or other regulatory obligation is potentially triggered. Legal must assess.

Next update time

Time of the next scheduled update (typically T+1h for critical incidents). State what will be known by then: root cause, full blast radius, remediation plan.
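Teams that send this notification from tooling may find it useful to encode the template as a typed record, so that a notification cannot be sent with a missing or free-form field. The sketch below is one possible encoding. The class name, field names and validation rules are assumptions made for illustration, and the sample values are invented.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional

INCIDENT_TYPES = {"Prompt injection", "Model compromise", "Distillation campaign", "Agent breach"}
SEVERITIES = {"Critical", "High", "Medium"}


@dataclass
class InitialNotification:
    incident_type: str
    severity: str
    detection_time: datetime           # exact timestamp of first alert
    containment_status: str
    customer_records_accessed: int     # exact counts from the audit log
    api_endpoints_called: int
    data_stores_modified: int
    data_exposure: List[str]           # e.g. ["PII"]; empty if none confirmed
    regulatory_flag: str
    next_update_time: datetime
    estimated_start_time: Optional[datetime] = None

    def __post_init__(self) -> None:
        if self.incident_type not in INCIDENT_TYPES:
            raise ValueError(f"unknown incident type: {self.incident_type!r}")
        if self.severity not in SEVERITIES:
            raise ValueError(f"unknown severity: {self.severity!r}")
        counts = (self.customer_records_accessed, self.api_endpoints_called, self.data_stores_modified)
        if any(count < 0 for count in counts):
            raise ValueError("blast radius counts must be non-negative")
        if self.next_update_time <= self.detection_time:
            raise ValueError("next update must be after detection time")


detected = datetime(2025, 3, 3, 14, 20, tzinfo=timezone.utc)
fields = dict(
    incident_type="Agent breach",
    severity="Critical",
    detection_time=detected,
    containment_status="All tokens for the affected agent instance revoked at 14:24 UTC",
    customer_records_accessed=212,
    api_endpoints_called=7,
    data_stores_modified=1,
    data_exposure=["PII"],
    regulatory_flag="GDPR Article 33 potentially triggered; legal to assess",
    next_update_time=detected + timedelta(hours=1),
)
note = InitialNotification(**fields)
print(note.incident_type, note.next_update_time.isoformat())
```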

GDPR Article 33 applies to AI incidents that expose personal data. If the blast radius includes personal data of EU individuals, GDPR requires notification to the supervisory authority within 72 hours of becoming aware of the breach. Treat the clock as starting at T+0, when the alert fires, unless legal determines a later point of awareness. The VectaX audit log and AgentIQ PII detection flags are the primary evidence that personal data was or was not in the blast radius. Preserve these logs before any remediation steps that might affect log integrity.
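The deadline arithmetic is simple but easy to get wrong across time zones. The sketch below assumes, as a conservative default, that awareness is dated to the alert time and that all timestamps are timezone-aware. The function names and the sample timestamps are illustrative only.

```python
from datetime import datetime, timedelta, timezone

GDPR_NOTIFICATION_WINDOW = timedelta(hours=72)


def notification_deadline(aware_at: datetime) -> datetime:
    """Latest notification time, given when the breach became known."""
    if aware_at.tzinfo is None:
        raise ValueError("aware_at must be timezone-aware")
    return aware_at + GDPR_NOTIFICATION_WINDOW


def hours_remaining(aware_at: datetime, now: datetime) -> float:
    """Hours left before the deadline; negative once it has passed."""
    return (notification_deadline(aware_at) - now).total_seconds() / 3600


# Invented timestamps for illustration.
alert_fired = datetime(2025, 3, 3, 14, 20, tzinfo=timezone.utc)
deadline = notification_deadline(alert_fired)
remaining = hours_remaining(alert_fired, datetime(2025, 3, 4, 2, 20, tzinfo=timezone.utc))
print(deadline.isoformat(), remaining)
```

Including the computed deadline in each scheduled update keeps the regulatory window visible to legal and security leadership without anyone recalculating it by hand.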

Section 13

Post-incident review

The post-incident review (PIR) for an AI security incident covers the same areas as a traditional PIR but adds AI-specific gap analysis. The questions below identify the specific failure points that AI incidents commonly expose.

✓

Was the incident detectable before it was detected?

Review the monitoring logs from before the incident alert fired. Were there earlier signals that were elevated but stayed below the alert threshold? If so, the threshold was too high. Was the relevant layer being monitored at all? If not, add that monitoring layer before the next incident.

✓

Was a pre-incident DiscoveR baseline available?

If no pre-incident model behaviour baseline existed, the incident response team could not confirm whether the incident caused model drift or whether the model had already drifted before the incident. If no baseline existed: add a mandatory baseline scan to the deployment pipeline before the next incident occurs.

✓

Was the blast radius bounded by short-lived capability-scoped tokens?

If the agent involved used a shared long-lived service account, the blast radius was theoretically unbounded. If it used AgentID capability-scoped tokens with short lifetimes, the blast radius was bounded by the token's scope and expiry. If shared credentials were in use: migrate the affected workflow to AgentID capability tokens before the next incident.

✓

Did VectaX encryption limit the impact of a retrieval layer breach?

If a RAG document was the source of an indirect injection or data exfiltration, and VectaX encryption was not active on that retrieval layer, the attacker had access to plaintext document content. If VectaX was not active: assess whether the affected endpoint should be protected before returning to production.

✓

Was the DiscoveR post-fix scan required before incident closure?

If the incident was closed before a post-fix DiscoveR scan was run, the fix may have introduced new vulnerabilities that were not detected. If the scan was skipped: add it as a mandatory closure gate before the next incident. Closing without verification is the single most common post-incident gap in AI security operations.

✓

Were the required forensics artifacts available and within retention window?

Review which artifacts were needed but missing or out of retention window: AgentID audit log, VectaX retrieval log, AgentIQ event stream, DiscoveR scan history. For each missing artifact: update the retention policy and confirm the logging configuration is correct before the next incident.
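The retention review can be reduced to one comparison per artifact: is the oldest record still held older than the start of the incident? The sketch below assumes the oldest retained timestamp for each artifact has been looked up by hand and that `None` means the artifact was not being collected. The helper name and the dates are hypothetical.

```python
from datetime import datetime, timezone
from typing import Dict, Optional


def find_retention_gaps(
    incident_start: datetime,
    oldest_record: Dict[str, Optional[datetime]],
) -> Dict[str, str]:
    """Return the artifacts that cannot cover the incident window, with a reason."""
    gaps: Dict[str, str] = {}
    for artifact, oldest in oldest_record.items():
        if oldest is None:
            gaps[artifact] = "missing"
        elif oldest > incident_start:
            # Retained records begin after the incident did, so the early part is lost.
            gaps[artifact] = "out of retention window"
    return gaps


# Invented dates for illustration.
incident_start = datetime(2025, 3, 1, 9, 0, tzinfo=timezone.utc)
oldest_record = {
    "AgentID audit log": datetime(2025, 1, 1, tzinfo=timezone.utc),
    "VectaX retrieval log": datetime(2025, 3, 2, tzinfo=timezone.utc),
    "AgentIQ event stream": datetime(2025, 2, 15, tzinfo=timezone.utc),
    "DiscoveR scan history": None,
}

gaps = find_retention_gaps(incident_start, oldest_record)
print(gaps)
```

Use the estimated start time of the incident rather than the detection time as `incident_start`, since slow-moving incidents such as distillation campaigns can begin well before the first alert.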


Section 14

Frequently asked questions

How does AI incident response differ from traditional incident response?

Traditional IR deals with systems that have clear compromised or not-compromised states visible in the file system or access log. AI incidents differ in three ways. The blast radius can be invisible without querying the agent audit log: an agent breach may have touched hundreds of records with no file system anomaly. The model itself can be the compromised artifact: a backdoored model looks identical to a clean model at the file level. And remediation requires adversarial verification: a fix that closes the specific attack path may simultaneously open a related one, which only a post-fix adversarial scan can detect.

What are the four most common AI security incidents?

Prompt injection attack: an attacker embeds instructions to redirect model behaviour or exfiltrate context window contents. Model compromise: a backdoor or adversarial fine-tuning, introduced through a supply chain compromise or the update pipeline, has changed the model's behaviour. Distillation campaign: an attacker systematically extracts the model's reasoning capabilities through high-volume coordinated API queries across many accounts. Agent breach: an AI agent has been redirected by injection or token manipulation to take actions outside its authorized scope, potentially touching many downstream systems through valid capability tokens.

What forensics artifacts exist in an AI security incident?

AgentID audit logs record every token issuance, delegation chain, gateway enforcement decision, and revocation event: the primary blast radius artifact. VectaX audit logs record every retrieval event with timestamps, namespace, and document ID without exposing encrypted content. AgentIQ event logs record per-output classification signals: injection detected with type, PII detected, hallucination score, chain security status. DiscoveR scan results provide pre- and post-incident model behaviour baselines. Query hash and cluster logs identify distillation campaign patterns without storing query content. Model weight checksums from the model registry confirm whether the deployed artifact matches the expected one.
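The checksum comparison in the last item can be sketched with the Python standard library. This assumes the registry records a SHA-256 hex digest for each artifact, which the text does not specify, and the helper names are illustrative. A matching digest only shows that the deployed file is the registered one. It says nothing about whether the registered model itself is clean.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large weight files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_registry(path: Path, expected_hex: str) -> bool:
    """True if the file's digest equals the digest recorded in the registry."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# Demonstration with a throwaway file standing in for a model weight file.
with tempfile.TemporaryDirectory() as tmp:
    weights = Path(tmp) / "model.bin"
    weights.write_bytes(b"stand-in model weights")
    registered = hashlib.sha256(b"stand-in model weights").hexdigest()
    ok = matches_registry(weights, registered)
    tampered = matches_registry(weights, "0" * 64)

print(ok, tampered)
```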

How do you contain an agent breach?

Revoke all active AgentID tokens for the affected agent instance through the Identity Broker immediately. Do not wait to investigate before revoking. Token revocation propagates to the Resource Gateway within seconds, stopping all in-progress agent actions. Then query the AgentID audit log to enumerate every tool call, resource accessed, and delegation chain from the compromised agent instance. Assess blast radius from the audit log enumeration. Notify owners of affected downstream systems with the specific resource list from the audit log. Then investigate the breach origin from the forensics artifacts before remediation.
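Once tokens are revoked, the enumeration step amounts to filtering the audit log by agent instance and collecting the distinct tools, resources and delegation chains. The sketch below assumes the log has been exported as a list of records. The record fields (`agent_instance_id`, `tool`, `resource`, `delegation_chain`) and the sample values are hypothetical and do not reflect the actual AgentID log schema.

```python
from typing import Dict, Iterable, List


def enumerate_blast_radius(
    records: Iterable[Dict[str, object]],
    agent_instance_id: str,
) -> Dict[str, List]:
    """Collect distinct tools, resources and delegation chains for one agent instance."""
    tools, resources, chains = set(), set(), set()
    for record in records:
        if record["agent_instance_id"] != agent_instance_id:
            continue
        tools.add(record["tool"])
        resources.add(record["resource"])
        chains.add(tuple(record["delegation_chain"]))
    return {
        "tools": sorted(tools),
        "resources": sorted(resources),
        "delegation_chains": sorted(chains),
    }


# Invented audit records with a hypothetical schema.
audit_log = [
    {"agent_instance_id": "agent-7", "tool": "crm.read", "resource": "customer/1001",
     "delegation_chain": ["orchestrator", "agent-7"]},
    {"agent_instance_id": "agent-7", "tool": "crm.read", "resource": "customer/1002",
     "delegation_chain": ["orchestrator", "agent-7"]},
    {"agent_instance_id": "agent-9", "tool": "billing.write", "resource": "invoice/55",
     "delegation_chain": ["orchestrator", "agent-9"]},
]

radius = enumerate_blast_radius(audit_log, "agent-7")
print(radius)
```

The `resources` list is what goes to the owners of affected downstream systems, and its length gives the exact count needed for the blast radius field of the notification.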

What does the DiscoveR remediation cycle look like?

Run a baseline DiscoveR scan before any model or policy change and store the per-category pass rates. When an incident occurs, run an incident scan with the same correlation_id to see which categories are failing. Deploy the fix targeting the failing categories. Run a post-fix scan with the same correlation_id. The scan now shows which categories improved, which stayed the same, and which regressed as a result of the fix. Close the incident only when: the categories that were failing at incident time are now passing at or above baseline, AND no other category has regressed. If any regression exists, return to remediation with a more targeted fix.
