E3: AI Incident Response - Playbooks, Forensics and Containment
Duration: 42 minutes · Level: Intermediate · Language: en · Date: 2026-04-07 · Mirror Academy
AI incident response differs from traditional IR because blast radius can be invisible (agent session may have touched hundreds of records without file system anomaly), model is the compromised artifact (backdoored model looks identical at file level), and remediation requires adversarial verification not just patching.
Four common AI incident types: prompt injection attack (attacker embeds instructions to redirect model or exfiltrate context), model compromise (backdoor or adversarial fine-tuning changes model behaviour), distillation campaign (systematic capability extraction across many API accounts), agent breach (agent redirected to act outside authorized scope).
Prompt injection playbook: detect via AgentIQ injection detection rate spike, contain by suspending affected agent instances and revoking active tokens via AgentID, investigate using AgentIQ event logs to find first injection detection and trace downstream actions, remediate by updating AgentIQ policy rules and running DiscoveR scan to confirm the attack path is closed, verify by comparing DiscoveR scan results against baseline.
Model compromise playbook: detect via DiscoveR refusal rate regression or jailbreak success, contain by rolling back to previous model checkpoint, investigate by comparing model weight checksums and running full DiscoveR scan with correlation_id, remediate by verifying supply chain integrity, verify by running DiscoveR post-fix scan confirming no regression.
Distillation campaign playbook: detect via population-level query clustering in E2 monitoring signals, contain by revoking implicated API keys and enabling rate controls, investigate by quantifying extracted coverage using response clustering, remediate by activating VectaX FHE stack for affected endpoints, verify by checking that extraction signal has ceased.
Agent breach playbook: detect via AgentID audit log showing out-of-scope tool calls or delegation depth exceeded, contain by revoking all active tokens for the compromised agent instance (seconds via Identity Broker), investigate blast radius using AgentID audit log to enumerate every resource touched, remediate by tightening capability scopes and constraints on affected workflow, verify by running a test session and confirming gateway blocks the previously-exploited path.
Forensics artifacts: AgentID audit logs (token issuance, delegation chain, gateway enforcement, revocation events), VectaX audit logs (retrieval events, namespace, document ID, timestamps without encrypted content), AgentIQ event logs (PII detected, injection detected with type, hallucination score per output), DiscoveR scan results (pre- and post-incident baselines, per-category pass rates, correlation_id chains), query hash logs (repeated identical queries without query content), model weight checksums and version hashes.
Containment strategies for AI: model rollback differs from patch application because the model is the artifact, token revocation through AgentID takes seconds not hours, blast radius assessment requires querying the agent audit log not the file system, VectaX encryption limits blast radius of a data plane breach because ciphertexts are unreadable without keys.
Remediation cycle: run DiscoveR baseline before incident, run post-fix scan with same correlation_id, compare per-category pass rates, do not close incident until scan confirms attack path is closed and no new regressions exist.
Communication template for AI incidents: what to say to legal, security leadership, affected downstream system owners, and regulators. Post-incident review checklist for AI-specific gaps.
Module E3 of 5 · Track 3E: Security Operations for AI
Detect. Contain. Investigate. Verify the fix actually worked.
AI Incident Response
When something goes wrong in an AI system, the playbook looks different from a traditional breach. The compromised artifact may be a model, not a file. The blast radius may be invisible without querying the agent audit log. And patching is not enough: the fix itself can introduce new vulnerabilities. This module covers the four most common AI incidents, step-by-step playbooks, and the remediation cycle that uses DiscoveR to confirm the attack path is closed.
Traditional incident response works well when the compromised artifact is a file, a credential, or a network connection. You identify the file, patch the vulnerability, reset the credential, and block the connection. The compromised state is usually visible in the file system or the access log.
AI incidents break these assumptions in three specific ways that require a different response approach.
The blast radius is invisible without querying the right logs. An agent breach may have touched two hundred customer records, called six downstream APIs, and modified three data stores, all through capability-scoped tokens that looked normal at the time. There is no file system anomaly. There is no privilege escalation event. The entire damage is in the AgentID audit log, and it requires a specific query to enumerate. If you do not have that log, you cannot scope the incident.
The model itself can be the compromised artifact. A backdoored model and a clean model can be indistinguishable by file-level inspection: a model weight file carries no obvious malicious indicator the way a malware binary does, and if the attacker controls the artifact pipeline, the registered checksum may describe the backdoored file. Detecting model compromise therefore requires behavioural testing, specifically running adversarial probes and comparing the results against a known-good baseline.
The fix can introduce new vulnerabilities. After a prompt injection incident, the typical remediation involves updating the system prompt or adding new filtering rules. That change alters how the deployed system behaves as a whole. The updated system may now be more resistant to the specific injection that triggered the incident but may have regressed on a related attack category. Remediating an AI incident without running a full adversarial scan after the fix is operationally equivalent to patching a CVE and not testing whether the patch worked.
Do not close an AI incident until you have run a post-fix adversarial scan. A fix that removes the specific attack path that was exploited may simultaneously open a related one. DiscoveR's correlation_id feature links the post-fix scan to the pre-incident baseline, so you can see exactly which categories improved and which regressed as a result of the change.
Section 02
The four incident types
Most AI security incidents fall into four categories. They require different initial containment actions, different forensics approaches, and different remediation paths. Identifying the type early determines which playbook to follow and which team members to page immediately.
☣
Prompt injection attack
Critical
Attacker embeds instructions in user input or retrieved content to redirect the model's behaviour, override system prompt constraints, exfiltrate context window contents, or trigger unauthorized agent actions.
Model compromise
Critical
A backdoor, adversarial fine-tuning, or supply chain compromise has changed the model's behaviour. The deployed model no longer matches the expected safety and capability profile. May have been introduced through a model update pipeline.
Detection signals: DiscoveR refusal rate regression after update, jailbreak success on previously-failing techniques, capability anomaly on held-out eval
📱
Distillation campaign
High
Systematic extraction of the model's reasoning capabilities through high-volume coordinated querying across many accounts. Goal is to build a student model trained on harvested (prompt, response) pairs. Documented at scale against frontier models in 2026.
Detection signals: E2 population-level query clustering, inter-account semantic similarity above threshold, query rate anomaly normalised to account age
🤖
Agent breach
Critical
An AI agent has been redirected by prompt injection, token theft, or delegation chain manipulation to take actions outside its authorized scope. May have touched many downstream systems through valid but misused capability tokens.
The standard incident response lifecycle (detect, contain, investigate, remediate, recover) applies to AI incidents, but the content of each phase is different. One phase has no traditional equivalent: verify. In AI IR, the verify phase is not optional. It is the step where you run a post-fix adversarial scan to confirm that the remediation actually closed the attack path.
AI incident response lifecycle
01 Detect: Alert fires from monitoring layer. Classify incident type. Page the right responder.
02 Contain: Stop ongoing harm. Revoke tokens, suspend endpoints, or roll back model before investigating.
03 Investigate: Query forensics artifacts. Scope blast radius. Identify root cause and attack path.
04 Remediate: Apply fix: update policy, roll back model, tighten scopes, activate hardening controls.
05 Verify: Run DiscoveR post-fix scan with same correlation_id. Confirm attack path closed. No new regressions.
Contain before investigate. In AI incidents, the instinct to understand the full scope before acting can delay containment while the incident is still ongoing. Revoke the implicated tokens or suspend the affected endpoint first. Investigation can happen against logs. You do not need the system running to investigate it.
Section 04
Playbook: Prompt injection attack
Prompt injection is the most common active AI incident type. The attacker embeds instructions in user input or retrieved content to redirect the model away from its intended behaviour. Direct injection targets the system prompt through the user turn. Indirect injection plants instructions in documents that the model retrieves through the RAG pipeline.
CRITICAL · Prompt Injection Attack Playbook
DETECT
AgentIQ injection rate crosses alert threshold
Injection detection rate above 1% for a 15-minute window, or any output flagged as affecting agent scope. Check whether the injection succeeded or was blocked by the guardrail.
CONTAIN
Suspend the affected agent instance and revoke its tokens
Revoke all active AgentID tokens for the affected agent instance through the Identity Broker. If a RAG document was the source, quarantine the document from the retrieval index. Suspension takes effect within seconds at the Resource Gateway.
AgentID: revoke agent_instance_id, remove quarantined doc from vector index
INVESTIGATE
Find the first injection event and trace downstream actions
Query AgentIQ logs for the earliest timestamp where injection_detected=true in the affected session. Cross-reference with AgentID audit logs for all tool calls and resource accesses that occurred after that timestamp. This is the blast radius window.
AgentIQ: query injection events by session_id, timestamp range
INVESTIGATE
Determine injection type and source
Classify: direct injection (user turn), indirect injection (retrieved document), or chain injection (injected sub-agent spawned by the primary agent). For indirect injection, identify which document triggered it using the VectaX retrieval audit log and the AgentIQ injection_source field.
REMEDIATE
Update AgentIQ policy rules for the injection type detected
For direct injection: update the AgentIQ prompt injection detection policy to cover the pattern used. For indirect injection: add the source document or document class to the quarantine list; review the document ingestion pipeline for the infection vector. For chain injection: add a delegation depth constraint to the relevant AgentID token policy.
VERIFY
Run DiscoveR scan targeting injection categories with same correlation_id
Run a DiscoveR scan with security_categories including jailbreakAndInjection using the same correlation_id as the pre-incident baseline. Compare pass rates. The specific injection technique that succeeded pre-incident should now fail. If any other injection category has regressed, do not close the incident.
Run a targeted DiscoveR scan with jailbreakAndInjection category and your baseline correlation_id. The per-category comparison shows whether the specific attack path is closed and whether the fix introduced any new regressions elsewhere.
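The detect and investigate steps above can be sketched in a few lines. This is a minimal illustration, not the AgentIQ or AgentID API: the event dictionaries and their field names (`session_id`, `timestamp`, `injection_detected`) are assumptions made for the example, and a real deployment would query the products' own log interfaces.

```python
from datetime import timedelta

# Hypothetical event shape: dicts with "session_id", "timestamp" (datetime)
# and, for classification events, "injection_detected" (bool).

def injection_rate(events, window_end, window=timedelta(minutes=15)):
    """Fraction of outputs flagged as injections in the trailing window."""
    start = window_end - window
    recent = [e for e in events if start <= e["timestamp"] <= window_end]
    if not recent:
        return 0.0
    flagged = sum(1 for e in recent if e["injection_detected"])
    return flagged / len(recent)

def blast_radius_window(classification_events, audit_events, session_id):
    """Audit events in the session at or after the first detected injection.

    Events sharing the first injection's timestamp are included, which is
    the conservative choice when log clocks have coarse resolution.
    """
    flagged = [e["timestamp"] for e in classification_events
               if e["session_id"] == session_id and e["injection_detected"]]
    if not flagged:
        return []
    first_injection = min(flagged)
    return [a for a in audit_events
            if a["session_id"] == session_id
            and a["timestamp"] >= first_injection]
```

Comparing `injection_rate(...)` against 0.01 reproduces the 1% alert threshold, and the list returned by `blast_radius_window` is the set of tool calls and resource accesses to review.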
Section 05
Playbook: Model compromise
Model compromise is the hardest AI incident to detect early because the model looks identical to a clean model at the file level. By the time behavioural testing reveals the compromise, the model may have been in production for days or weeks. The goal of this playbook is rapid rollback followed by thorough supply chain forensics to identify where the compromise was introduced.
CRITICAL · Model Compromise Playbook
DETECT
DiscoveR scan shows refusal rate regression after a model update
Post-update DiscoveR scan shows a category where pass rate has dropped more than 10 percentage points compared to the pre-update baseline scan. Or: a jailbreak technique that previously failed is now succeeding. Either signal requires immediate investigation.
CONTAIN
Roll back to the previous verified clean model checkpoint
Do not attempt to patch the compromised model in place. Roll back to the most recent checkpoint that passed a clean DiscoveR scan. Quarantine the compromised model artifact for forensics. The rollback takes effect immediately with no user impact if you are running blue/green deployments.
Model deployment: activate previous checkpoint, quarantine compromised artifact
INVESTIGATE
Verify model weight checksums against the expected artifact
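Compute the SHA-256 of the compromised model weights and compare it against the expected checksum from the model registry. A mismatch indicates the artifact was altered after registration, which points to a compromise in artifact storage or distribution. A match means the registered artifact is itself the compromised one, so the compromise was introduced upstream, in the training or fine-tuning pipeline, rather than in artifact distribution.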
Model registry: expected_checksum, sha256(compromised_model_weights)
INVESTIGATE
Run a full DiscoveR scan against the quarantined model to characterise the compromise
Run all attack categories against the quarantined model in an isolated environment. The pattern of category failures characterises the type of compromise: a backdoor shows specific trigger-sensitive failures; adversarial fine-tuning shows broad safety degradation; a capability attack shows regression in specific reasoning domains.
DiscoveR: full category scan in isolated environment, security_categories=all
INVESTIGATE
Trace the model update pipeline for the insertion point
Review the complete pipeline from training data ingestion through fine-tuning through model registry through deployment. Check access logs for the model registry for any unauthorized modifications. Review the fine-tuning dataset for the update that introduced the compromise.
Pipeline logs, model registry access log, fine-tuning dataset review
REMEDIATE
Harden the model update pipeline and verify the clean checkpoint
Add mandatory DiscoveR scan gate to the model deployment pipeline: no model update can reach production without passing a DiscoveR scan against the baseline. Add checksum verification at every pipeline stage. If supply chain compromise is confirmed, notify the model provider.
VERIFY
Run clean DiscoveR scan against the rolled-back checkpoint
Run the full DiscoveR scan against the restored model checkpoint before reopening production traffic. All category pass rates should be at or above the last known-good baseline. Do not reopen production traffic until this scan passes.
DiscoveR: full scan on restored checkpoint, all categories pass at baseline
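The checksum step in this playbook can be done with the Python standard library alone. The sketch below streams the weight file so that large artifacts never have to fit in memory. The function names are illustrative, and how the expected checksum is fetched from your model registry is left as an assumption.

```python
import hashlib
import hmac

def sha256_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_registry(path, expected_checksum):
    """True when the file's digest equals the registry's expected value."""
    actual = sha256_file(path)
    # Normalise case and whitespace; compare_digest avoids timing leaks.
    return hmac.compare_digest(actual, expected_checksum.strip().lower())
```

A `False` result is a reason to investigate where the artifact changed after registration. A `True` result does not clear the model, since a compromise introduced before registration produces a matching checksum.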
Section 06
Playbook: Distillation campaign
A distillation campaign is a longer-duration incident. It may have been running for days or weeks before detection. Containment stops ongoing collection; forensics estimates how much was extracted; hardening makes future extraction hostile. The VectaX FHE stack is the primary technical hardening control, making harvested outputs toxic for training rather than simply trying to block collection.
HIGH · Distillation Campaign Playbook
DETECT
E2 monitoring signals: accounts showing cosine similarity above 0.85, query rate anomaly normalised to account age, or systematic topic coverage pattern. Cross-reference with response clustering to confirm coordinated extraction pattern rather than organic similar usage.
CONTAIN
Revoke API keys for all implicated accounts and enable enhanced rate controls
Revoke or suspend the API keys identified in the implicated account cluster. Enable temporary enhanced rate controls for the affected endpoint. For insider threats, suspend the specific API key and notify the account holder's organization through the appropriate channel.
API gateway: revoke keys for implicated_account_ids, enable rate_control_enhanced
INVESTIGATE
Estimate the coverage and duration of the extraction campaign
Query the response clustering logs to identify which regions of the model's capability space have been covered by the implicated accounts. Estimate the total number of (prompt, response) pairs extracted. Identify the start date of the campaign from the earliest clustered query in the logs; the earliest account creation date in the cluster gives a lower bound on that date.
INVESTIGATE
Determine whether the campaign targeted reasoning traces specifically
Query types that maximize chain-of-thought extraction are distinct from queries that collect (prompt, answer) pairs only. Reasoning trace extraction produces a more capable student model in fewer examples. If the query patterns suggest CoT harvesting, escalate the severity assessment.
Query pattern analysis: step-by-step, explain-your-reasoning query types in implicated cluster
REMEDIATE
Activate VectaX FHE stack for affected endpoints to make future harvest toxic
Activate the VectaX FHE layer for the affected model endpoints. This injects training-hostile noise at the latent level that is invisible to legitimate users but degrades any student model trained on harvested outputs. The more a future attacker collects, the worse their student model performs. Monitoring alone cannot stop sophisticated distillers; making the harvest worthless does.
VectaX: activate FHE stack for endpoint_id, enable_latent_noise=true
VERIFY
Confirm extraction signal has ceased after account revocation
Monitor the inter-account similarity metric for 48 hours after revocation. A return to baseline confirms the implicated accounts drove the signal. If the signal persists from different accounts, the campaign has additional infrastructure. Escalate and repeat containment for the new cluster.
Monitoring: inter_account_similarity rolling window, topic_entropy after revocation
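The inter-account similarity signal used in this playbook can be illustrated with plain cosine similarity. This sketch assumes each account has already been reduced to one numeric vector, for example an average of its query embeddings. That representation, and the function names, are assumptions for the example rather than how the E2 monitoring layer is implemented.

```python
import math
from itertools import combinations

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def similar_account_pairs(account_vectors, threshold=0.85):
    """Account pairs whose query profiles exceed the similarity threshold.

    account_vectors maps an account id to its (hypothetical) profile vector.
    """
    ordered = sorted(account_vectors.items())
    return [(p, q)
            for (p, u), (q, v) in combinations(ordered, 2)
            if cosine_similarity(u, v) > threshold]
```

Running the same function over the accounts active after revocation shows whether the signal has returned to baseline or has moved to a new cluster.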
Section 07
Playbook: Agent breach
An agent breach is the highest-velocity AI incident. An agent operating at machine speed can touch hundreds of customer records, call dozens of APIs, and modify data across multiple systems in seconds. The window between breach and containment determines the blast radius. Fast token revocation through AgentID is the single most important containment action.
CRITICAL · Agent Breach Playbook
DETECT
AgentID alert: delegation depth exceeded or cross-tenant access attempt
Any cross-tenant access attempt is an immediate incident. A delegation depth that exceeds the configured maximum indicates a spawned sub-agent hierarchy that the policy design did not anticipate, which is consistent with an agent hijack driven by prompt injection.
AgentID audit log: delegation_depth_exceeded=true or cross_tenant_attempt=true
CONTAIN
Revoke ALL active tokens for the compromised agent instance immediately
Use the AgentID Identity Broker to revoke all tokens issued to the affected agent_instance_id. This does not require knowing the scope of the breach. Revoke first, investigate second. Token revocation propagates to the Resource Gateway within seconds, stopping all in-progress agent actions.
INVESTIGATE
Query AgentID audit log to enumerate every resource the agent touched
Run a query on the AgentID audit log for all token_id values issued to the compromised agent_instance_id, then enumerate every gateway enforcement event for each token. This gives the complete list: every API endpoint called, every customer record accessed, every file modified, and every downstream agent spawned.
INVESTIGATE
Assess blast radius: what was touched and by what scope
From the audit log enumeration: count distinct customer records, API endpoints, and data stores accessed. For each, determine whether the agent acted within its capability scope (legitimate but misdirected) or outside it (token was forged or the gateway was bypassed). Out-of-scope actions require immediate notification to affected system owners.
INVESTIGATE
Trace the breach origin: injection, token theft, or delegation manipulation
Cross-reference the breach start time with AgentIQ injection detection logs. If an injection event preceded the out-of-scope actions, the breach originated from prompt injection. If no injection event exists, investigate whether a token was replayed from another session (token_id_hash mismatch) or whether the delegation chain was manipulated by a malicious sub-agent spawn.
AgentIQ: injection events before breach_start_time, AgentID: delegation chain forensics
REMEDIATE
Tighten capability scopes and constraints on the affected workflow
Review the token policy for the affected agent workflow. Tighten the capability scope to the minimum required for the task. Add explicit resource_target constraints where previously absent. Reduce token lifetime to the minimum required for the task. Add a delegation depth maximum constraint.
VERIFY
Run a test session and confirm the gateway blocks the previously-exploited path
Run a controlled test session that attempts the same out-of-scope action that the breached agent took. The AgentID gateway should block it with a scope violation. If the gateway allows the action, the policy update did not correctly tighten the scope. Do not resume production traffic for this workflow until the test passes.
AgentID: test session, expect gateway_rejection on previously-exploited scope
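The two investigation steps that scope the blast radius amount to a join between token issuance records and gateway enforcement events. The sketch below shows that join over in-memory records. The record fields (`token_id`, `agent_instance_id`, `resource_type`, `resource_id`, `in_scope`) are assumed for illustration and are not the AgentID audit log schema.

```python
from collections import defaultdict

def enumerate_blast_radius(issuance_events, gateway_events, agent_instance_id):
    """Count distinct resources touched by one agent instance.

    Returns (counts_by_resource_type, out_of_scope_resource_ids).
    """
    # Step 1: every token ever issued to the compromised instance.
    token_ids = {e["token_id"] for e in issuance_events
                 if e["agent_instance_id"] == agent_instance_id}

    # Step 2: every gateway event that used one of those tokens.
    touched = defaultdict(set)
    out_of_scope = []
    for event in gateway_events:
        if event["token_id"] not in token_ids:
            continue
        touched[event["resource_type"]].add(event["resource_id"])
        if not event["in_scope"]:
            out_of_scope.append(event["resource_id"])

    counts = {kind: len(ids) for kind, ids in touched.items()}
    return counts, out_of_scope
```

The counts feed the blast radius field of the incident communication, and each entry in the out-of-scope list identifies a system owner who needs immediate notification.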
Section 08
Forensics artifacts
AI forensics uses different artifacts from traditional forensics. There is no memory dump of a compromised process. There is no malware binary. There is no file system modification timestamp. The evidence lives in audit logs from the AI-specific tools in the stack and in the model's behavioural profile captured by adversarial testing.
Mirror Security · AgentID
Identity and Access Audit Log
Every token issuance event, delegation chain record, gateway enforcement decision, scope violation, and revocation event. The primary blast radius artifact for agent breach incidents.
VectaX audit logs
Every document retrieval event with timestamp, namespace, document ID, and access policy result. Reveals which documents were retrieved in a blast radius window without exposing encrypted content.
AgentIQ event logs
Per-output classification signals for every model response in a session. Primary injection forensics artifact. Shows the exact timestamp of the first injection detection and the classification for every subsequent output.
DiscoveR scan results
Pre- and post-incident model behaviour baselines. Per-category pass rates with timestamps. Correlation ID chains linking scans across the remediation cycle. The only artifact that reveals model compromise.
Model weight checksums and version hashes
SHA-256 checksums of model weight artifacts at every pipeline stage. Confirms whether the deployed model matches the expected artifact from the model registry. The only file-level artifact for model compromise investigation.
Log retention for AI incidents. The forensics artifacts above are only useful if they are retained long enough to cover the campaign duration. Distillation campaigns can run for weeks before detection. AgentID and VectaX audit logs should be retained for a minimum of 90 days. DiscoveR scan results should be retained indefinitely as model behaviour baselines. Query hash and cluster logs should be retained for 90 days at minimum.
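The retention minimums above can be turned into a simple configuration check. The sketch below is one way to do that. The log source keys are made-up labels for this example, and `None` is used here as a convention meaning "retained indefinitely".

```python
from datetime import timedelta

# Minimums taken from the retention guidance above.
# None means "retain indefinitely". The keys are illustrative labels.
RETENTION_MINIMUMS = {
    "agentid_audit": timedelta(days=90),
    "vectax_audit": timedelta(days=90),
    "query_hash_cluster": timedelta(days=90),
    "discover_scan_results": None,
}

def retention_gaps(configured):
    """Return the log sources whose configured retention is too short.

    configured maps a source name to a timedelta, or None for indefinite.
    A source missing from the mapping is treated as not retained at all.
    """
    gaps = []
    for name, minimum in RETENTION_MINIMUMS.items():
        actual = configured.get(name, timedelta(0))
        if actual is None:
            continue  # indefinite retention satisfies any minimum
        if minimum is None or actual < minimum:
            gaps.append(name)
    return gaps
```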
Section 09
Containment strategies
AI containment uses different tools from traditional containment. You are not blocking a network connection or quarantining a file. You are revoking tokens, rolling back model artifacts, suspending endpoints, or activating encryption controls. Each containment action has a different speed, reversibility, and blast radius impact.
Traditional containment actions
Block a network connection or IP range
Quarantine a malware-infected file or process
Revoke a compromised user credential
Isolate a compromised host from the network
Cannot roll back a model to a previous behaviour state
Cannot assess blast radius from agent audit log
Cannot make harvested data toxic for training
AI-specific containment actions
Token revocation via AgentID: stops all in-progress agent actions within seconds
Model rollback: swap the deployed model to the last verified-clean checkpoint
Endpoint suspension: temporarily disable the affected API endpoint while preserving logs
Document quarantine: remove a poisoned document from the retrieval index
VectaX FHE activation: makes future harvested outputs toxic for training at the latent level
Rate control escalation: enforce enhanced rate limits on affected endpoints
API key cluster revocation: revoke all keys in the implicated account cluster
Token revocation is the fastest containment action in an AI stack. AgentID revocation propagates to the Resource Gateway in seconds. This is why short-lived capability-scoped tokens from E1 matter for incident response: a token that expires in 5 minutes has a maximum blast radius window of 5 minutes even with no active revocation. A shared credential that lasts months requires an active revocation that takes minutes to propagate and risks breaking other workflows that share the credential.
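The arithmetic behind that comparison is short enough to write down. Under the simplifying assumption that a credential is stolen the moment it is issued, the time it stays usable is bounded by whichever comes first: its natural expiry, or detection plus revocation propagation. The function name and the example durations below are illustrative.

```python
from datetime import timedelta

def max_exposure(credential_lifetime, detection_delay, revocation_propagation):
    """Upper bound on how long a stolen credential remains usable."""
    return min(credential_lifetime, detection_delay + revocation_propagation)

# Short-lived token: expiry bounds the window even if detection is slow.
short_lived = max_exposure(timedelta(minutes=5),
                           timedelta(minutes=20),
                           timedelta(seconds=5))

# Long-lived shared credential: the window depends entirely on how fast
# the breach is detected and how fast revocation propagates.
long_lived = max_exposure(timedelta(days=90),
                          timedelta(minutes=20),
                          timedelta(minutes=3))
```

With these example numbers the short-lived token is exposed for 5 minutes and the shared credential for 23 minutes. If detection is slow, only the short-lived token's exposure stays bounded.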
Section 10
Response timeline
The following timeline applies to a high-severity AI incident (critical classification). The T+0 to T+1 hour window is where containment decisions are made. After T+1 hour, the focus shifts to investigation and communication. The verify phase should close before T+72 hours for most incidents.
T+0
DETECT
Alert fires. Page the AI security responder on call. Identify incident type from alert source (AgentIQ, AgentID, DiscoveR, monitoring layer). Open the incident ticket and set severity.
On-call
T+5m
CONTAIN
Execute immediate containment. For agent breach: revoke tokens. For injection: suspend agent instance. For model compromise: initiate rollback to last clean checkpoint. Do not wait for investigation before containment.
On-call + Platform eng
T+15m
INVESTIGATE
Query forensics artifacts. AgentID audit log for blast radius. AgentIQ event log for first injection timestamp. VectaX audit log for retrieval events in the window. Initial blast radius estimate completed.
On-call + Security
T+30m
INVESTIGATE
Notify downstream system owners whose systems appear in the blast radius. Provide the specific resource list from the AgentID audit log. Do not estimate: use the exact list from the audit log.
Security + Legal
T+1h
INVESTIGATE
Root cause identified. Attack path mapped. Briefing delivered to security leadership. Regulatory notification assessment completed (is a GDPR 72-hour window triggered?).
Security lead + Legal
T+4h
REMEDIATE
Fix deployed. Policy update, model rollback, or VectaX activation in place. Affected systems restored to operation where safe to do so. Evidence preserved for forensics before any systems are cleaned.
Platform eng + Security
T+24h
VERIFY
DiscoveR post-fix scan completed. Per-category comparison against baseline confirms attack path is closed. No new regressions introduced by the fix. Incident can be moved to closed status if scan passes.
Security
T+72h
VERIFY
Post-incident review completed. PIR document captures root cause, timeline, blast radius, gaps in monitoring or policy, and hardening actions. Regulatory notification filed if required.
Security lead
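The timeline above can be expressed as offsets from the alert time, which makes it easy to compute concrete deadlines for a live incident. This is a small sketch using only the standard library. The offsets and phases come from the timeline, while the function names are illustrative.

```python
from datetime import timedelta

# (offset from T+0, phase) pairs taken from the response timeline.
CHECKPOINTS = [
    (timedelta(0), "DETECT"),
    (timedelta(minutes=5), "CONTAIN"),
    (timedelta(minutes=15), "INVESTIGATE"),
    (timedelta(minutes=30), "INVESTIGATE"),
    (timedelta(hours=1), "INVESTIGATE"),
    (timedelta(hours=4), "REMEDIATE"),
    (timedelta(hours=24), "VERIFY"),
    (timedelta(hours=72), "VERIFY"),
]

def schedule(alert_time):
    """Absolute due time for every checkpoint, given the alert time."""
    return [(alert_time + offset, phase) for offset, phase in CHECKPOINTS]

def next_checkpoint(alert_time, now):
    """The first checkpoint still in the future, or None after T+72h."""
    for due, phase in schedule(alert_time):
        if due > now:
            return due, phase
    return None
```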
Section 11
The remediation cycle
The remediation cycle is the sequence of steps that closes an AI incident with confidence. It uses DiscoveR as the verification tool for each fix. The cycle is not complete until the post-fix scan confirms the attack path is closed and no new regressions exist.
1
Run a baseline DiscoveR scan before any model or policy change
Before deploying any model to production, run a full DiscoveR scan and store the results. This is the baseline that all future comparisons use. If you do not have a pre-incident baseline, run a scan against the last known-good checkpoint and use that as the reference.
DiscoveR baseline scan, all categories
2
Incident detected: identify which categories are failing and at what rate
When an incident is detected and investigated, run a DiscoveR scan against the affected system. The per-category results identify which attack categories are now succeeding that should not be. This scopes the remediation to the specific categories that need fixing.
DiscoveR incident scan, same correlation_id
3
Deploy the fix targeting the specific failing categories
A focused fix targets the failing categories without touching unrelated parts of the system. A broad fix (like a full model rollback) may change pass rates across all categories. Either way, the post-fix scan will show the full picture of what changed.
Policy update, model rollback, or guardrail change
4
Run post-fix DiscoveR scan with the same correlation_id
Run a new DiscoveR scan with the same correlation_id as the baseline and incident scans. The scan results now show: which categories improved (the fix worked for those), which stayed the same (unaffected), and which regressed (the fix introduced new issues in those categories).
DiscoveR post-fix scan, correlation_id=original
5
Close only if: previously failing categories now pass AND no new regressions
The incident is closed when: the categories that were failing at the time of the incident are now passing at or above baseline, AND no other category has regressed below baseline as a result of the fix. If any regression exists, return to step 3 with a more targeted fix.
Incident closure criteria: all categories at or above baseline
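The closure rule in step 5 is a three-way comparison of per-category pass rates. The sketch below assumes each scan has been exported as a dictionary mapping a category name to a pass rate between 0 and 1, and that all three scans cover the same categories. That export format is an assumption, not the DiscoveR result schema.

```python
def compare_scans(baseline, incident, post_fix):
    """Classify each category and decide whether the incident can close."""
    fixed, still_failing, regressed = [], [], []
    for category, base_rate in baseline.items():
        was_failing = incident[category] < base_rate
        now_ok = post_fix[category] >= base_rate
        if was_failing and now_ok:
            fixed.append(category)          # the fix worked here
        elif was_failing:
            still_failing.append(category)  # attack path still open
        elif not now_ok:
            regressed.append(category)      # the fix broke this category
    return {
        "fixed": fixed,
        "still_failing": still_failing,
        "regressed": regressed,
        "can_close": not still_failing and not regressed,
    }
```

If `can_close` is false, the `still_failing` and `regressed` lists show which categories the next iteration of step 3 has to target.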
Mirror Security · AgentIQ + DiscoveR
AgentIQ catches the incident. DiscoveR closes it.
AgentIQ's inline event stream identifies the first injection event, the affected session, and every output classified as anomalous. DiscoveR's post-fix scan confirms the attack path is closed. Together they cover the detect-to-verify cycle without manual output review.
Section 12
Communication template
AI incidents require communication to multiple audiences: security leadership, legal, downstream system owners, and potentially regulators. Each audience needs different information. The template below covers the fields that each communication should contain for a critical AI incident.
Incident type
One of: Prompt injection / Model compromise / Distillation campaign / Agent breach
Severity
Critical / High / Medium. Specify: is data at risk, is availability affected, is a regulatory notification window triggered?
Detection time
Exact timestamp of first alert. Estimated start time of incident if earlier than detection (relevant for distillation campaigns).
Containment status
What containment action was taken, at what time, and what systems are currently suspended or operating in reduced capacity.
Blast radius
Number of distinct customer records accessed, API endpoints called, data stores modified. Source: AgentID audit log. Do not estimate; use the exact count from the log.
Data exposure
Confirm whether PII, PHI, financial data, or regulated data classes appear in the blast radius. Source: VectaX audit log for retrieval events + AgentIQ PII detection flags in the blast radius window.
Regulatory flag
State whether GDPR Article 33 (72-hour notification window), HIPAA breach notification, or other regulatory obligation is potentially triggered. Legal must assess.
Next update time
Time of the next scheduled update (typically T+1h for critical incidents). State what will be known by then: root cause, full blast radius, remediation plan.
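Teams that send these updates from tooling may find it useful to encode the template as a typed record so that no field is omitted. The following is a sketch of one possible shape. The class, field, and function names are illustrative and not part of any Mirror Security product.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Optional

INCIDENT_TYPES = {"Prompt injection", "Model compromise",
                  "Distillation campaign", "Agent breach"}
SEVERITIES = {"Critical", "High", "Medium"}

@dataclass
class IncidentUpdate:
    incident_type: str
    severity: str
    detection_time: datetime
    containment_status: str
    blast_radius: Dict[str, int]   # exact counts from the audit log
    data_exposure: str
    regulatory_flag: str
    next_update_time: datetime
    estimated_start_time: Optional[datetime] = None

def validate(update: IncidentUpdate) -> List[str]:
    """Return a list of problems; an empty list means the update is sendable."""
    problems = []
    if update.incident_type not in INCIDENT_TYPES:
        problems.append("unknown incident_type")
    if update.severity not in SEVERITIES:
        problems.append("unknown severity")
    if update.next_update_time <= update.detection_time:
        problems.append("next_update_time must be after detection_time")
    if (update.estimated_start_time is not None
            and update.estimated_start_time > update.detection_time):
        problems.append("estimated_start_time is later than detection_time")
    return problems
```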
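GDPR Article 33 applies to AI incidents that expose personal data. If the blast radius includes personal data of EU individuals, GDPR requires notification to the supervisory authority without undue delay and, where feasible, within 72 hours of becoming aware of the breach, unless the breach is unlikely to result in a risk to individuals. The clock starts when the organization becomes aware that a personal data breach has occurred, which is not necessarily the moment the alert fires; treating T+0 as the start is the conservative planning assumption, and Legal must make the determination. The VectaX audit log and AgentIQ PII detection flags are the primary evidence that personal data was or was not in the blast radius. Preserve these logs before any remediation steps that might affect log integrity.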
Section 13
Post-incident review
The post-incident review (PIR) for an AI security incident covers the same areas as a traditional PIR but adds AI-specific gap analysis. The questions below identify the specific failure points that AI incidents commonly expose.
✓
Was the incident detectable before it was detected?
Review the monitoring logs from before the incident alert fired. Were there earlier anomalous signals that stayed below the alert threshold and so did not trigger an alert? If yes: the alert threshold was too high. Was the relevant layer being monitored at all? If not: add that monitoring layer before the next incident.
✓
Was a pre-incident DiscoveR baseline available?
If no pre-incident model behaviour baseline existed, the incident response team could not confirm whether the incident caused model drift or whether the model had already drifted before the incident. If no baseline existed: add a mandatory baseline scan to the deployment pipeline before the next incident occurs.
✓
Was the blast radius bounded by short-lived capability-scoped tokens?
If the agent involved used a shared long-lived service account, the blast radius was theoretically unbounded. If it used AgentID capability-scoped tokens with short lifetimes, the blast radius was bounded by the token's scope and expiry. If shared credentials were in use: migrate the affected workflow to AgentID capability tokens before the next incident.
✓
Did VectaX encryption limit the impact of a retrieval layer breach?
If a RAG document was the source of an indirect injection or data exfiltration, and VectaX encryption was not active on that retrieval layer, the attacker had access to plaintext document content. If VectaX was not active: assess whether the affected endpoint should be protected before returning to production.
✓
Was the DiscoveR post-fix scan required before incident closure?
If the incident was closed before a post-fix DiscoveR scan was run, the fix may have introduced new vulnerabilities that were not detected. If the scan was skipped: add it as a mandatory closure gate before the next incident. Closing without verification is the single most common post-incident gap in AI security operations.
✓
Were the required forensics artifacts available and within retention window?
Review which artifacts were needed but missing or out of retention window: AgentID audit log, VectaX retrieval log, AgentIQ event stream, DiscoveR scan history. For each missing artifact: update the retention policy and confirm the logging configuration is correct before the next incident.
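The retention question in the last checklist item can be answered mechanically once the retention period of each artifact is known. The sketch below is an assumption-laden illustration: the `missing_artifacts` function is hypothetical, and the retention periods and dates are invented example values, not defaults of any product.

```python
from datetime import date, timedelta


def missing_artifacts(incident_start: date, review_date: date,
                      retention_days: dict[str, int]) -> list[str]:
    """Return artifacts whose retention window no longer covers incident_start."""
    gaps = []
    for artifact, days in retention_days.items():
        oldest_kept = review_date - timedelta(days=days)
        if oldest_kept > incident_start:
            # Records from the start of the incident have already aged out.
            gaps.append(artifact)
    return sorted(gaps)


# Example retention periods in days (hypothetical values).
retention = {
    "AgentID audit log": 90,
    "VectaX retrieval log": 30,
    "AgentIQ event stream": 14,
    "DiscoveR scan history": 365,
}
print(missing_artifacts(date(2026, 3, 1), date(2026, 4, 7), retention))
```

Running the same check against the estimated start time of the incident, rather than the detection time, matters most for slow incidents such as distillation campaigns, where the start can precede detection by weeks.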
How does AI incident response differ from traditional incident response?
Traditional IR deals with systems that have clear compromised or not-compromised states visible in the file system or access log. AI incidents differ in three ways. The blast radius can be invisible without querying the agent audit log: an agent breach may have touched hundreds of records with no file system anomaly. The model itself can be the compromised artifact: a backdoored model looks identical to a clean model at the file level. And remediation requires adversarial verification: a fix that closes the specific attack path may simultaneously open a related one, which only a post-fix adversarial scan can detect.
What are the four most common AI security incidents?
Prompt injection attack: attacker embeds instructions to redirect model behaviour or exfiltrate context window contents. Model compromise: a backdoor or adversarial fine-tuning has changed the model's behaviour, introduced through a supply chain compromise or update pipeline. Distillation campaign: systematic extraction of the model's reasoning capabilities through high-volume coordinated API queries across many accounts. Agent breach: an AI agent has been redirected by injection or token manipulation to take actions outside its authorized scope, potentially touching many downstream systems through valid capability tokens.
What forensics artifacts exist in an AI security incident?
AgentID audit logs record every token issuance, delegation chain, gateway enforcement decision, and revocation event, which makes them the primary blast radius artifact. VectaX audit logs record every retrieval event with timestamps, namespace, and document ID without exposing encrypted content. AgentIQ event logs record per-output classification signals: injection detected with type, PII detected, hallucination score, chain security status. DiscoveR scan results provide pre- and post-incident model behaviour baselines. Query hash and cluster logs identify distillation campaign patterns without storing query content. Model weight checksums from the model registry confirm whether the deployed artifact matches the expected one.
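The checksum comparison mentioned in the answer above can be done with the standard library alone. This is a minimal sketch assuming the registry stores a SHA-256 hex digest for each model file; the function names are hypothetical and the hash algorithm your registry uses may differ.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large weight files fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_registry(path: Path, expected_hex: str) -> bool:
    """True if the deployed file hashes to the digest the registry recorded."""
    return sha256_of(path) == expected_hex.strip().lower()


# Demonstration with a throwaway file standing in for a weights file.
with tempfile.TemporaryDirectory() as tmp:
    weights = Path(tmp) / "model.bin"
    weights.write_bytes(b"example weights")
    recorded = hashlib.sha256(b"example weights").hexdigest()
    ok_clean = matches_registry(weights, recorded)
    weights.write_bytes(b"example weights, tampered")
    ok_tampered = matches_registry(weights, recorded)

print(ok_clean, ok_tampered)
```

A matching checksum only shows that the deployed file is the one the registry recorded. It does not show that the recorded model is free of a backdoor, which is why the checksum is paired with a behavioural scan.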
How do you contain an agent breach?
Revoke all active AgentID tokens for the affected agent instance through the Identity Broker immediately. Do not wait to investigate before revoking. Token revocation propagates to the Resource Gateway within seconds, stopping all in-progress agent actions. Then query the AgentID audit log to enumerate every tool call, resource accessed, and delegation chain from the compromised agent instance. Assess blast radius from the audit log enumeration. Notify owners of affected downstream systems with the specific resource list from the audit log. Then investigate the breach origin from the forensics artifacts before remediation.
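Once tokens are revoked, the blast radius enumeration described above is a filter-and-group pass over the audit log. The sketch below assumes audit events have been exported as a list of dictionaries; the record shape (`agent_instance`, `event`, `resource`, `owner`) is hypothetical and does not reflect the actual AgentID log schema.

```python
from collections import defaultdict


def blast_radius(audit_events: list[dict], agent_instance: str) -> dict[str, list[str]]:
    """Group distinct resources touched by one agent instance by system owner."""
    touched = defaultdict(set)
    for event in audit_events:
        if event["agent_instance"] != agent_instance:
            continue
        if event["event"] != "tool_call":
            continue  # issuance and revocation events touch no resources
        touched[event["owner"]].add(event["resource"])
    return {owner: sorted(resources) for owner, resources in touched.items()}


# Invented example events for illustration only.
events = [
    {"agent_instance": "agent-7", "event": "token_issued", "resource": "", "owner": ""},
    {"agent_instance": "agent-7", "event": "tool_call", "resource": "crm/record/1", "owner": "crm-team"},
    {"agent_instance": "agent-7", "event": "tool_call", "resource": "crm/record/2", "owner": "crm-team"},
    {"agent_instance": "agent-7", "event": "tool_call", "resource": "crm/record/1", "owner": "crm-team"},
    {"agent_instance": "agent-7", "event": "tool_call", "resource": "billing/invoice/9", "owner": "billing-team"},
    {"agent_instance": "agent-3", "event": "tool_call", "resource": "crm/record/5", "owner": "crm-team"},
]
for owner, resources in blast_radius(events, "agent-7").items():
    print(owner, len(resources), resources)
```

Grouping by owner produces the per-owner resource list needed for downstream notifications, and counting distinct resources rather than events keeps the reported blast radius from being inflated by repeated calls.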
What does the DiscoveR remediation cycle look like?
Run a baseline DiscoveR scan before any model or policy change and store the per-category pass rates. When an incident occurs, run an incident scan with the same correlation_id to see which categories are failing. Deploy the fix targeting the failing categories. Run a post-fix scan with the same correlation_id. The scan now shows which categories improved, which stayed the same, and which regressed as a result of the fix. Close the incident only when: the categories that were failing at incident time are now passing at or above baseline, AND no other category has regressed. If any regression exists, return to remediation with a more targeted fix.
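The closure rule above can be expressed as a comparison of three sets of per-category pass rates. The sketch below is illustrative only: it assumes each scan has been reduced to a dictionary of category name to pass rate, treats "failing" as "below baseline", and uses invented category names and numbers. It does not call DiscoveR.

```python
def closure_decision(baseline: dict[str, float],
                     incident: dict[str, float],
                     post_fix: dict[str, float]) -> dict:
    """Apply the closure gate: failing categories recovered AND no new regressions."""
    failing_at_incident = [c for c in baseline if incident[c] < baseline[c]]
    still_failing = [c for c in failing_at_incident if post_fix[c] < baseline[c]]
    regressed = [c for c in baseline
                 if c not in failing_at_incident and post_fix[c] < baseline[c]]
    return {
        "close": not still_failing and not regressed,
        "still_failing": still_failing,
        "regressed": regressed,
    }


# Invented pass rates for three hypothetical scan categories.
baseline = {"prompt_injection": 0.98, "jailbreak": 0.97, "pii_leak": 0.99}
incident = {"prompt_injection": 0.71, "jailbreak": 0.97, "pii_leak": 0.99}
first_fix = {"prompt_injection": 0.99, "jailbreak": 0.93, "pii_leak": 0.99}
second_fix = {"prompt_injection": 0.99, "jailbreak": 0.97, "pii_leak": 0.99}

print(closure_decision(baseline, incident, first_fix))
print(closure_decision(baseline, incident, second_fix))
```

In this example the first fix repairs the failing category but regresses another, so the incident stays open; only the second fix satisfies both conditions. A real gate would likely allow a small tolerance around the baseline to absorb scan-to-scan noise.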
AgentIQ's inline event stream finds the first injection event and maps the blast radius window. AgentID's audit log scopes the blast radius and token revocation contains it in seconds. VectaX's audit log shows which documents were retrieved, and its encryption limits what a data plane breach can expose. DiscoveR's correlation_id scan chain confirms the fix worked before you close the incident. All four products are in the playbooks in this module.