Question 1

What is output monitoring for a RAG system?

Accepted Answer

Output monitoring means watching what the RAG system returns on every request, not just whether it returns a response. It covers four things: checking that retrieved documents match the query namespace and access policy, flagging retrieved content that contains instruction patterns (a prompt injection signal), scoring the semantic consistency of the generated response against the retrieved context (a drift or poisoning signal), and logging everything with enough structure that forensics is possible after an incident. Output monitoring is different from access monitoring, which tracks who queried what. Output monitoring tracks what came out.

Question 2

What should a retrieval audit log contain?

Accepted Answer

A retrieval audit log entry should contain the request ID, timestamp, user or service identity, the collection or namespace queried, the query vector hash (not the plaintext vector), the number of results returned, the IDs and scores of retrieved documents, any RBAC policy applied, the response generation time, and a tamper-evident hash of the entry itself. The log should be append-only and stored separately from the vector database so that a compromise of the vector store does not also compromise the audit trail. Audit logs are what make incident forensics possible after a RAG breach.

Question 3

What is drift detection for a RAG system and why does it matter for security?

Accepted Answer

Drift detection in a RAG security context means tracking when the statistical properties of retrieved content or generated responses change from a known-good baseline. A sudden shift in the distribution of retrieved document topics, a change in the average semantic similarity score between queries and results, or a new pattern of instruction-like content appearing in retrieved chunks are all signals that the vector store may have been poisoned. Drift is the observable consequence of a successful RAG poisoning attack. Without drift detection, a poisoning attack can run undetected for days or weeks.

Question 4

What does AI red teaming for a RAG system involve?

Accepted Answer

AI red teaming for a RAG system means systematically attempting the attacks documented in module A2 against your own system before an attacker does. This includes RAG poisoning attempts by inserting adversarial documents, embedding inversion tests to check whether vectors are retrievable as plaintext, namespace boundary probing to test cross-tenant isolation, indirect prompt injection via crafted document content, query poisoning to test whether adversarial queries can manipulate retrieval ranking, and resource exhaustion tests to check rate limiting and queue depth controls. Red teaming should be scheduled, documented, and tracked against a baseline so improvements can be measured.

Question 5

What is DiscoveR and how does it help with RAG red teaming?

Accepted Answer

DiscoveR is Mirror Security's automated AI red teaming tool. For RAG systems it runs a structured set of attack scenarios covering the OWASP Top 10 for LLMs attack categories relevant to retrieval systems: prompt injection via retrieved content, RAG poisoning, namespace boundary failures, and embedding inversion. DiscoveR generates a risk report scoring each vulnerability category and providing evidence of successful attacks where they are found. The risk assessment is available at riskassessment.mirrorsecurity.io and can be run before each production deployment.

Question 6

What is an AI SBOM and why does it need maintenance in production?

Accepted Answer

An AI SBOM (Software Bill of Materials) for a RAG system catalogues every component: the embedding model and its version, the vector database and version, the LLM and API version, all framework dependencies, the data sources feeding the vector store, and the encryption library versions. Modules A3 and A4 covered creating the SBOM at build time. Maintaining it in production means updating it whenever any component changes, checking it against the CVE database on a schedule, and confirming it matches what is actually running rather than what was planned. A production SBOM that is not maintained becomes a false assurance within weeks.

Question 7

What are the steps in a RAG incident response playbook?

Accepted Answer

A RAG incident response playbook has five phases. Detection: the trigger is an alert from output monitoring, a red team finding, or an external report. Scoping: identify which collection or collections are affected, when the anomaly first appeared in audit logs, and how many queries may have seen poisoned results. Isolation: take the affected collection offline or switch to read-only mode. Do not delete anything yet. Evidence preservation: export audit logs and vector store snapshots before any changes. Recovery: roll back to the last known-good snapshot, re-run ingestion from a clean source, and validate output monitoring returns to baseline before bringing the collection back online. Post-incident: document the timeline, root cause, and control gaps.

Question 8

How do you identify the blast radius of a RAG poisoning attack?

Accepted Answer

Blast radius identification starts in the audit log. Find the first timestamp where the poisoned vectors appear in retrieval results, then count how many unique queries retrieved them between that timestamp and when the collection was isolated. Cross-reference those query IDs with user or service identities to understand who received poisoned content. Check whether the poisoned chunks appeared in generated responses by correlating query IDs with LLM output logs. The blast radius is the set of users, services, and downstream systems that may have acted on the compromised output.

Question 9

How often should you red team a production RAG system?

Accepted Answer

Red teaming should happen before each production deployment that changes the vector store, the embedding model, the retrieval configuration, or the LLM. It should also run on a scheduled basis regardless of changes, because the threat landscape changes even when the system does not. For high-risk RAG deployments handling regulated data, a monthly automated run with DiscoveR plus a quarterly manual red team exercise is a reasonable starting point. The output of each run should be compared against the previous run to track whether the risk posture is improving or degrading.

Question 10

What is the difference between output monitoring and hallucination detection?

Accepted Answer

Hallucination detection checks whether the generated response is factually grounded in the retrieved context. This is a quality concern. Output monitoring for security checks whether the retrieved content or generated response contains signals of attack: instruction patterns in retrieved chunks (prompt injection), content that does not belong in the namespace (boundary failure or poisoning), or a generated response that diverges from what the retrieved context should produce. Both are useful in production but they are separate systems serving different purposes.

Question 11

What should trigger an automatic alert in a production RAG system?

Accepted Answer

Automatic alerts should fire on five conditions: a retrieved chunk that contains instruction patterns above a confidence threshold; a semantic similarity score between query and retrieved content below a configured minimum (indicating retrieval of off-topic content); a namespace or collection access by an identity that does not match the expected access pattern; a retrieval volume spike that exceeds normal query rate by a configurable multiplier (resource exhaustion signal); and a drift score above threshold comparing the current distribution of retrieved topics to the rolling baseline. All five conditions should write to the audit log regardless of whether they trigger an alert.

Question 12

What does a production-ready RAG security checklist cover?

Accepted Answer

A production-ready RAG security checklist covers eight areas: ingestion controls (document validation, chunking limits, supply chain checks from A3), access control (namespace isolation, RBAC enforcement, identity traceability from A4), encryption (at rest, in transit, and in use from A5), output monitoring (retrieval audit logging, drift detection, instruction pattern scanning), red teaming (pre-deployment DiscoveR scan, documented findings, tracked remediation), SBOM (current, version-pinned, checked against CVE database), incident response (playbook documented, tested, contact list current), and governance (NIST AI RMF mapping, data retention policy, review schedule).

Signal	What it detects	Severity if triggered	Action
Instruction pattern in retrieved chunk	Text in a retrieved document that reads like a system prompt or instruction override. Consistent indirect prompt injection pattern.	High	Block response. Log full retrieval event. Alert security team.
Namespace or policy mismatch	A retrieved document whose metadata shows it belongs to a different tenant or access tier than the query's namespace.	High	Block response. Trigger namespace audit. Could indicate boundary failure from A2.
Semantic consistency score below threshold	The generated response does not align semantically with the retrieved context. The LLM may be ignoring retrieved content, or retrieved content has been poisoned to steer the response.	Medium	Flag for review. Contribute to drift baseline. Investigate if persistent.
Retrieval volume spike	Query rate exceeds normal baseline by a configured multiplier. Resource exhaustion attack pattern from A2.	Medium	Enforce rate limiting. Log identity. Escalate if sustained.
Known-bad content match	A retrieved chunk matches a hash or pattern from a blocklist of previously identified adversarial content.	High	Block response immediately. Trigger collection audit.

RAG Security
in Production

Output monitoring

Retrieval audit logging

Drift detection

Red teaming your RAG system

Automated red teaming with DiscoveR

Automated AI red teaming for RAG and LLM systems

SBOM maintenance in production

Incident response playbook

Production security checklist

Track 2A complete

RAG Securityin Production