A6: RAG Security in ProductionA production RAG system needs three capabilities that modules A1 to A5 do not cover: continuous monitoring of what the system returns, a structured way to test what it can be made to do, and a plan for when something goes wrong. Output monitoring watches every retrieval and generation event for signs of poisoning, drift, and policy violations. It is distinct from access monitoring in A4 which tracks who queried what. Retrieval audit logging writes a structured tamper-evident record of every retrieval event. Each log entry includes request ID, timestamp, user identity, collection queried, query vector hash, result document IDs and scores, RBAC policy applied, and a chained hash for tamper detection. Drift detection tracks when the statistical distribution of retrieved content shifts from baseline. A shift in retrieved document topic distribution or semantic similarity scores is a reliable signal that the vector store has been modified. DiscoveR from Mirror Security automates RAG red teaming across prompt injection, RAG poisoning, namespace boundary failures, embedding inversion, and indirect injection. The risk assessment runs at riskassessment.mirrorsecurity.io. SBOM maintenance in production means updating the AI bill of materials whenever any component changes and checking against the CVE database on schedule. Incident response for RAG breaches has five phases: detection, scoping, isolation, evidence preservation, and recovery. Scoping means identifying affected collections, the first appearance of anomalies in audit logs, and the blast radius of queries that may have received poisoned results. Isolation means taking affected collections offline or switching to read-only without deleting anything. Recovery means rolling back to a known-good snapshot and validating that output monitoring returns to baseline before bringing the collection back online. The production security checklist covers ingestion controls, access control, encryption, output monitoring, red teaming, SBOM, incident response, and governance.PT26MIntermediatetrueen2026-04-04Mirror Academy
Module A6 of 6 · Track 2A: RAG & Vector DB Security · Final Module
Monitor. Red Team. Respond.
RAG Security in Production
Building a secure RAG system is not a one-time event. This module covers what you need to run it securely after deployment: output monitoring, retrieval audit logging, drift detection, structured red teaming with DiscoveR, SBOM maintenance, and an incident response playbook.
Access monitoring from A4 tracks who queried the system and whether they were authorised to do so. Output monitoring is a different question: what is the system actually returning, and does it look right?
A RAG system that has been successfully poisoned will pass every access control check. The user is authorised. The collection is correct. The query is valid. What is wrong is the content of the retrieved documents and the response the LLM generates from them. Output monitoring is the only layer that catches this.
There are four things to watch on every retrieval and generation event.
Signal
What it detects
Severity if triggered
Action
Instruction pattern in retrieved chunk
Text in a retrieved document that reads like a system prompt or instruction override. Consistent indirect prompt injection pattern.
High
Block response. Log full retrieval event. Alert security team.
Namespace or policy mismatch
A retrieved document whose metadata shows it belongs to a different tenant or access tier than the query's namespace.
High
Block response. Trigger namespace audit. Could indicate boundary failure from A2.
Semantic consistency score below threshold
The generated response does not align semantically with the retrieved context. The LLM may be ignoring retrieved content, or retrieved content has been poisoned to steer the response.
Medium
Flag for review. Contribute to drift baseline. Investigate if persistent.
Retrieval volume spike
Query rate exceeds normal baseline by a configured multiplier. Resource exhaustion attack pattern from A2.
Medium
Enforce rate limiting. Log identity. Escalate if sustained.
Known-bad content match
A retrieved chunk matches a hash or pattern from a blocklist of previously identified adversarial content.
import re
from sentence_transformers import SentenceTransformer, util
INSTRUCTION_PATTERNS = [
r"ignore (previous|all) instructions",
r"you are now",
r"new system prompt",
r"disregard (your|the) (system|previous)",
r"act as if",
]
scorer = SentenceTransformer("all-MiniLM-L6-v2")
defcheck_output(retrieved_chunks, generated_response, query_text):
flags = []
# Check each retrieved chunk for instruction patternsfor chunk in retrieved_chunks:
for pattern inINSTRUCTION_PATTERNS:
if re.search(pattern, chunk["content"], re.IGNORECASE):
flags.append({
"type": "instruction_pattern",
"chunk_id": chunk["id"],
"pattern": pattern,
"severity": "high",
})
# Score semantic consistency: response vs retrieved contextcontext = " ".join(c["content"] for c in retrieved_chunks)
emb_ctx, emb_resp = scorer.encode([context, generated_response])
consistency = float(util.cos_sim(emb_ctx, emb_resp))
ifconsistency < 0.45: # tune this threshold per systemflags.append({
"type": "consistency_low",
"score": consistency,
"severity": "medium",
})
returnflags
Threshold tuning matters. A consistency score threshold that is too strict will flag legitimate creative or synthesising responses. A threshold that is too loose will miss subtle poisoning. Establish a baseline using known-good query-response pairs for your specific use case before setting production thresholds. Expect to revisit them after each significant change to the vector store content.
Section 02
Retrieval audit logging
An audit log is what makes forensics possible after an incident. Without it, you cannot answer the most important questions: when did the poisoning start, which queries retrieved the affected documents, and which users received bad output.
A good retrieval audit log entry has enough information to reconstruct what happened at any point in time. It should be append-only, stored separately from the vector database, and protected against modification. Chain hashing links each entry to the previous one so that any deletion or modification is detectable.
Python · Structured retrieval audit log entry with chain hash
import hashlib, json, time, uuid
_prev_hash = "genesis"# initialise at startup; load from last entry in proddefwrite_audit_entry(
user_id, collection, query_vector, results, policy, flags
):
global_prev_hashentry = {
"request_id": str(uuid.uuid4()),
"timestamp": time.time(),
"user_id": user_id,
"collection": collection,
# Never log the raw vector; log a hash of it for correlation only"query_hash": hashlib.sha256(
str(query_vector).encode()
).hexdigest()[:16],
"result_ids": [r["id"] for r inresults],
"result_scores": [round(r["score"], 4) for r inresults],
"policy_applied": policy,
"flags": flags, # output monitor findings for this request"prev_hash": _prev_hash,
}
entry_bytes = json.dumps(entry, sort_keys=True).encode()
entry["entry_hash"] = hashlib.sha256(entry_bytes).hexdigest()
_prev_hash = entry["entry_hash"]
# Write to append-only log store (separate from the vector DB)append_to_log(entry)
returnentry["request_id"]
Do not log raw query vectors. A plaintext query vector can be inverted to recover the approximate original text (see A2: embedding inversion). Log a truncated hash of the vector for correlation purposes. If you need to reproduce the exact query for forensic investigation, retrieve it from the encrypted audit log of the application layer, not the vector database audit log.
Section 03
Drift detection
A RAG poisoning attack changes what the vector store returns for a given query. That change is measurable. Drift detection means running continuous comparisons between the current distribution of retrieval results and a baseline you captured when the system was in a known-good state.
Three metrics are worth tracking in production.
Python · Drift metrics against a known-good baseline
from collections import Counter
import numpy as np
# Capture this baseline when the system ships to productionBASELINE = {
"avg_similarity": 0.78, # mean similarity score across 1000 queries"topic_distribution": { # rough topic breakdown of retrieved content"product": 0.42,
"policy": 0.31,
"support": 0.27,
},
"instruction_rate": 0.002, # fraction of chunks flagged as instruction-like
}
defcompute_drift_score(window_metrics):
scores = []
# 1. Similarity score driftsim_drift = abs(
window_metrics["avg_similarity"] - BASELINE["avg_similarity"]
) / BASELINE["avg_similarity"]
scores.append(sim_drift)
# 2. Topic distribution drift (Jensen-Shannon divergence)base_dist = np.array(list(BASELINE["topic_distribution"].values()))
curr_dist = np.array(list(window_metrics["topic_distribution"].values()))
m = (base_dist + curr_dist) / 2js = (np.sum(base_dist * np.log(base_dist / m + 1e-9) +
curr_dist * np.log(curr_dist / m + 1e-9)) / 2)
scores.append(float(js))
# 3. Instruction-pattern rate driftinstr_drift = (
window_metrics["instruction_rate"] - BASELINE["instruction_rate"]
) / max(BASELINE["instruction_rate"], 1e-6)
scores.append(max(instr_drift, 0))
return np.mean(scores) # alert if > 0.25 over a rolling window
Capture the baseline at ship time, not at build time. A baseline built from development data will not reflect production query patterns. Run the system in a staging environment with realistic traffic for at least 48 hours before establishing the baseline you use for production drift alerting.
Section 04
Red teaming your RAG system
Red teaming means deliberately trying to break your own system using the same attacks documented in A2, before an attacker does. The goal is not to check a compliance box. It is to find out which attacks actually work against your specific configuration so you can fix them before they become incidents.
A structured red team exercise for a RAG system covers six attack categories. Each should be attempted with multiple variations and documented with evidence of success or failure.
01
RAG poisoning
Insert documents designed to manipulate LLM responses on specific query topics. Measure whether poisoned content ranks above legitimate content for target queries. Check whether the output monitor catches it before the response reaches the user.
Direct injection via APIUpload-path bypassChunk-boundary smuggling
02
Indirect prompt injection via document content
Craft documents containing instruction patterns and ingest them into the vector store. Issue queries that retrieve them. Check whether the LLM executes the embedded instructions or whether the output monitor blocks the response first.
From a tenant A identity, attempt to retrieve documents from tenant B's collection. Try metadata filter bypass techniques: null values, type mismatches, wildcard characters in filter fields. Verify that the controls from A4 hold under each variation.
If you have access to stored vectors (as an attacker with database access would), attempt to reconstruct the original text using inversion techniques. Verify that the VectaX encryption from A5 prevents this for protected collections.
Craft adversarial query vectors that retrieve off-topic or adversarial content while appearing semantically similar to legitimate queries. Test whether input validation from A3 catches adversarial query patterns before they reach the retrieval layer.
Query perturbation attacksCross-lingual bypass
06
Resource exhaustion
Send sustained high-volume query loads. Test rate limiting, queue depth controls, and whether the system degrades gracefully under load or allows access controls to be bypassed during a busy period.
Red team in a staging environment that mirrors production exactly. Never run poisoning or boundary-probing tests against a live production vector store. A successful poisoning test in production is an actual poisoning incident. If you cannot maintain a production-equivalent staging environment, use DiscoveR's safe testing mode which runs attack simulations in an isolated sandbox.
Section 05
Automated red teaming with DiscoveR
Running the six attack categories from section 04 manually before every production deployment takes time that most teams do not have. DiscoveR automates this. It runs a structured set of attack scenarios against your RAG endpoint, scores each vulnerability category, and produces a report with evidence of any successful attacks.
DiscoveR covers the OWASP Top 10 for LLMs attack categories relevant to retrieval systems. For each category it generates attack variations, measures which ones succeed, and scores the overall risk posture of the system. The report shows which controls are working, which are bypassed, and what to fix first.
Shell · Run a DiscoveR RAG security assessment (pre-deployment)
# Install the DiscoveR CLI
pip install mirror-discover
# Run a full RAG assessment against your staging endpoint
discover assess \
--target https://your-rag-endpoint/query \
--collection rag_documents \
--auth-header "Authorization: Bearer $STAGING_TOKEN" \
--mode rag \
--output report.json
# The assessment covers:# rag_poisoning -- adversarial document injection tests# prompt_injection -- indirect injection via retrieved content# namespace_boundary -- cross-tenant isolation tests# embedding_inversion -- vector plaintext reconstruction tests# query_poisoning -- adversarial query vector tests# resource_exhaustion -- rate limit and queue depth tests# View the risk score summary
discover report report.json --format summary
Run DiscoveR before every deployment that changes any of these: the vector store content, the embedding model, the retrieval configuration, the LLM, or the access control rules. A change in any one of these can introduce a vulnerability that was not present in the previous version. The assessment completes in under 10 minutes for most RAG configurations.
Mirror Security · DiscoveR
Automated AI red teaming for RAG and LLM systems
Run before every deployment. Covers all six RAG attack categories. Risk report in under 10 minutes.
Modules A3 and A4 covered creating an AI SBOM at build time: cataloguing the embedding model, vector database, LLM, framework dependencies, data sources, and encryption libraries. A build-time SBOM is useful. A production SBOM that is not maintained is a false assurance within weeks.
Production SBOM maintenance has three ongoing tasks.
Update on every component change. When you update the embedding model version, the vector database, the LLM API version, or any framework dependency, update the SBOM before the change reaches production. The SBOM should reflect what is running, not what was planned.
Check against the CVE database on a schedule. New vulnerabilities are published continuously. Set a weekly automated check of all SBOM components against the National Vulnerability Database. Any critical or high CVE in a production component needs a fix or documented mitigation within a defined SLA.
Reconcile the SBOM against what is actually running. Configuration drift means the running system diverges from the planned system over time. A monthly reconciliation compares the SBOM against the deployed versions of every component. Discrepancies are either corrected or documented as intentional exceptions.
Shell · Generate and CVE-check a RAG system SBOM
# Generate a CycloneDX SBOM for your RAG Python environment
pip install cyclonedx-bom
cyclonedx-py environment --output rag-sbom.json --format json
# Add AI-specific components not captured by pip (edit rag-sbom.json):# embedding_model: text-embedding-3-small v2024-11# vector_db: qdrant 1.12.4# llm_api: gpt-4o 2025-02# encryption_lib: mirror-sdk 2.1.0# Check all components against the NVD CVE database
pip install pip-audit
pip-audit --requirement requirements.txt --format json --output audit.json
# Reconcile: compare SBOM against what is installed
pip list --format json > installed.json
python3 reconcile_sbom.py rag-sbom.json installed.json
Section 07
Incident response playbook
A RAG security incident typically means one of three things: a collection has been poisoned with adversarial content, a namespace boundary has been breached and one tenant has accessed another's data, or a prompt injection attack has caused the LLM to take an unintended action. All three follow the same five-phase response.
01
Detection
The trigger is an alert from output monitoring, a DiscoveR finding on a scheduled scan, a user report, or an external disclosure. Record the time of first detection and the source of the alert. Do not start remediation until you have confirmed the incident is real.
Confirm alert is not a false positiveRecord detection timeActivate IR team
02
Scoping
Find the first timestamp in the audit log where the anomaly appears. Identify which collection or collections are affected. Count how many queries retrieved the affected documents between the first appearance timestamp and now. Cross-reference query IDs with user identities to build the blast radius. Check whether the affected content appeared in generated responses by correlating with LLM output logs.
Audit log queryBlast radius mappingAffected user list
03
Isolation
Take the affected collection offline or switch it to read-only mode. Do not delete anything at this stage. Deletion destroys forensic evidence and makes blast radius calculation impossible. Route queries for that collection to a clean fallback or return an error. Notify affected users that the service is temporarily unavailable.
Collection offline or read-onlyNo deletions yetUser notification
04
Evidence preservation
Export the full audit log for the affected collection covering the incident window. Take a snapshot of the affected collection in its current state. Record the versions of all components running at the time of the incident from the SBOM. Hash all evidence files and store them separately from the production system. This package is what you present to regulators, legal, or law enforcement if required.
Audit log exportCollection snapshotComponent version recordEvidence hashing
05
Recovery
Identify the last known-good snapshot of the collection taken before the first anomaly timestamp. Roll back to that snapshot. Re-run ingestion from a clean, validated source for any documents added after the snapshot date. Run a DiscoveR assessment against the restored collection in staging before bringing it back online. Confirm output monitoring returns to baseline metrics. Bring the collection back online and monitor closely for the next 24 hours.
Test the playbook before you need it. Run a tabletop exercise with your team at least once a year. Walk through each phase using a simulated incident. The exercise will reveal gaps in your audit log tooling, in who has access to collection snapshots, and in how quickly the team can compute blast radius from the logs you actually have.
Section 08
Production security checklist
This checklist covers everything built across the six modules in Track 2A. A RAG system that passes all items is in a strong position to defend against the attacks documented in A2. Print it, run it before each production deployment, and keep a dated record of each run.
A3
Ingestion controls
Document validation rejects files that exceed size limits, contain executable content, or fail format checks before they reach the embedding pipeline.
A3: Document ingestion controls
Chunk size limits are enforced. Oversized chunks that could hide adversarial content in padding are rejected or split.
A3: Chunking and preprocessing
The embedding model is pinned to a specific version and its hash is verified on startup. Updates go through the SBOM change process.
A3: Embedding model supply chain
Anomaly detection on insertion events is active and connected to an alert channel.
A3: Anomaly detection on insertion events
A4
Access control
Namespace or collection isolation is enforced at the vector database layer, not only in application logic. A bug in the application cannot cause cross-tenant retrieval.
A4: Multi-tenancy isolation
RBAC policy is enforced using pre-filtering so unauthorised vectors are never retrieved, not just filtered from results after retrieval.
A4: Pre-filtering vs post-filtering
Every request carries a traceable identity. Anonymous queries are not permitted in production.
A4: Identity management
API keys and service credentials are rotated on a defined schedule and after any personnel change in the team with access.
A4: Rotation policies
A5
Encryption
Vectors are encrypted at rest. The encryption key is stored separately from the vector database.
A3, A5: Encrypt-at-embed
All connections to the vector database use TLS 1.3 with certificate pinning where the database supports it.
A1: Encryption in transit
For collections containing regulated data: encrypted inference is configured so the provider infrastructure never holds plaintext during computation.
A5: Encrypted inference
A6
Output monitoring and red teaming
Instruction pattern scanning is active on all retrieved chunks before they enter the LLM prompt. Flagged results are blocked, not just logged.
A6: Output monitoring
Retrieval audit log is append-only, chain-hashed, and stored separately from the vector database. Covers all six required fields.
A6: Retrieval audit logging
Drift detection is running against a production baseline. Alerts are connected to the security team, not just written to a log.
A6: Drift detection
A DiscoveR assessment was run against the current deployment configuration and all findings above medium severity are resolved or documented with an accepted risk decision.
A6: Automated red teaming with DiscoveR
The AI SBOM is current, version-pinned, and was checked against the NVD CVE database within the last 7 days.
A6: SBOM maintenance
An incident response playbook exists, names the people responsible for each phase, and has been tested in a tabletop exercise within the last 12 months.
A6: Incident response playbook
GOV
Governance
The system's risk controls are mapped to the applicable NIST AI RMF functions (Govern, Map, Measure, Manage) and the relevant OWASP Top 10 for LLMs risks.
Track 1: Governance and compliance
A data retention and deletion policy exists for the vector store. Stale data is removed on a schedule, not left indefinitely.
A4: Data governance
A security review is scheduled for every major release, defined as any change to the embedding model, retrieval configuration, LLM, or access control rules.
A6: Pre-deployment DiscoveR assessment
✓
Track 2A complete
You have covered every layer of RAG and vector database security: architecture, attack surface, embedding pipeline, access control, encrypted inference, and production operations. The next tracks build on this foundation.