A5: Encrypted Inference and Vector Memory
The plaintext gap is the point in a RAG pipeline where encrypted storage and transport end and plaintext processing begins. Access controls from A4 prevent unauthorised access but do not prevent provider infrastructure from seeing data during inference. VectaX closes this gap through three combined layers: FHE for computing similarity on ciphertext, similarity-preserving vector encryption for storage, and RBAC for policy enforcement at decryption time. The client encrypts the query vector with sdk.vectax.encrypt before it leaves the application. The vector database runs similarity search on ciphertext. Results come back encrypted. The application decrypts only those vectors matching the user secret key generated with sdk.rbac.generate_user_secret_key. Vectors outside the user policy scope raise MirrorError on the decrypt call and are excluded from results. Encrypted vector memory stores agent conversation history, system prompts, and retrieved document embeddings in encrypted form so the memory provider never holds plaintext. Every memory entry is encrypted at creation time before reaching any storage system. The VectaX MCP server connects Claude Desktop to an encrypted vector store from the GitHub repository mirrorsecai/mirror-vectax-mcp-server. Compliance: GDPR Article 32 encryption during processing satisfied, HIPAA technical safeguard requirements satisfied, PCI DSS Requirements 3 and 4 satisfied. VectaX works with Qdrant, Pinecone, ChromaDB, MongoDB Atlas, and pgvector. For cryptographic foundations see Track 3D Privacy-Preserving AI.
Duration: 22 minutes · Level: Intermediate · Language: en · Date: 2026-04-04 · Mirror Academy
Module A5 of 6 · Track 2A: RAG & Vector DB Security
Encryption in Use
Encrypted Inference & Vector Memory
Access controls stop unauthorised users. Encrypted inference stops the provider infrastructure from seeing the data at all. This module covers how to configure it, how to use it, and what you can claim for compliance.
The plaintext gap that access controls cannot close
Modules A3 and A4 covered the two layers of protection that most production RAG deployments implement: encrypt data while it is stored and in transit, and control who can access it through RBAC and namespace isolation. Both are necessary. Neither is sufficient on its own.
There is a point in every RAG pipeline where these protections stop. That point is inference. Before the AI model processes a query, the retrieval service decrypts the retrieved vectors and passes them as plaintext into the prompt. The inference server processes plaintext tokens, generates plaintext outputs, and returns them. The provider infrastructure sees everything.
The access controls from A4 are working correctly at this point. The problem is not unauthorised access. The problem is that authorised access still means the AI provider's servers hold your data in plaintext while they work on it.
🔒 Vectors at rest: encrypted (AES-256 or VectaX FHE) · Protected
🔒 Data in transit: encrypted (TLS 1.3) · Protected
👤 Access control: RBAC and namespace isolation from A4 · Protected
⚠ Inference: decrypted before processing on provider servers · Gap
Query, retrieved vectors, and generated response are all in plaintext during computation.
VectaX closes this gap by keeping data encrypted during computation ↓
🔐 Encrypted inference: provider infrastructure never sees plaintext at any stage · Closed
Policy versus cryptography: Most AI providers promise in their terms of service that they will not use your data to train models. That is a contractual commitment. It holds as long as the provider acts in good faith and has no security breaches. Encrypted inference is a mathematical guarantee: the provider infrastructure cannot read your data during processing regardless of policy or breach. The distinction matters for regulated industries where intent is not a substitute for a technical control.
Section 02
How encrypted inference works in a RAG pipeline
Encrypted inference means the similarity computation, the retrieval, and the model processing all happen on encrypted data. The vector database never sees a plaintext embedding. The inference server never sees a plaintext query. Results come back encrypted and are decrypted only at the authorised client.
VectaX makes this work by combining three things that each handle a different part of the problem. FHE (Fully Homomorphic Encryption) enables mathematical operations on ciphertext, so similarity scores can be computed without decryption. Similarity-Preserving Search keeps the geometric relationships between vectors intact after encryption, so nearest-neighbour search still returns correct results. RBAC (Role-Based Access Control) enforces policy at the decryption step, so even a correctly retrieved vector cannot be read by a user whose key does not match the vector's access policy.
The cryptographic details of how FHE, CKKS, bootstrapping, and noise management work are covered in Track 3D: Privacy-Preserving AI. This module focuses on how to configure and use these capabilities in a real RAG system.
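To make the similarity-preserving idea concrete, here is a toy sketch. It is not VectaX's scheme and it is not secure encryption: it applies a secret rotation to every vector, which is one simple transformation that leaves cosine similarity unchanged. All names in it (make_secret_angles, toy_transform, top_k) are hypothetical and exist only for this illustration.

```python
import math
import random


def make_secret_angles(dim, seed):
    """Derive one secret rotation angle per coordinate pair (toy key)."""
    rng = random.Random(seed)
    return [rng.uniform(0.0, 2.0 * math.pi) for _ in range(dim // 2)]


def toy_transform(vec, angles):
    """Rotate each consecutive coordinate pair by its secret angle.

    A block-diagonal rotation is orthogonal, so dot products and norms
    (and therefore cosine similarity) are unchanged.
    """
    out = []
    for i, theta in enumerate(angles):
        x, y = vec[2 * i], vec[2 * i + 1]
        out.append(x * math.cos(theta) - y * math.sin(theta))
        out.append(x * math.sin(theta) + y * math.cos(theta))
    return out


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query, docs, k):
    """Brute-force nearest neighbours by cosine similarity."""
    ranked = sorted(docs, key=lambda doc_id: cosine(query, docs[doc_id]), reverse=True)
    return ranked[:k]


rng = random.Random(7)
dim = 8  # must be even for the pairwise rotation
docs = {f"doc{i}": [rng.gauss(0, 1) for _ in range(dim)] for i in range(20)}
query = [rng.gauss(0, 1) for _ in range(dim)]

# Transform the stored vectors and the query with the same secret.
angles = make_secret_angles(dim, seed=1234)
protected_docs = {doc_id: toy_transform(v, angles) for doc_id, v in docs.items()}
protected_query = toy_transform(query, angles)

plain_hits = top_k(query, docs, k=3)
protected_hits = top_k(protected_query, protected_docs, k=3)
print(plain_hits == protected_hits)
```

The search side only ever handles the transformed vectors, yet it ranks the same neighbours. A plain rotation is linear and easy to recover from known pairs, which is why a production scheme needs the cryptography covered in Track 3D.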
What is encrypted at each step in a VectaX RAG pipeline
📝 User query arrives at the application · Plaintext locally
The application embeds the query locally using its embedding model. Plaintext only at this stage.
🔐 Query vector encrypted before leaving the client · Encrypted at client
sdk.vectax.encrypt() or sdk.rbac.encrypt() wraps the embedding. Nothing leaves the application in plaintext from here.
📑 Similarity search runs on ciphertext · Provider sees ciphertext only
The vector database computes distances between encrypted vectors. It never sees plaintext query or stored embeddings.
🔑 RBAC policy applied at decryption · Policy enforced by key
Results return encrypted. The application decrypts only vectors matching the user secret key. Others raise MirrorError and are excluded.
📤 Decrypted results injected into the prompt · Plaintext at authorised client only
Only the authorised client sees plaintext. Provider infrastructure held ciphertext throughout.
Section 03
Encrypting the query vector before it leaves the client
The encryption step happens in the application, before any network call. The embedding is generated locally. The VectaX SDK encrypts it. The encrypted ciphertext is what gets sent to the vector database.
There are two variants. Basic vector encryption wraps a single vector with no access policy attached. RBAC vector encryption attaches a role, group, and department policy to the vector so that only users whose keys satisfy all three dimensions can decrypt it after retrieval.
Python · Encrypting a query vector (basic and RBAC variants)
from mirror_sdk.core.mirror_core import MirrorSDK, MirrorConfig
from mirror_sdk.core.models import VectorData, RBACVectorData

config = MirrorConfig(
    api_key="<your_api_key>",
    server_url="https://mirrorapi.azure-api.net/v1",
    secret="<your_encrypt_secret>",
)
sdk = MirrorSDK(config)

# query_embedding is the embedding your model produced locally for the user query

# Option 1: Basic encryption (no access policy)
query_vec = VectorData(vector=query_embedding, id="q1")
encrypted_query = sdk.vectax.encrypt(query_vec)
# encrypted_query.ciphertext is what you send to the vector DB

# Option 2: RBAC encryption (access policy attached)
policy = {
    "roles": ["analyst"],
    "groups": ["team_finance"],
    "departments": ["finance"],
}
rbac_query = RBACVectorData(
    vector=query_embedding,
    id="q1",
    access_policy=policy,
)
encrypted_rbac_query = sdk.rbac.encrypt(rbac_query)
# encrypted_rbac_query.crypto.ciphertext is what you send
Which variant to use: Use RBAC encryption when your RAG system stores documents owned by different roles, teams, or departments. Use basic encryption when you only need to protect the data from the provider infrastructure and access control is handled entirely at the namespace or collection level.
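The "all three dimensions" rule can be pictured in plain Python. This is a hypothetical helper for reasoning about policies, not part of the SDK: in VectaX the check is enforced by the key at decryption, not by application code. It assumes a dimension matches when the user holds at least one of the values the policy lists, and that an empty dimension matches nobody. Confirm both assumptions against the SDK documentation before relying on them.

```python
def satisfies(policy, user_attrs):
    """Return True only if the user matches every dimension of the policy.

    Assumption: a dimension matches when the user holds at least one of
    the values the policy lists for it. An empty dimension matches nobody.
    """
    for dimension in ("roles", "groups", "departments"):
        allowed = set(policy.get(dimension, []))
        held = set(user_attrs.get(dimension, []))
        if not allowed & held:
            return False
    return True


policy = {
    "roles": ["analyst"],
    "groups": ["team_finance"],
    "departments": ["finance"],
}

finance_analyst = {
    "roles": ["analyst"],
    "groups": ["team_finance"],
    "departments": ["finance"],
}

# Same role, but a different group and department.
hr_analyst = {
    "roles": ["analyst"],
    "groups": ["team_hr"],
    "departments": ["hr"],
}

print(satisfies(policy, finance_analyst))
print(satisfies(policy, hr_analyst))
```

Matching on role alone is not enough here: the second user shares the analyst role but fails the group and department dimensions.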
Section 04
Running encrypted similarity search
Once the query is encrypted, the vector database call looks almost identical to a standard call. The ciphertext goes in as the query vector. The database computes similarity and returns results. The difference is that neither the query nor the stored vectors are ever plaintext on the database side.
This works with Qdrant, Pinecone, ChromaDB, MongoDB Atlas Vector Search, and pgvector. The example below uses Qdrant. The pattern is the same for other databases.
Python · Encrypted similarity search with Qdrant
from qdrant_client import QdrantClient

qdrant = QdrantClient()  # or cloud endpoint

# Send the ciphertext as the query vector.
# Qdrant computes similarity on ciphertext without decrypting.
results = qdrant.query_points(
    collection_name="rag_documents",
    query=encrypted_rbac_query.crypto.ciphertext,
    limit=10,
)
# results.points contains encrypted payloads.
# The database never saw plaintext at any stage.
# Decryption happens next in the application.
Section 05
Decrypting only within policy scope
The results from the encrypted search come back as encrypted payloads. The application loops through them and attempts to decrypt each one with the user secret key. If the vector's access policy matches the user's key, decryption succeeds and the result is included. If not, MirrorError is raised and the result is silently excluded.
This means RBAC enforcement does not depend on application code checking a role claim. It lives in the cryptography itself. A user without the matching key cannot decrypt the vector regardless of what the application layer does.
Python · Decrypting results within policy scope
from mirror_sdk.core.models import MirrorCrypto
from mirror_sdk.utils import decode_binary_data
from mirror_sdk.core import MirrorError

# Generate a user secret key scoped to their actual roles
user_key = sdk.rbac.generate_user_secret_key({
    "roles": ["analyst"],
    "groups": ["team_finance"],
    "departments": ["finance"],
})

accessible = []
for point in results.points:
    try:
        meta = decode_binary_data(
            point.payload["encrypted_vector_metadata"]
        )
        mirror_data = MirrorCrypto.deserialize(meta)
        sdk.rbac.decrypt(
            mirror_data,
            point.payload["encrypted_header"],
            user_key,
        )
        # Decryption succeeded: vector is within this user's policy scope
        accessible.append({
            "id": point.id,
            "content": point.payload["content"],
            "score": point.score,
        })
    except MirrorError:
        # Vector exists but this user cannot decrypt it.
        # Silently excluded from results.
        pass

# accessible contains only content the user is allowed to see.
# Inject into LLM prompt as usual.
Safety net property: This module builds on the access controls from A4. If the namespace isolation or metadata filter from A4 fails due to a bug or injection attack, the decryption policy still holds. An attacker who retrieves a ciphertext for which they have no matching key cannot use it. Encrypted inference does not replace access controls. It provides a cryptographic layer underneath them that holds even when the layers above fail.
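The decrypt-and-filter pattern can be exercised without the SDK. The sketch below uses stand-ins: DecryptDenied plays the role of MirrorError and fake_decrypt plays the role of the SDK decrypt call. Both are hypothetical and exist only for this illustration. It also counts denied results rather than dropping them without a trace, which is a design choice of this sketch and not something the SDK requires.

```python
class DecryptDenied(Exception):
    """Hypothetical stand-in for the SDK's MirrorError."""


def filter_accessible(points, try_decrypt):
    """Keep only the points that try_decrypt accepts and count the rest."""
    accessible, denied = [], 0
    for point in points:
        try:
            try_decrypt(point)
        except DecryptDenied:
            denied += 1  # excluded from results, but still counted
            continue
        accessible.append(point)
    return accessible, denied


def fake_decrypt(point):
    """Stand-in for decryption: only finance vectors match this user's key."""
    if point["department"] != "finance":
        raise DecryptDenied(point["id"])


points = [
    {"id": "a", "department": "finance"},
    {"id": "b", "department": "hr"},
    {"id": "c", "department": "finance"},
]

accessible, denied = filter_accessible(points, fake_decrypt)
print([p["id"] for p in accessible], denied)
```

A non-zero denied count means retrieval returned vectors outside the user's scope. The user still sees nothing they should not, but the count is a useful signal that a namespace or metadata filter from A4 is not doing its job.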
Section 06
Encrypted vector memory for RAG agents
Agentic RAG systems build up memory as they run. They store conversation history so they can maintain context across turns. They cache retrieved document embeddings to avoid redundant lookups. They write intermediate summaries and task state to memory stores between steps. Without encrypted memory, all of this accumulates as plaintext in whatever storage system the agent uses.
Encrypted vector memory changes this. Every memory entry is encrypted at the moment the agent writes it, before it reaches any storage system. Similarity-preserving search means the agent can still search over its memory by query similarity. The memory provider only ever holds ciphertext.
Standard agent memory
Conversation: "User asked about salary bands..."
System prompt: "You are an HR assistant with access to pay scales..."
Memory provider holds plaintext. Readable by infrastructure, logs, admins.
VectaX encrypted vector memory
Conversation: FHE ciphertext AX7F2Q3R...
System prompt: FHE ciphertext BK3M7P1W...
Retrieved doc: FHE ciphertext QR9X4T2M...
Memory provider holds ciphertext only. No plaintext at any layer.
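The encrypt-on-write rule can be sketched as a thin wrapper around whatever storage the agent uses. Everything here is hypothetical: EncryptedMemory is not an SDK class, and toy_cipher is a placeholder that is not secure (it reuses one keystream for every entry). In a real deployment the SDK's encryption would take its place. The sketch covers only the write path and a full read-back, not similarity search over memory.

```python
import hashlib
import json


def toy_cipher(key: bytes, data: bytes) -> bytes:
    """XOR data with a SHA-256-derived keystream. Placeholder only, NOT secure."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream.extend(hashlib.sha256(key + counter.to_bytes(8, "big")).digest())
        counter += 1
    # XOR is its own inverse, so the same function encrypts and decrypts.
    return bytes(b ^ s for b, s in zip(data, stream))


class EncryptedMemory:
    """Agent memory that encrypts each entry before it reaches the backend."""

    def __init__(self, key: bytes, backend: list):
        self._key = key
        self._backend = backend  # stands in for the external memory provider

    def write(self, kind: str, text: str) -> None:
        """Serialise and encrypt the entry, then hand only ciphertext to storage."""
        plaintext = json.dumps({"kind": kind, "text": text}).encode("utf-8")
        self._backend.append(toy_cipher(self._key, plaintext))

    def read_all(self) -> list:
        """Decrypt every stored entry on the client side."""
        return [
            json.loads(toy_cipher(self._key, blob).decode("utf-8"))
            for blob in self._backend
        ]


provider_storage = []
memory = EncryptedMemory(key=b"client-held-secret", backend=provider_storage)
memory.write("conversation", "User asked about salary bands")
memory.write("system_prompt", "You are an HR assistant with access to pay scales")

print(len(provider_storage), memory.read_all()[0]["kind"])
```

The key never leaves the client object, and the list standing in for the provider holds only opaque bytes. Because the write method is the only path into storage, no entry can reach the provider unencrypted.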
Section 07
MCP integration with Claude Desktop
The Model Context Protocol (MCP) is a standard for connecting AI models to external data sources. VectaX provides an MCP server that acts as a secure backend for Claude Desktop. When Claude queries your vector store through the VectaX MCP server, all the controls from A3 to A5 are enforced automatically: FHE encryption, RBAC policy checks at decryption time, and cryptographically signed audit logs of every interaction.
Setup takes under five minutes.
Shell · Setup VectaX MCP server (github.com/mirrorsecai/mirror-vectax-mcp-server)
# Clone the MCP server and enter the repository
# (assumes the setup scripts sit at the repository root)
git clone https://github.com/mirrorsecai/mirror-vectax-mcp-server
cd mirror-vectax-mcp-server

# macOS / Linux
chmod +x setup_claude_config.sh
./setup_claude_config.sh

# Windows
.\setup_claude_config.bat

# The script installs mirror-sdk, adds MIRROR_API_KEY to config,
# and registers the MCP server with Claude Desktop.
# Open Claude Desktop and it can now query your
# encrypted vector store with RBAC policies enforced.
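After running the setup script, you may want to confirm that the server was actually registered. The sketch below lists the entries under the mcpServers key of a Claude Desktop configuration file. It makes two assumptions: that the setup script writes to the standard claude_desktop_config.json file, and that you supply the path for your platform. The server name vectax-example is invented for the demonstration; the name the script really uses is not stated in this module.

```python
import json
import tempfile
from pathlib import Path


def registered_mcp_servers(config_path):
    """Return the MCP server names listed in a Claude Desktop config file."""
    config = json.loads(Path(config_path).read_text(encoding="utf-8"))
    return sorted(config.get("mcpServers", {}))


# Demonstration against a sample file so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "claude_desktop_config.json"
    sample.write_text(
        json.dumps({"mcpServers": {"vectax-example": {"command": "python"}}}),
        encoding="utf-8",
    )
    names = registered_mcp_servers(sample)

print(names)
```

Point the function at your real configuration file to see what Claude Desktop will load. An empty list after setup suggests the script wrote somewhere else or did not complete.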
Section 08
What you can claim for compliance
Encrypted inference changes the compliance picture for AI in regulated industries. The long-standing problem with using third-party AI services for sensitive workloads is that you hand plaintext data to external infrastructure. Encrypted inference removes that objection because the provider processes ciphertext and never holds plaintext at any stage.
These are not marketing claims. They follow from the technical implementation.
GDPR
EU · Personal Data
Article 32 requires appropriate technical and organisational measures to protect personal data during processing, not just at rest and in transit.
Encrypted inference satisfies the Article 32 processing requirement. The provider infrastructure never holds personal data in plaintext during computation.
HIPAA
US · Health Information
The Security Rule's technical safeguards cover encryption of ePHI at rest and in transit as addressable implementation specifications. AI workloads on PHI must ensure the provider cannot access it.
The AI provider never holds PHI in plaintext at any stage: not during storage, not during retrieval, not during inference. Technical safeguard requirement satisfied.
PCI DSS
Global · Cardholder Data
Requirements 3 and 4 mandate encryption of cardholder data at rest and in transit for systems that process payment data.
Encrypted inference extends this to cover the processing step. AI fraud detection and transaction analysis systems can process cardholder data without exposing it to the inference provider.
One important distinction: Encrypted inference provides cryptographic guarantees about what the provider infrastructure can and cannot access. It does not automatically make your AI system compliant with all aspects of GDPR, HIPAA, or PCI DSS. You still need proper data minimisation, retention policies, audit logging, and access governance. Encrypted inference is one technical control among several required controls.
Section 09
Go deeper on the cryptography
This module covered encrypted inference at the practitioner level: what the plaintext gap is, how to encrypt query vectors, how to run encrypted similarity search, how RBAC decryption works, how encrypted memory is stored, and how to connect everything to Claude via MCP.
If you want to understand what is happening underneath, Track 3D: Privacy-Preserving AI covers the full cryptographic picture: the difference between PHE, SHE, and FHE, how Craig Gentry's bootstrapping breakthrough made unlimited operations possible, how noise accumulates and why it matters, the CKKS scheme for approximate arithmetic on real numbers, open-source FHE libraries (Microsoft SEAL, HElib, Pyfhel), differential privacy, federated learning, and SMPC.
You do not need Track 3D to use VectaX in production. You do need it if you are evaluating whether to build your own FHE implementation, assessing a vendor's cryptographic claims, or working in a role that requires you to explain the technical controls to a regulator or auditor.