Module 5: Encrypted Inference and Encrypted Vector Memory
Mirror Academy · Intermediate · 26 min · 2026-04-03
Module 5 of 6 · Vector DB & RAG Security · Core Security Path
Encryption in Use
Encrypted Inference & Encrypted Vector Memory
FHE types, how VectaX closes the last plaintext gap, encrypted agent memory, MCP integration with Claude, and what this means for GDPR, HIPAA, and the EU AI Act.
Modules 3 and 4 secured data at ingestion, in storage, and at access control. But there is one step in every AI system that still operates entirely on plaintext, even in well-secured deployments: inference, the moment when the AI model processes inputs, activates layers, and generates outputs.
Even if your vector database encrypts embeddings at rest, the retrieval service decrypts them before feeding them into the LLM prompt. Even if your pipeline uses TLS throughout, the AI provider's inference server processes plaintext tokens. Even if your access controls are airtight, the model weights and layer activations operate on unencrypted data inside the provider's infrastructure.
This is the last plaintext gap. An attacker who compromises the inference server, or an AI provider whose security posture you must accept on trust, can see every query, every retrieved document, and every generated response.
Where data is exposed in a standard AI pipeline
🔒
Storage: encrypted at rest (AES-256)
Protected
🔒
Transmission: encrypted in transit (TLS 1.3)
Protected
⚠
Inference: decrypted, processed in plaintext on provider servers
Inputs, model activations, and outputs all in plaintext during computation
Exposed
With VectaX FHE, the inference step also becomes encrypted ↓
🔐
Inference: encrypted throughout via VectaX FHE. Provider never sees plaintext.
Protected
Trust versus proof: Most AI providers publish SOC 2 reports and data processing agreements promising they will not use your data to train models. These are contractual commitments. They depend on the provider acting in good faith and having no security breaches. FHE is a mathematical guarantee: the provider's infrastructure cannot read your data during inference, regardless of policy or breach status. Mirror Security describes this as replacing policy-based trust with cryptographic proof.
Section 02 · Cryptography
PHE, SHE, and FHE: the three types of homomorphic encryption
Homomorphic encryption (HE) allows computation directly on encrypted data. The result of the computation, when decrypted, matches what you would have gotten by computing on the plaintext. The server performing the computation never sees the actual values. There are three generations of this technology, each with different capabilities and performance trade-offs.
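To make the definition concrete, here is a minimal sketch of the Paillier cryptosystem, a classic additively homomorphic scheme: multiplying two ciphertexts yields a ciphertext of the sum of the plaintexts. The primes, function names, and parameters are illustrative assumptions chosen for readability. They are far too small to be secure, and this is not how VectaX or any production library is implemented.

```python
import math
import random


def keygen(p=104_729, q=1_299_709):
    """Toy Paillier key pair from two small primes (real keys use 1024+ bit primes)."""
    n = p * q
    phi = (p - 1) * (q - 1)
    mu = pow(phi, -1, n)  # modular inverse; valid for the generator g = n + 1
    return n, (phi, mu)


def encrypt(n, m, rng=random):
    """Encrypt an integer m with 0 <= m < n under the public key n."""
    n2 = n * n
    while True:
        r = rng.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    # With g = n + 1, g^m mod n^2 equals 1 + m*n, so no large exponent is needed.
    return (1 + m * n) * pow(r, n, n2) % n2


def decrypt(n, private_key, c):
    """Recover the plaintext integer from ciphertext c."""
    phi, mu = private_key
    u = pow(c, phi, n * n)
    return (u - 1) // n * mu % n


def add_encrypted(n, c1, c2):
    """Homomorphic addition: the product of ciphertexts encrypts the sum."""
    return c1 * c2 % (n * n)


n, private_key = keygen()
c_a = encrypt(n, 120_000)
c_b = encrypt(n, 3_456)
c_sum = add_encrypted(n, c_a, c_b)   # computed without seeing 120000 or 3456
total = decrypt(n, private_key, c_sum)
print(total)  # 123456
```

The party calling `add_encrypted` needs only the public value `n`. It never holds the private key or either plaintext, which is the property the rest of this module builds on.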
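Generation 1
PHE
Partially Homomorphic Encryption
✓ One operation type (addition or multiplication), computed efficiently
Best for: secure aggregations, voting systems, simple analytics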
Generation 2
SHE
Somewhat Homomorphic Encryption
✓ Both addition and multiplication
⚠ Limited depth before noise threshold
✓ More flexible than PHE
Noise accumulates with each operation and cannot be refreshed.
Best for: simple ML inference, fixed-depth neural networks
Generation 3
FHE
Fully Homomorphic Encryption
✓ Unlimited addition and multiplication
✓ Bootstrapping refreshes noise
✓ Can compute any function on encrypted data
First construction by Craig Gentry, 2009. Schemes: BFV, BGV, CKKS.
Best for: full AI inference, complex ML models, VectaX
Why CKKS for AI: The CKKS scheme (Cheon-Kim-Kim-Song) is designed for approximate arithmetic on real numbers, which is exactly what neural network inference requires. BFV and BGV work with integers. CKKS works with floating-point-like values, making it the right choice for embedding operations and vector similarity computations. VectaX uses CKKS-based schemes for all AI workloads.
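The approximate-arithmetic idea can be sketched without any encryption at all. CKKS-style schemes represent real numbers as integers multiplied by a scaling factor, and rescale after each multiplication so the scale does not grow without bound. The snippet below shows only that encoding step on plaintext values. The scale value and function names are assumptions for illustration, and no ciphertexts are involved.

```python
SCALE = 2 ** 20  # hypothetical scaling factor; real schemes pick this per parameter set


def encode(x, scale=SCALE):
    """Represent a real number as a scaled integer (precision is lost here)."""
    return round(x * scale)


def decode(v, scale=SCALE):
    """Map a scaled integer back to an approximate real number."""
    return v / scale


def mul_rescale(a, b, scale=SCALE):
    """Multiply two encoded values, then divide by the scale with rounding.

    The raw product carries scale**2; rescaling brings it back to scale.
    """
    return (a * b + scale // 2) // scale


def dot_fixed(u_enc, v_enc, scale=SCALE):
    """Dot product over encoded vectors, the core of similarity search."""
    return sum(mul_rescale(a, b, scale) for a, b in zip(u_enc, v_enc))


u = [0.12, -0.5, 0.33]
v = [0.4, 0.25, -0.9]
approx = decode(dot_fixed([encode(x) for x in u], [encode(x) for x in v]))
exact = sum(a * b for a, b in zip(u, v))
print(approx, exact)  # the two values agree to several decimal places
```

The result is close to, but not bit-identical with, the floating-point answer. That small, controlled error is the trade CKKS makes to support real-valued embeddings, and it is usually acceptable for similarity ranking.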
Section 03 · How FHE Works
Noise and bootstrapping: why FHE is now practical
Every FHE ciphertext contains a small amount of random noise added during encryption. This noise is what makes the scheme secure: without it, an attacker could solve for the plaintext from the ciphertext. But noise grows with each operation. Add two ciphertexts and you get a little more noise. Multiply them and noise roughly squares. Keep operating and eventually the noise is so large the ciphertext cannot be correctly decrypted.
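Noise growth is easiest to see in a toy scheme over the integers, in the style of the van Dijk, Gentry, Halevi, and Vaikuntanathan construction: a bit is hidden as `p*q + 2*r + bit`, where `p` is the secret key and `r` is the noise. This is a sketch under simplifying assumptions (symmetric key, tiny parameters, no security), not CKKS and not the scheme VectaX uses. All names and parameter sizes are illustrative.

```python
import random


def keygen(rng, bits=64):
    """Secret key: a random odd integer with the top bit set."""
    return rng.getrandbits(bits) | (1 << (bits - 1)) | 1


def encrypt(p, bit, rng, noise_bits=4, q_bits=96):
    """Hide one bit as p*q + 2*r + bit, where r is small random noise."""
    q = rng.getrandbits(q_bits) + 1
    r = rng.getrandbits(noise_bits) + 1
    return p * q + 2 * r + bit


def noise(p, c):
    """Centred remainder of c modulo p: the noise term 2*r + bit."""
    rem = c % p
    return rem - p if rem > p // 2 else rem


def decrypt(p, c):
    """Correct only while the absolute noise stays below p / 2."""
    return noise(p, c) % 2


def safe_squarings(p, c):
    """How many times c can be squared before its noise exceeds p // 2."""
    n, depth = abs(noise(p, c)), 0
    while n * n <= p // 2:
        n, depth = n * n, depth + 1
    return depth


rng = random.Random(7)
p = keygen(rng)
a, b = encrypt(p, 1, rng), encrypt(p, 1, rng)

print("fresh noise:   ", noise(p, a), noise(p, b))
print("after a + b:   ", noise(p, a + b), "-> decrypts to", decrypt(p, a + b))  # XOR
print("after a * b:   ", noise(p, a * b), "-> decrypts to", decrypt(p, a * b))  # AND
print("safe squarings:", safe_squarings(p, a))
```

Addition sums the two noise terms, while multiplication multiplies them, so repeated squaring exhausts the available headroom after only a handful of steps. Beyond that depth the remainder wraps around `p` and decryption is no longer reliable.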
Craig Gentry's 2009 breakthrough introduced bootstrapping: a procedure that takes a noisy ciphertext and produces a fresh ciphertext with much less noise, representing the same plaintext value. Bootstrapping works by homomorphically evaluating the scheme's own decryption circuit on the noisy ciphertext, using an encrypted copy of the secret key, so the plaintext is never exposed during the refresh. By periodically refreshing ciphertexts, you can chain an unlimited number of operations, enabling full neural network inference on encrypted data.
Noise accumulation per operation type and bootstrapping reset
(Decryption fails once noise crosses the threshold.)
After encryption: Low · Decryptable
+ 5 additions: Still low · Decryptable
+ 3 multiplications: Growing fast · Caution
Deep circuit (SHE limit): Near threshold · SHE stops here
Bootstrapping ↻: Reset! · FHE continues
Bootstrapping is what separates SHE from FHE. Without it, operations stop at the noise threshold. With it, the computation continues indefinitely, enabling complete neural network inference on encrypted data.
VectaX provides noise control parameters through the SDK. Teams can tune how much noise buffer to maintain before bootstrapping triggers. A tighter buffer means more frequent bootstrapping, more compute, but higher accuracy. A looser buffer means fewer bootstrapping operations and lower compute cost but slightly less floating-point precision. For embedding similarity search, the precision requirements are well understood, and VectaX's defaults are calibrated for high accuracy at practical latency.
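The buffer trade-off described above can be illustrated with a small simulation. This is a simplified model, not the VectaX SDK: the budget, per-operation costs, and the `buffer_bits` parameter are hypothetical numbers chosen to show how a more conservative safety margin triggers more refreshes.

```python
def count_refreshes(ops, budget_bits=60, add_cost=1, mul_cost=10, buffer_bits=15):
    """Simulate a noise budget and count how often a refresh is triggered.

    A refresh (standing in for bootstrapping) runs before any operation that
    would push the remaining budget below the configured safety buffer.
    """
    if budget_bits - max(add_cost, mul_cost) < buffer_bits:
        raise ValueError("buffer leaves no room for a single operation")
    remaining = budget_bits
    refreshes = 0
    for op in ops:
        cost = mul_cost if op == "mul" else add_cost
        if remaining - cost < buffer_bits:
            remaining = budget_bits  # simulated bootstrapping resets the budget
            refreshes += 1
        remaining -= cost
    return refreshes


circuit = ["mul"] * 20  # a hypothetical circuit of twenty multiplications
loose = count_refreshes(circuit, buffer_bits=15)
conservative = count_refreshes(circuit, buffer_bits=35)
print(loose, conservative)  # 4 9
```

In this model the conservative setting more than doubles the number of refreshes for the same circuit. Real bootstrapping cost and precision behaviour depend on the scheme and parameters, so treat the numbers as a way to reason about the trade-off, not as a performance estimate.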
Section 04 · VectaX
VectaX encrypted inference: three layers working together
VectaX encrypted inference is not FHE applied to vectors in isolation. It combines three components: FHE for the encryption scheme, vector encryption for similarity-preserving storage, and RBAC for policy-controlled access. All three are necessary. FHE alone lets you compute on encrypted data but does not preserve the similarity relationships needed for vector search. Vector encryption alone protects stored embeddings but not the inference computation. RBAC alone controls access but does not prevent extraction attacks through repeated queries.
VectaX encrypted inference: what is encrypted at each step
📝
User query
User submits a query. The client embeds it locally using its own embedding model.
Plaintext locally
🔐
Encrypt query vector
sdk.vectax.encrypt() applied to the query embedding before it leaves the client.
Encrypted at client
📑
Encrypted similarity search
The encrypted query is compared against encrypted stored vectors. Distance computed on ciphertext. The vector DB never sees plaintext query or plaintext embeddings.
Encrypted throughout
🔑
RBAC policy check
Only vectors matching the user's decryption key are returned. Vectors outside the user's policy scope remain undecryptable ciphertext.
Policy enforced cryptographically
🧠
Encrypted context injection
Retrieved encrypted context is injected into the LLM prompt. With full FHE inference, the model processes encrypted tokens throughout.
FHE inference option
📤
Encrypted response
Response generated and returned encrypted. Decrypted only by the authorised client. Provider infrastructure never held plaintext at any stage.
from mirror_sdk.core.models import RBACVectorData, MirrorCrypto
from mirror_sdk.utils import decode_binary_data
from mirror_sdk.core import MirrorError

# Assumes sdk, qdrant, query_embedding, and user_key were initialised earlier.

# 1. Encrypt query vector with the user's access policy
policy = {"roles": ["analyst"], "groups": ["team_finance"], "departments": ["finance"]}
query_data = RBACVectorData(vector=query_embedding, id="query", access_policy=policy)
encrypted_query = sdk.rbac.encrypt(query_data)

# 2. Search on encrypted vectors - vector DB never sees plaintext
results = qdrant.query_points(
    collection_name="vectax",
    query=encrypted_query.crypto.ciphertext,
    limit=10,
)

# 3. Decrypt only results within the user's policy scope
accessible = []
for point in results.points:
    try:
        meta = decode_binary_data(point.payload["encrypted_vector_metadata"])
        mirror_data = MirrorCrypto.deserialize(meta)
        # Decryption fails with MirrorError when the vector is outside the policy scope
        decrypted = sdk.rbac.decrypt(
            mirror_data,
            point.payload["encrypted_header"],
            user_key,
        )
        accessible.append({"id": point.id, "content": point.payload["content"]})
    except MirrorError:
        pass  # vector outside user's policy scope - access denied
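The encrypted similarity search step relies on a transform that hides vector values while keeping similarity scores intact. As a rough intuition only, the sketch below applies a secret permutation with sign flips, which is an orthogonal transform and therefore preserves dot products. This is a toy stand-in and an assumption for illustration: it is not VectaX's algorithm and would not withstand a known-plaintext attack.

```python
import random


def make_key(dim, rng):
    """Secret key: a random permutation of positions plus random sign flips."""
    perm = list(range(dim))
    rng.shuffle(perm)
    signs = [rng.choice((-1.0, 1.0)) for _ in range(dim)]
    return perm, signs


def transform(vec, key):
    """Apply the keyed permutation and sign flips to a vector."""
    perm, signs = key
    return [signs[i] * vec[perm[i]] for i in range(len(vec))]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def rank(query, docs):
    """Return document names ordered from most to least similar."""
    return sorted(docs, key=lambda name: dot(query, docs[name]), reverse=True)


rng = random.Random(42)
key = make_key(4, rng)

query = [1.0, 0.0, 2.0, -1.0]
docs = {
    "a": [1.0, 0.0, 2.0, -1.0],
    "b": [0.0, 3.0, 0.0, 0.0],
    "c": [-1.0, 0.0, -2.0, 1.0],
}

plain_rank = rank(query, docs)
hidden_rank = rank(
    transform(query, key),
    {name: transform(vec, key) for name, vec in docs.items()},
)
print(plain_rank, hidden_rank)  # both rankings are identical
```

The store holding the transformed vectors can rank results without knowing the key, which is the property a similarity-preserving encrypted index needs. A production scheme must additionally resist attackers who observe many vectors or can choose inputs.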
Section 05 · Agent Memory
Encrypted vector memory: what agents remember and who can read it
AI agents build up memory as they work. They store conversation history to maintain context across turns. They cache retrieved document embeddings to avoid redundant lookups. They write summaries, intermediate reasoning states, and task progress to memory stores. In most deployments, all of this is stored in plaintext or with only standard database encryption at rest.
Encrypted vector memory changes this. Every memory entry stored by the agent is encrypted at the moment of creation, before it is written to any storage system. The agent can later retrieve and reason over its memory because VectaX's Similarity-Preserving Search keeps the embeddings searchable. But the memory store itself, and any infrastructure between the agent and the store, never holds plaintext.
Standard agent memory (plaintext)
Conversation:"User asked about Q3 salary bands. Retrieved HR policy doc..."
System prompt:"You are an HR assistant with access to confidential pay scales..."
Visible to provider infrastructure, logs, and memory DB admins
VectaX encrypted agent memory
Conversation:⊕ AX7F2Q3R9K... [FHE ciphertext]
System prompt:⊕ BK3M7P1W4N... [FHE ciphertext]
Retrieved docs:⊕ QR9X4T2M8V... [FHE ciphertext]
Provider infrastructure sees only ciphertext. No plaintext at any layer.
This matters most for long-running agents that accumulate sensitive information over many interactions. Without encrypted memory, the memory store becomes a consolidated plaintext record of everything sensitive the agent has ever retrieved or been told. With encrypted memory, even if the memory store is breached, the attacker sees encrypted blobs that cannot be reversed without the decryption key.
Section 06 · Context Security
Encrypted context windows and conversation history
The context window is the most sensitive surface in a RAG system at inference time. It contains the system prompt (which may include confidential instructions), the retrieved documents (the sensitive content you went to all this trouble to protect), the conversation history (which may contain personal information shared by the user), and the user's current query.
In a standard RAG deployment, all of this travels to the AI provider's inference server in plaintext. The provider can technically read every element of the context window, even if their policy says they do not.
VectaX encrypted context windows keep retrieved document embeddings and conversation history entries in encrypted form until the last possible moment. Decryption happens only at an authorised endpoint. This is particularly valuable when the system prompt contains proprietary instructions that the organisation does not want to expose to the model provider's infrastructure.
MCP and encrypted context: The Model Context Protocol (MCP) standardises how AI models like Claude connect to external data sources. VectaX's MCP server integration means that when Claude queries your vector store through MCP, the connection is secured with VectaX encryption and RBAC policies are enforced at query time. See Section 10 for the full MCP setup.
Section 07 · Model Security
Secure model deployment: protecting weights and controlling predictions
Encrypted inference protects the input data. But there is a second concern: protecting the model itself. Model weights represent significant intellectual property. An organisation that fine-tunes a foundation model on proprietary data has built something valuable. Exposing those weights to provider infrastructure, or to users through model extraction attacks, is a real risk.
VectaX addresses model security through two mechanisms. First, encrypted inference means model weights can be deployed to infrastructure that never processes plaintext inputs or outputs, reducing information available for extraction attacks. Second, RBAC on predictions means access to specific model endpoints can be controlled at the same role, group, and department level as data access.
🛡
Protected model weights
Model weights encrypted at rest and in transit. Fine-tuned weights never exposed to inference infrastructure in plaintext.
🔒
Secure inference pipeline
Inputs encrypted before reaching the inference server. Outputs encrypted before leaving. Provider sees only ciphertext at every stage.
👥
Access-controlled predictions
RBAC applied to model endpoints. An analyst role cannot call a prediction endpoint restricted to data scientists. Predictions scoped by user policy.
Model extraction attacks work by querying a model repeatedly with carefully chosen inputs to reconstruct its decision boundaries. Encrypted inference makes this harder because the attacker sees encrypted outputs that must be decrypted before they can be used to train a surrogate model. Combined with rate limiting and query pattern monitoring from Module 6, this significantly raises the cost of extraction attacks.
Section 08 · Multi-Party
Secure multiparty computation: AI across organisational boundaries
Sometimes the most valuable RAG systems are built on data from multiple organisations. A hospital network wants to build a diagnostic AI using patient records from 12 hospitals. A financial consortium wants to detect fraud patterns across member banks. In both cases, each organisation has data too sensitive to share with the others, but the combined dataset would produce a much better model.
Secure Multiparty Computation (SMPC) solves this. It allows multiple parties to jointly compute a function over their combined inputs without any party revealing their individual data to the others. Each party learns only the final result of the computation, not the inputs from other participants.
SMPC for federated AI: hospitals without sharing patient data
🏥
Hospital A
Local patient data (encrypted)
+
🏥
Hospital B
Local patient data (encrypted)
+
🏥
Hospital C
Local patient data (encrypted)
→
🧠
Shared SMPC result
Better model. No raw data shared.
Each hospital computes locally on their encrypted data. The SMPC protocol combines results mathematically so the final output is equivalent to training on all datasets combined, without any hospital ever seeing another hospital's patient records.
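One of the simplest SMPC building blocks is additive secret sharing, sketched below for a secure sum. Each party splits its private value into random shares that individually reveal nothing, and only the combined total is ever reconstructed. This is a minimal illustration assuming honest participants and a shared modulus. It computes a sum, not a trained model, and the names are hypothetical.

```python
import secrets

MODULUS = 2 ** 61 - 1  # all arithmetic is done modulo this public value


def share(value, n_parties):
    """Split a private value into n random shares that sum to it (mod MODULUS)."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares


def secure_sum(private_values):
    """Simulate the protocol: each party shares its value, then sums what it receives."""
    n = len(private_values)
    # Row i holds the shares created by party i; column j is what party j receives.
    all_shares = [share(v, n) for v in private_values]
    partial_sums = [
        sum(all_shares[i][j] for i in range(n)) % MODULUS for j in range(n)
    ]
    # Parties publish only their partial sums, which combine into the total.
    return sum(partial_sums) % MODULUS


# Hypothetical per-hospital case counts that must stay private.
hospital_counts = [120, 75, 310]
print(secure_sum(hospital_counts))  # 505
```

Federated learning systems use the same idea to aggregate model updates: each site contributes shares of its update, and the coordinator sees only the aggregate.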
SMPC and FHE complement each other. FHE allows a single party to process encrypted data without decrypting it. SMPC allows multiple parties to combine their data without sharing it. Used together, as the Cisco 2024 white paper on securing vector databases notes, FHE can reduce the communication overhead typically associated with SMPC, because homomorphically encrypted data can be processed by a single party without constant back-and-forth between participants.
Section 09 · Tools
Open-source FHE libraries
For teams evaluating whether to build their own FHE implementation or use a solution like VectaX, understanding the open-source library landscape is useful. These libraries provide the cryptographic primitives. They do not provide AI-optimised implementation, noise control tuning, RBAC integration, or production deployment tooling.
SEAL
Microsoft Research
Schemes: BFV, BGV, CKKS
C++ (Python via third-party wrappers)
The most widely adopted FHE library in industry and research. Excellent documentation and active maintenance. CKKS implementation is well-suited for approximate arithmetic on real numbers. Used as a building block in commercial FHE products.
HElib
IBM Research
Schemes: BGV, CKKS
C++
One of the oldest and most battle-tested FHE libraries. Strong performance on BGV, which is optimised for integer arithmetic. IBM has used HElib in production research for financial and healthcare applications. Steeper learning curve than SEAL but highly optimised for specific workloads.
Pyfhel
Open-source community
Schemes: BFV, BGV, CKKS (via SEAL/HElib)
Python
Python wrapper making FHE accessible without C++ expertise. Useful for prototyping and learning. Performance is lower than direct C++ usage. Not recommended for production AI workloads where latency matters. Good starting point before committing to an optimised implementation.
Build versus buy for AI FHE: Using a general-purpose FHE library for AI inference requires significant expertise in noise management, parameter selection, and performance tuning for neural network operations. VectaX packages this work with AI-specific optimisations, SDK-level noise control, RBAC integration, and production deployment tooling. The Mirror Security and SiSys AI collaboration on the MIRROR co-processor will further accelerate FHE operations in hardware for hyperscale deployments.
Section 10 · MCP
MCP integration: VectaX as a secure vector backend for Claude
The Model Context Protocol (MCP) is a standard that lets AI models like Claude connect to external data sources through a defined interface. It is sometimes described as the "USB-C port for AI": a universal connector that any MCP-compatible tool can plug into any MCP-compatible model without custom integration work.
VectaX provides an MCP server that acts as a secure vector database backend. When Claude Desktop connects to the VectaX MCP server, it gains access to semantic search over your vector store with all VectaX security controls enforced: similarity-preserving FHE encryption, RBAC at role, group, and department level, audit logging of all interactions, and compliance-ready monitoring.
VectaX MCP integration architecture
🤖
Claude Desktop
MCP Host. Sends semantic search queries via the MCP protocol.
Qdrant, Pinecone, ChromaDB, MongoDB, or pgvector. Stores only encrypted embeddings.
Encrypted storage
→
📊
Audit & Compliance
Cryptographically-signed logs of all AI data interactions. SOC 2, HIPAA, PCI DSS ready.
Compliance trail
Shell · VectaX MCP server installation (github.com/mirrorsecai/mirror-vectax-mcp-server)
# Clone the VectaX MCP server and enter the repository
git clone https://github.com/mirrorsecai/mirror-vectax-mcp-server
cd mirror-vectax-mcp-server
# macOS / Linux automated setup
chmod +x setup_claude_config.sh
./setup_claude_config.sh
# Windows automated setup
.\setup_claude_config.bat
# The script handles everything automatically:
# - Installs dependencies (pip install mirror-sdk)
# - Configures VectaX with your MIRROR_API_KEY
# - Registers the MCP server with Claude Desktop
# - Applies security best practices (TLS, key rotation, audit logging)
# After setup: open Claude Desktop
# Claude now queries your VectaX vector store via MCP
# with all RBAC policies enforced at query time
Section 11 · Compliance
Compliance implications: what encrypted inference does for regulated industries
Encrypted inference changes the compliance conversation for AI in regulated industries. The fundamental problem with using third-party AI services for sensitive workloads has always been that you are handing plaintext data to an external infrastructure provider. FHE removes this objection. The provider's infrastructure processes ciphertext, not your data.
GDPR
EU · Personal Data
Article 32 requires appropriate technical and organisational measures to secure personal data during processing, and names encryption as an example of such a measure. Article 25 (data protection by design) requires building privacy into the system from the start.
FHE provides cryptographic evidence that personal data was never processed in plaintext by the provider. This supports Articles 25 and 32 with a technical control, not only a contractual promise.
HIPAA
US · Health Information
The Security Rule lists encryption of electronic PHI at rest and in transit as addressable technical safeguards. AI workloads processing patient records must ensure the AI provider cannot access PHI. Business Associate Agreements are required but insufficient on their own.
Encrypted inference means the AI provider never holds PHI in plaintext at any stage. This supports the technical safeguard requirements and significantly reduces HIPAA risk for AI healthcare applications.
PCI DSS
Global · Payment Data
Requirement 3 mandates encryption of stored cardholder data. Requirement 4 mandates encryption in transit. AI systems processing payment data for fraud detection must ensure cardholder data is not exposed to inference provider infrastructure.
Encrypted inference enables fraud detection and transaction AI on cardholder data without exposing it to the inference provider, supporting Requirements 3 and 4 for AI processing pipelines.
EU AI Act
EU · High-Risk AI Systems
High-risk AI systems (HR, credit scoring, healthcare, biometrics) must implement risk management systems, maintain technical documentation, and ensure data governance measures that minimise data quality and security risks.
FHE provides a documented technical control for the required technical documentation. VectaX audit logs satisfy the traceability and monitoring requirements for high-risk AI systems.
FIPS 140-2 / 140-3
US Government · Cryptographic Modules
US federal systems must use FIPS-validated cryptographic modules to protect sensitive but unclassified data. Government AI systems processing such data must use validated implementations of approved cryptographic algorithms.
VectaX uses FIPS-compliant cryptographic algorithms and is designed for government and regulated industry deployments, supporting the FIPS requirement for validated cryptographic modules in government AI systems.
SOC 2 Type II
Global · Service Organisations
Requires documented security controls for systems processing customer data, with third-party audit validation that controls operate effectively over time. Access controls, encryption, logging, and incident response must be documented and tested.
VectaX provides cryptographically-signed audit logs, RBAC with documented policy enforcement, and encryption at every stage. Mirror Security holds SOC 2 Type I certification with Type II in progress.
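Tamper-evident audit logging, which several of the frameworks above expect, can be sketched with a chained message authentication code: each entry's tag covers the previous tag, so editing or removing an earlier entry invalidates everything after it. This sketch uses HMAC-SHA256 from the Python standard library, which is a symmetric MAC, not a public-key signature. The entry format and key handling are assumptions, and this is not a description of VectaX's logging implementation.

```python
import hashlib
import hmac
import json


def _tag(key, prev_tag, event):
    """MAC over the previous tag plus a canonical encoding of the event."""
    payload = json.dumps(event, sort_keys=True)
    return hmac.new(key, (prev_tag + payload).encode(), hashlib.sha256).hexdigest()


def append_entry(log, key, event):
    """Append an event, chaining its tag to the previous entry's tag."""
    prev_tag = log[-1]["tag"] if log else ""
    log.append({"event": event, "tag": _tag(key, prev_tag, event)})


def verify_log(log, key):
    """Recompute every tag in order; any edit or deletion breaks the chain."""
    prev_tag = ""
    for entry in log:
        expected = _tag(key, prev_tag, entry["event"])
        if not hmac.compare_digest(expected, entry["tag"]):
            return False
        prev_tag = entry["tag"]
    return True


key = b"demo-key-loaded-from-a-secret-store"  # placeholder; never hard-code real keys
log = []
append_entry(log, key, {"user": "analyst_1", "action": "search", "collection": "vectax"})
append_entry(log, key, {"user": "analyst_1", "action": "decrypt", "point_id": 17})

print(verify_log(log, key))            # True
log[0]["event"]["user"] = "someone_else"
print(verify_log(log, key))            # False
```

An auditor holding the key can confirm that the log is complete and unmodified. Where third parties must verify logs without being able to forge them, an asymmetric signature scheme would replace the HMAC.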
Cryptographic guarantees versus policy-based compliance: Most AI compliance approaches rely on contractual commitments: "we will not use your data to train models," "we will delete your data after 30 days." These depend on the provider acting in good faith and having no security breaches. FHE is a mathematical guarantee: the provider's infrastructure is cryptographically incapable of reading your data, regardless of policy or breach status. For regulated industries, the difference between a contractual assurance and a cryptographic proof is significant.