Module 5 of 6 · Vector DB & RAG Security · Core Security Path

Encryption in Use

Encrypted Inference & Encrypted Vector Memory

FHE types, how VectaX closes the last plaintext gap, encrypted agent memory, MCP integration with Claude, and what this means for GDPR, HIPAA, and the EU AI Act.

26 min read
Core Security
Intermediate


Section 01 · The Problem

The last plaintext gap

Modules 3 and 4 secured data at ingestion, in storage, and at access control. But there is one step in every AI system that still operates entirely on plaintext, even in well-secured deployments: inference. The moment when the AI model processes inputs, activates layers, and generates outputs.

Even if your vector database encrypts embeddings at rest, the retrieval service decrypts them before feeding them into the LLM prompt. Even if your pipeline uses TLS throughout, the AI provider's inference server processes plaintext tokens. Even if your access controls are airtight, the model weights and layer activations operate on unencrypted data inside the provider's infrastructure.

This is the last plaintext gap. An attacker who compromises the inference server, or an AI provider whose security posture you must accept on trust, can see every query, every retrieved document, and every generated response.

Where data is exposed in a standard AI pipeline

Storage: encrypted at rest (AES-256). Status: Protected
Transmission: encrypted in transit (TLS 1.3). Status: Protected
Inference: decrypted and processed in plaintext on provider servers. Inputs, model activations, and outputs are all in plaintext during computation. Status: Exposed

With VectaX FHE, the inference step also becomes encrypted:

Inference: encrypted throughout via VectaX FHE. The provider never sees plaintext. Status: Protected

Trust versus proof: Most AI providers publish SOC 2 reports and data processing agreements promising they will not use your data to train models. These are contractual commitments. They depend on the provider acting in good faith and having no security breaches. FHE is a mathematical guarantee: the provider's infrastructure cannot read your data during inference, regardless of policy or breach status. Mirror Security describes this as replacing policy-based trust with cryptographic proof.

Section 02 · Cryptography

PHE, SHE, and FHE: the three types of homomorphic encryption

Homomorphic encryption (HE) allows computation directly on encrypted data. The result of the computation, when decrypted, matches what you would have gotten by computing on the plaintext. The server performing the computation never sees the actual values. There are three generations of this technology, each with different capabilities and performance trade-offs.

Generation 1 · PHE (Partially Homomorphic Encryption)
One operation type only. Cannot combine addition and multiplication. Fast and efficient.
Examples: unpadded RSA (multiplication), Paillier (addition)
Best for: secure aggregations, voting systems, simple analytics

Generation 2 · SHE (Somewhat Homomorphic Encryption)
Both addition and multiplication, but only to a limited depth before the noise threshold is reached. More flexible than PHE.
Noise accumulates with each operation and cannot be refreshed.
Best for: simple ML inference, fixed-depth neural networks

Generation 3 · FHE (Fully Homomorphic Encryption)
Unlimited addition and multiplication. Bootstrapping refreshes noise. Can compute any function on encrypted data.
First construction by Craig Gentry, 2009. Later schemes: BFV, BGV, CKKS.
Best for: full AI inference, complex ML models, VectaX
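
To make the PHE entry concrete, here is a minimal sketch of Paillier's additive homomorphism in plain Python. The primes are tiny and chosen only for readability, so this offers no security, and the function names (`keygen`, `encrypt`, `decrypt`, `add_encrypted`) are illustrative rather than taken from any library. It shows that multiplying two ciphertexts yields a ciphertext of the sum of the plaintexts.

```python
import math
import secrets

def keygen(p, q):
    """Toy Paillier key generation from two primes, using generator g = n + 1."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # modular inverse; this shortcut relies on g = n + 1
    return n, (lam, mu)

def encrypt(n, m):
    """Encrypt integer m (0 <= m < n) under public key n."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1  # random r in [1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def decrypt(n, priv, c):
    lam, mu = priv
    u = pow(c, lam, n * n)
    return ((u - 1) // n * mu) % n

def add_encrypted(n, c1, c2):
    """Homomorphic addition: multiply ciphertexts modulo n^2."""
    return (c1 * c2) % (n * n)

n, priv = keygen(1009, 1013)  # insecure demo primes
c_sum = add_encrypted(n, encrypt(n, 17), encrypt(n, 25))
print(decrypt(n, priv, c_sum))
```

The party calling `add_encrypted` needs only the public value `n`, which is why schemes like this suit aggregation. Paillier has no matching operation for multiplying two ciphertexts, which is the limitation that SHE and FHE address.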

Why CKKS for AI: The CKKS scheme (Cheon-Kim-Kim-Song) is designed for approximate arithmetic on real numbers, which is exactly what neural network inference requires. BFV and BGV work with integers. CKKS works with floating-point-like values, making it the right choice for embedding operations and vector similarity computations. VectaX uses CKKS-based schemes for all AI workloads.
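
The "approximate arithmetic" idea can be sketched without any encryption at all. The snippet below only imitates the fixed-point encoding that CKKS-style schemes rely on: real values are scaled to integers, products carry the scale twice, and a rescale step divides one factor back out. The scale `2**20` and all function names are assumptions for illustration, not VectaX or library parameters.

```python
SCALE = 2 ** 20  # illustrative scaling factor

def encode(values, scale=SCALE):
    """Map real numbers to integers by scaling and rounding."""
    return [round(v * scale) for v in values]

def mul_and_rescale(a, b, scale=SCALE):
    """Element-wise product; each product has scale**2, so divide once (rounded)."""
    return [(x * y + scale // 2) // scale for x, y in zip(a, b)]

def dot(a, b, scale=SCALE):
    """Dot product using only multiplication, rescaling, and addition."""
    return sum(mul_and_rescale(a, b, scale)) / scale

u = [0.12, -0.5, 0.33]
v = [0.9, 0.25, -0.7]

exact = sum(x * y for x, y in zip(u, v))
approx = dot(encode(u), encode(v))
print(exact, approx, abs(exact - approx))
```

The result matches the floating-point dot product only up to a small rounding error. That trade is acceptable for similarity search, where rankings depend on relative scores rather than exact bits.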

Section 03 · How FHE Works

Noise and bootstrapping: why FHE is now practical

Every FHE ciphertext contains a small amount of random noise added during encryption. This noise is what makes the scheme secure: without it, an attacker could solve for the plaintext from the ciphertext. But noise grows with each operation. Add two ciphertexts and you get a little more noise. Multiply them and noise roughly squares. Keep operating and eventually the noise is so large the ciphertext cannot be correctly decrypted.
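
Noise growth is easier to see in a toy scheme. The sketch below uses a simple integer construction in the style of the DGHV scheme, not CKKS and not anything VectaX ships. The secret `P`, the fixed noise `r`, and the fixed multiplier `q` are deliberately small and deterministic so the arithmetic can be followed by hand; a real scheme draws them at random and uses far larger values.

```python
P = 1_200_001  # secret key: an odd integer (toy size, no security)

def encrypt(bit, r=10, q=12_345):
    """Hide `bit` under even noise 2*r plus a multiple of the secret P."""
    return P * q + 2 * r + bit

def noise(ciphertext):
    """Centered remainder modulo P: the noise term that carries the message."""
    rem = ciphertext % P
    return rem - P if rem > P // 2 else rem

def decrypt(ciphertext):
    return noise(ciphertext) % 2

# Addition: noise terms add (21 + 20 = 41); the bits combine as XOR.
added = decrypt(encrypt(1) + encrypt(0))

# Multiplication: noise terms multiply (21, 441, 9261, 194481, ...).
acc = encrypt(1)
results = [decrypt(acc)]
for _ in range(4):
    acc *= encrypt(1)
    results.append(decrypt(acc))

print(added)    # 1
print(results)  # [1, 1, 1, 1, 0]
```

The product of five encryptions of 1 should decrypt to 1. The fifth factor pushes the noise past `P // 2`, it wraps around modulo `P`, and decryption silently returns the wrong bit. That failure mode is what bootstrapping exists to prevent.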

Craig Gentry's 2009 breakthrough introduced bootstrapping: a procedure that takes a noisy ciphertext and produces a fresh ciphertext with much less noise, representing the same plaintext value. Bootstrapping is itself a homomorphic computation, which makes it mathematically remarkable. By periodically refreshing ciphertexts, you can chain unlimited operations, enabling full neural network inference on encrypted data.

VectaX provides noise control parameters through the SDK. Teams can tune how much noise buffer to maintain before bootstrapping triggers. A tighter buffer means more frequent bootstrapping, more compute, but higher accuracy. A looser buffer means fewer bootstrapping operations and lower compute cost but slightly less floating-point precision. For embedding similarity search, the precision requirements are well understood, and VectaX's defaults are calibrated for high accuracy at practical latency.
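
The buffer trade-off can be approximated with a simple counting model. This is a hypothetical sketch, not the VectaX SDK: it assumes a levelled scheme in which every multiplication consumes one level, a fresh or bootstrapped ciphertext starts with `fresh_levels`, and a refresh is triggered whenever the next multiplication would drop below `reserve` levels.

```python
def count_bootstraps(multiplications, fresh_levels, reserve):
    """Count refreshes needed to run a chain of multiplications.

    All three parameters are illustrative model inputs, not SDK settings.
    """
    if not 0 <= reserve < fresh_levels:
        raise ValueError("reserve must be in [0, fresh_levels)")
    remaining = fresh_levels
    bootstraps = 0
    for _ in range(multiplications):
        if remaining - 1 < reserve:   # next multiply would eat into the buffer
            remaining = fresh_levels  # bootstrap: restore the level budget
            bootstraps += 1
        remaining -= 1
    return bootstraps

for reserve in (0, 2, 5):
    print(reserve, count_bootstraps(20, 10, reserve))
```

In this model a depth-20 computation with 10 levels needs one bootstrap when no buffer is kept and three when half the levels are held in reserve, which mirrors the compute cost the paragraph above describes.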


Section 04 · VectaX

VectaX encrypted inference: three layers working together

VectaX encrypted inference is not FHE applied to vectors in isolation. It combines three components: FHE for the encryption scheme, vector encryption for similarity-preserving storage, and RBAC for policy-controlled access. All three are necessary. FHE alone lets you compute on encrypted data but does not preserve the similarity relationships needed for vector search. Vector encryption alone protects stored embeddings but not the inference computation. RBAC alone controls access but does not prevent extraction attacks through repeated queries.

Python · Encrypted search with RBAC policy decryption (Qdrant integration docs)

from mirror_sdk.core.models import RBACVectorData, MirrorCrypto
from mirror_sdk.utils import decode_binary_data
from mirror_sdk.core import MirrorError

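# Assumes sdk (Mirror SDK client), qdrant (Qdrant client), query_embedding,
# and user_key have been initialised earlier.

# 1. Encrypt query vector with the user's access policy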
policy = {"roles": ["analyst"], "groups": ["team_finance"], "departments": ["finance"]}
query_data = RBACVectorData(vector=query_embedding, id="query", access_policy=policy)
encrypted_query = sdk.rbac.encrypt(query_data)

# 2. Search on encrypted vectors - vector DB never sees plaintext
results = qdrant.query_points(
    collection_name="vectax",
    query=encrypted_query.crypto.ciphertext,
    limit=10
)

# 3. Decrypt only results within the user's policy scope
accessible = []
for point in results.points:
    try:
        meta = decode_binary_data(point.payload["encrypted_vector_metadata"])
        mirror_data = MirrorCrypto.deserialize(meta)
        # decrypt() raises MirrorError when user_key does not satisfy the
        # vector's access policy, so a successful call acts as the access check
        sdk.rbac.decrypt(
            mirror_data,
            point.payload["encrypted_header"],
            user_key,
        )
    except MirrorError:
        continue  # vector outside user's policy scope - access denied
    accessible.append({"id": point.id, "content": point.payload["content"]})


Section 05 · Agent Memory

Encrypted vector memory: what agents remember and who can read it

AI agents build up memory as they work. They store conversation history to maintain context across turns. They cache retrieved document embeddings to avoid redundant lookups. They write summaries, intermediate reasoning states, and task progress to memory stores. In most deployments, all of this is stored in plaintext or with only standard database encryption at rest.

Encrypted vector memory changes this. Every memory entry stored by the agent is encrypted at the moment of creation, before it is written to any storage system. The agent can later retrieve and reason over its memory because VectaX's Similarity-Preserving Search keeps the embeddings searchable. But the memory store itself, and any infrastructure between the agent and the store, never holds plaintext.

Standard agent memory (plaintext)
Conversation: "User asked about Q3 salary bands. Retrieved HR policy doc..."
System prompt: "You are an HR assistant with access to confidential pay scales..."
Retrieved docs: "[CONFIDENTIAL] Senior Engineer band: £95,000–£130,000..."
Visible to provider infrastructure, logs, and memory DB admins
VectaX encrypted agent memory
Conversation: ⊕ AX7F2Q3R9K... [FHE ciphertext]
System prompt: ⊕ BK3M7P1W4N... [FHE ciphertext]
Retrieved docs: ⊕ QR9X4T2M8V... [FHE ciphertext]
Provider infrastructure sees only ciphertext. No plaintext at any layer.

This matters most for long-running agents that accumulate sensitive information over many interactions. Without encrypted memory, the memory store becomes a consolidated plaintext record of everything sensitive the agent has ever retrieved or been told. With encrypted memory, even if the memory store is breached, the attacker sees encrypted blobs that cannot be reversed without the decryption key.
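
The encrypt-before-write pattern can be sketched as follows. This is not the VectaX API, and it does not provide similarity search over ciphertext. The `seal` and `unseal` helpers are a toy stand-in cipher built from SHA-256 and HMAC so that the example has no dependencies; they are not a vetted construction. The `EncryptedMemory` class and its `backend` dictionary are likewise hypothetical.

```python
import hashlib
import hmac
import secrets

def _keystream(key, nonce, length):
    """Toy keystream: SHA-256 over key, nonce, and a counter (illustration only)."""
    out, counter = b"", 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]

def seal(key, plaintext):
    enc_key = hashlib.sha256(b"enc" + key).digest()
    mac_key = hashlib.sha256(b"mac" + key).digest()
    nonce = secrets.token_bytes(16)
    stream = _keystream(enc_key, nonce, len(plaintext))
    body = bytes(a ^ b for a, b in zip(plaintext, stream))
    tag = hmac.new(mac_key, nonce + body, hashlib.sha256).digest()
    return nonce + tag + body

def unseal(key, blob):
    enc_key = hashlib.sha256(b"enc" + key).digest()
    mac_key = hashlib.sha256(b"mac" + key).digest()
    nonce, tag, body = blob[:16], blob[16:48], blob[48:]
    expected = hmac.new(mac_key, nonce + body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("memory entry failed integrity check")
    stream = _keystream(enc_key, nonce, len(body))
    return bytes(a ^ b for a, b in zip(body, stream))

class EncryptedMemory:
    """Agent-side wrapper: the backing store only ever receives ciphertext."""

    def __init__(self, key):
        self._key = key
        self.backend = {}  # stands in for an external memory store

    def remember(self, entry_id, text):
        self.backend[entry_id] = seal(self._key, text.encode("utf-8"))

    def recall(self, entry_id):
        return unseal(self._key, self.backend[entry_id]).decode("utf-8")

memory = EncryptedMemory(secrets.token_bytes(32))
memory.remember("turn-1", "User asked about Q3 salary bands.")
print(memory.backend["turn-1"].hex()[:32], "...")
print(memory.recall("turn-1"))
```

The design point is where encryption happens: inside the agent process, before the write, so a breach of `backend` yields only opaque blobs.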

Section 06 · Context Security

Encrypted context windows and conversation history

The context window is the most sensitive surface in a RAG system at inference time. It contains the system prompt (which may include confidential instructions), the retrieved documents (the sensitive content you went to all this trouble to protect), the conversation history (which may contain personal information shared by the user), and the user's current query.

In a standard RAG deployment, all of this travels to the AI provider's inference server in plaintext. The provider can technically read every element of the context window, even if their policy says they do not.

VectaX encrypted context windows keep retrieved document embeddings and conversation history entries in encrypted form until the last possible moment. Decryption happens only at an authorised endpoint. This is particularly valuable when the system prompt contains proprietary instructions that the organisation does not want to expose to the model provider's infrastructure.

MCP and encrypted context: The Model Context Protocol (MCP) standardises how AI models like Claude connect to external data sources. VectaX's MCP server integration means that when Claude queries your vector store through MCP, the connection is secured with VectaX encryption and RBAC policies are enforced at query time. See Section 10 for the full MCP setup.

Section 07 · Model Security

Secure model deployment: protecting weights and controlling predictions

Encrypted inference protects the input data. But there is a second concern: protecting the model itself. Model weights represent significant intellectual property. An organisation that fine-tunes a foundation model on proprietary data has built something valuable. Exposing those weights to provider infrastructure, or to users through model extraction attacks, is a real risk.

VectaX addresses model security through two mechanisms. First, encrypted inference means model weights can be deployed to infrastructure that never processes plaintext inputs or outputs, reducing information available for extraction attacks. Second, RBAC on predictions means access to specific model endpoints can be controlled at the same role, group, and department level as data access.

🛡
Protected model weights
Model weights encrypted at rest and in transit. Fine-tuned weights never exposed to inference infrastructure in plaintext.
🔒
Secure inference pipeline
Inputs encrypted before reaching the inference server. Outputs encrypted before leaving. Provider sees only ciphertext at every stage.
👥
Access-controlled predictions
RBAC applied to model endpoints. An analyst role cannot call a prediction endpoint restricted to data scientists. Predictions scoped by user policy.
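
Access-controlled predictions can be pictured as a policy lookup in front of each endpoint. The sketch below is an assumption-laden illustration: the endpoint paths, the `ENDPOINT_POLICIES` table, and the matching rule (every attribute named in the policy must overlap with the user's attributes, unknown endpoints are denied) are all hypothetical and may differ from how VectaX evaluates policies.

```python
# Hypothetical endpoint policies using the same role/group/department keys
# as the access policy in Section 04.
ENDPOINT_POLICIES = {
    "/predict/credit-risk": {"roles": {"data_scientist"}, "departments": {"risk"}},
    "/predict/summary": {"roles": {"analyst", "data_scientist"}},
}

def can_call(user, endpoint):
    """Return True only if the user satisfies every attribute in the policy."""
    policy = ENDPOINT_POLICIES.get(endpoint)
    if policy is None:
        return False  # deny by default for unregistered endpoints
    return all(
        set(user.get(attribute, ())) & allowed
        for attribute, allowed in policy.items()
    )

analyst = {"roles": ["analyst"], "groups": ["team_finance"], "departments": ["finance"]}
print(can_call(analyst, "/predict/summary"))
print(can_call(analyst, "/predict/credit-risk"))
```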

Model extraction attacks work by querying a model repeatedly with carefully chosen inputs to reconstruct its decision boundaries. Encrypted inference makes this harder for anyone without the decryption key: an attacker watching the inference infrastructure sees only ciphertext outputs, which cannot be used to train a surrogate model. It does not stop an authorised key holder from collecting decrypted predictions, which is why it is combined with rate limiting and query pattern monitoring from Module 6. Together these significantly raise the cost of extraction attacks.

Section 08 · Multi-Party

Secure multiparty computation: AI across organisational boundaries

Sometimes the most valuable RAG systems are built on data from multiple organisations. A hospital network wants to build a diagnostic AI using patient records from 12 hospitals. A financial consortium wants to detect fraud patterns across member banks. In both cases, each organisation has data too sensitive to share with the others, but the combined dataset would produce a much better model.

Secure Multiparty Computation (SMPC) solves this. It allows multiple parties to jointly compute a function over their combined inputs without any party revealing their individual data to the others. Each party learns only the final result of the computation, not the inputs from other participants.

SMPC for federated AI: hospitals without sharing patient data

Hospital A: local patient data (encrypted)
Hospital B: local patient data (encrypted)
Hospital C: local patient data (encrypted)
Combined: shared SMPC result. Better model. No raw data shared.

Each hospital computes locally on its encrypted data. The SMPC protocol combines results mathematically so the final output is equivalent to training on all datasets combined, without any hospital ever seeing another hospital's patient records.

SMPC and FHE complement each other. FHE allows a single party to process encrypted data without decrypting it. SMPC allows multiple parties to combine their data without sharing it. Used together, as the Cisco 2024 white paper on securing vector databases notes, FHE can reduce the communication overhead typically associated with SMPC, because homomorphically encrypted data can be processed by a single party without constant back-and-forth between participants.
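
The core SMPC idea, that parties can compute a joint result while each input stays hidden, can be sketched with additive secret sharing. This is a minimal single-process simulation of a secure sum among honest participants, not a full protocol and not a VectaX feature; the modulus and function names are illustrative choices.

```python
import secrets

MODULUS = 2 ** 61 - 1  # all arithmetic is done modulo this value

def share(value, n_parties):
    """Split `value` into n random-looking shares that sum to it modulo MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares

def secure_sum(private_inputs):
    """Simulate each party sharing its input and publishing only a partial sum."""
    n = len(private_inputs)
    all_shares = [share(v, n) for v in private_inputs]
    # Party j receives the j-th share from every party and adds them up.
    partials = [sum(all_shares[i][j] for i in range(n)) % MODULUS for j in range(n)]
    # Only the partial sums are revealed; together they reconstruct the total.
    return sum(partials) % MODULUS

# Three hospitals each hold a private case count.
print(secure_sum([120, 340, 95]))
```

Any single share or partial sum is uniformly random on its own, so no hospital learns another's count. Training a model jointly applies the same principle to far larger computations, which is where the communication overhead mentioned above comes from.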

Section 09 · Tools

Open-source FHE libraries

For teams evaluating whether to build their own FHE implementation or use a solution like VectaX, understanding the open-source library landscape is useful. These libraries provide the cryptographic primitives. They do not provide AI-optimised implementation, noise control tuning, RBAC integration, or production deployment tooling.

SEAL · Microsoft Research
Schemes: BFV, BGV, CKKS
Language: C++ (official .NET wrapper; Python through third-party bindings)
The most widely adopted FHE library in industry and research. Excellent documentation and active maintenance. CKKS implementation is well-suited for approximate arithmetic on real numbers. Used as a building block in commercial FHE products.

HElib · IBM Research
Schemes: BGV, CKKS
Language: C++
One of the oldest and most battle-tested FHE libraries. Strong performance on BGV, which is optimised for integer arithmetic. IBM has used HElib in production research for financial and healthcare applications. Steeper learning curve than SEAL but highly optimised for specific workloads.

Pyfhel · Open-source community
Schemes: BFV, BGV, CKKS (via SEAL/HElib)
Language: Python
Python wrapper making FHE accessible without C++ expertise. Useful for prototyping and learning. Performance is lower than direct C++ usage. Not recommended for production AI workloads where latency matters. Good starting point before committing to an optimised implementation.

Build versus buy for AI FHE: Using a general-purpose FHE library for AI inference requires significant expertise in noise management, parameter selection, and performance tuning for neural network operations. VectaX packages this work with AI-specific optimisations, SDK-level noise control, RBAC integration, and production deployment tooling. The Mirror Security and SiSys AI collaboration on the MIRROR co-processor will further accelerate FHE operations in hardware for hyperscale deployments.

Section 10 · MCP

MCP integration: VectaX as a secure vector backend for Claude

The Model Context Protocol (MCP) is a standard that lets AI models like Claude connect to external data sources through a defined interface. It is sometimes described as the "USB-C port for AI": a universal connector that any MCP-compatible tool can plug into any MCP-compatible model without custom integration work.

VectaX provides an MCP server that acts as a secure vector database backend. When Claude Desktop connects to the VectaX MCP server, it gains access to semantic search over your vector store with all VectaX security controls enforced: similarity-preserving FHE encryption, RBAC at role, group, and department level, audit logging of all interactions, and compliance-ready monitoring.

Shell · VectaX MCP server installation (github.com/mirrorsecai/mirror-vectax-mcp-server)

# Clone the VectaX MCP server and enter the repository
git clone https://github.com/mirrorsecai/mirror-vectax-mcp-server
cd mirror-vectax-mcp-server

# macOS / Linux automated setup
chmod +x setup_claude_config.sh
./setup_claude_config.sh

# Windows automated setup
.\setup_claude_config.bat

# The script handles everything automatically:
# - Installs dependencies (pip install mirror-sdk)
# - Configures VectaX with your MIRROR_API_KEY
# - Registers the MCP server with Claude Desktop
# - Applies security best practices (TLS, key rotation, audit logging)

# After setup: open Claude Desktop
# Claude now queries your VectaX vector store via MCP
# with all RBAC policies enforced at query time
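
For orientation, registering an MCP server with Claude Desktop means adding an entry under the `mcpServers` key of its `claude_desktop_config.json` file. The sketch below only prints what such an entry might look like. The server name `vectax`, the `python` command, and the `server.py` path are hypothetical placeholders, since the actual values are written by the setup script; only `MIRROR_API_KEY` comes from the steps above.

```python
import json

def vectax_mcp_entry(server_script, api_key="<your MIRROR_API_KEY>"):
    """Build an illustrative mcpServers entry; names and paths are placeholders."""
    return {
        "mcpServers": {
            "vectax": {                      # hypothetical server name
                "command": "python",         # hypothetical launch command
                "args": [server_script],     # hypothetical entry point
                "env": {"MIRROR_API_KEY": api_key},
            }
        }
    }

entry = vectax_mcp_entry("/path/to/mirror-vectax-mcp-server/server.py")
print(json.dumps(entry, indent=2))
```

Inspecting the real file after running the setup script is a quick way to confirm what was registered and to check that no API key was written somewhere it should not be.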


Section 11 · Compliance

Compliance implications: what encrypted inference does for regulated industries

Encrypted inference changes the compliance conversation for AI in regulated industries. The fundamental problem with using third-party AI services for sensitive workloads has always been that you are handing plaintext data to an external infrastructure provider. FHE removes this objection. The provider's infrastructure processes ciphertext, not your data.

GDPR
EU · Personal Data
Article 32 requires appropriate technical and organisational measures to protect personal data during processing, and Article 32(1)(a) names encryption as one such measure. Article 25 (data protection by design) requires building privacy into the system from the start.
FHE provides a cryptographic control showing that personal data was never processed in plaintext by the provider. It supports compliance with Articles 25 and 32 through a technical measure rather than a contractual promise alone.
HIPAA
US · Health Information
The Security Rule lists encryption of electronic PHI at rest and in transit as addressable implementation specifications, which in practice means encrypting or documenting an equivalent safeguard. AI workloads processing patient records must ensure the AI provider cannot access PHI. Business Associate Agreements are required but insufficient on their own.
Encrypted inference means the AI provider never holds PHI in plaintext at any stage. This supports the technical safeguard requirements and significantly reduces HIPAA risk for AI healthcare applications.
PCI DSS
Global · Payment Data
Requirement 3 mandates encryption of stored cardholder data. Requirement 4 mandates encryption in transit. AI systems processing payment data for fraud detection must ensure cardholder data is not exposed to inference provider infrastructure.
Encrypted inference enables fraud detection and transaction AI on cardholder data without exposing it to the inference provider, satisfying Requirements 3 and 4 for AI processing pipelines.
EU AI Act
EU · High-Risk AI Systems
High-risk AI systems (HR, credit scoring, healthcare, biometrics) must implement risk management systems, maintain technical documentation, and ensure data governance measures that minimise data quality and security risks.
FHE provides a documented technical control for the required technical documentation. VectaX audit logs satisfy the traceability and monitoring requirements for high-risk AI systems.
FIPS 140-2 / 140-3
US Government · Cryptographic Modules
US federal systems must use FIPS-validated cryptographic modules. Government AI systems processing sensitive but unclassified data must use validated implementations of approved cryptographic algorithms.
VectaX uses FIPS-compliant cryptographic algorithms and is designed for government and regulated industry deployments, supporting the FIPS requirement for validated cryptographic modules in government AI systems.
SOC 2 Type II
Global · Service Organisations
Requires documented security controls for systems processing customer data, with third-party audit validation that controls operate effectively over time. Access controls, encryption, logging, and incident response must be documented and tested.
VectaX provides cryptographically-signed audit logs, RBAC with documented policy enforcement, and encryption at every stage. Mirror Security holds SOC 2 Type I certification with Type II in progress.

Cryptographic guarantees versus policy-based compliance: Most AI compliance approaches rely on contractual commitments: "we will not use your data to train models," "we will delete your data after 30 days." These depend on the provider acting in good faith and having no security breaches. FHE is a mathematical guarantee: the provider's infrastructure is cryptographically incapable of reading your data, regardless of policy or breach status. For regulated industries, the difference between a contractual assurance and a cryptographic proof is significant.

Next: Module 6 of 6

RAG Security in Production

Output monitoring, retrieval audit logging, red teaming RAG systems with DiscoveR, SBOM maintenance, incident response for RAG breaches, and the complete production security checklist.