Vector Database Architecture & the RAG Pipeline

Section 01 · Foundations

What is a vector database and why do AI systems need one

A traditional database looks for exact matches. You search for a customer ID, a product name, or a date range, and it returns rows where the values match. That works for structured business data. It does not work when you want to ask an AI system something like "find me all the documents that are similar in meaning to this paragraph."

Vector databases solve this. Instead of storing data as rows and columns, they store data as long lists of numbers called vectors. Each vector represents the meaning or properties of a piece of content. When you search, the database does not look for an exact match. It calculates which stored vectors are mathematically closest to your query vector. The closest matches are the most semantically similar.

This is called similarity search, and it is the foundation of how RAG systems retrieve context before generating a response.

1,536 dimensions in OpenAI's text-embedding-3-small

3,072 dimensions in OpenAI's text-embedding-3-large

4,096 dimensions in NV-Embed-v1 (Mistral-based)

Common vector stores used in production RAG systems include Pinecone, ChromaDB, pgvector (a PostgreSQL extension), MongoDB Vector Search, and FAISS (Facebook AI Similarity Search, which is a similarity search library rather than a full database). ChromaDB and FAISS are popular during development because they are easy to set up. Neither was built with enterprise access control in mind, which creates security gaps when teams ship them to production without adding controls on top.

Why this matters for security: Vector databases are not like traditional databases where access control is well-understood. The similarity search mechanism itself can be an attack surface. An attacker with query access can, through repeated searches, reconstruct approximately what content is stored in the database, even if they never see the raw documents. This is covered in detail in Module 2.

Section 02 · Core Concept

How embeddings work

An embedding is a numerical representation of content. You feed text, an image, or a document into an embedding model, and it outputs a long array of decimal numbers. That array is the embedding. Documents with similar meaning produce arrays that are mathematically close together. Documents with very different content produce arrays that are far apart.

The numbers themselves do not have human-readable meaning. You cannot look at position 47 in an embedding array and know what it represents. The meaning is in the relationships between all the numbers together.

Text to vector: how embedding models encode meaning

"security" → 0.0279 -0.0034 0.0465 0.0060 0.0455 0.0009 -0.0094 0.0182 -0.0241 0.0391 0.0012 -0.0067 ... (1,524 more values)

"encryption" → 0.0281 -0.0031 0.0461 0.0058 0.0448 0.0011 -0.0091 0.0179 -0.0238 0.0388 0.0014 -0.0065 ... (1,524 more values)

"football" → -0.0412 0.0891 -0.0177 0.0923 -0.0334 0.0712 0.0445 -0.0622 0.0811 -0.0193 0.0567 -0.0389 ... (1,524 more values)

The values for "security" and "encryption" are numerically close at each position shown. The values for "football" are far from both.

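The similarity between vectors is typically measured using cosine similarity, which measures the angle between two vectors rather than the distance between them. Scores range from -1.0 to 1.0. A cosine similarity of 1.0 means the two vectors point in exactly the same direction, which for embeddings indicates near-identical meaning. A score near 0.0 means the content is unrelated, and negative scores mean the vectors point in opposing directions. Most similarity searches return the top-K highest-scoring vectors, often filtered by a minimum similarity threshold.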
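The calculation above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not how a production vector database implements search: the four-dimensional vectors are hand-written toy values rather than real model output, and the function names `cosine_similarity` and `top_k` are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query, store, k=2, threshold=0.5):
    """Return up to k (doc_id, score) pairs scoring at or above threshold."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in store.items()]
    kept = [pair for pair in scored if pair[1] >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)[:k]

# Toy vectors for illustration only (assumed values, not real embeddings).
store = {
    "security": [0.9, 0.1, 0.3, 0.0],
    "encryption": [0.8, 0.2, 0.4, 0.1],
    "football": [-0.2, 0.9, -0.1, 0.7],
}
query = [0.85, 0.15, 0.35, 0.05]

results = top_k(query, store)
print(results)
```

With these toy values, "security" and "encryption" both score above the threshold, while "football" scores below zero and is filtered out.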

Embedding model types

Technique	Type	Captures context?	Common use
TF-IDF	Syntactic	No	Keyword search, document ranking
Word2Vec	Semantic	Partial	Word-level similarity, early NLP
GloVe	Semantic	Partial	Word similarity, analogy tasks
BERT	Semantic	Yes, bidirectional	Sentence embeddings, classification
OpenAI text-embedding-3	Semantic	Yes, deep context	RAG retrieval, semantic search
NV-Embed-v1	Semantic	Yes, state of the art	Retrieval, reranking, clustering

Security implication of embedding model choice: If you use a publicly known embedding model (like OpenAI's models), an attacker who has query access to your vector database and knows which model you used can attempt embedding inversion attacks more efficiently. The attacker knows the vector space structure. This is covered in Module 2 under attack surface analysis.

Section 03 · Technical Context

The curse of dimensionality

Vector embeddings typically have hundreds or thousands of dimensions. This causes a mathematical problem that affects both performance and security: the curse of dimensionality.

As dimensions increase, the volume of the vector space grows exponentially. Data points spread out so much that they all appear approximately equidistant from each other. Distance measures that work well in low-dimensional spaces become unreliable in high-dimensional ones. The "nearest neighbor" to your query might only be marginally closer than the 1,000th nearest neighbor.

How distance measures degrade with more dimensions

2D space: Reliable
100 dimensions: Degrades
512 dimensions: Unreliable
1,536 dimensions: Very poor

As dimensions increase, all vectors become nearly equidistant. Standard distance measures lose their ability to distinguish truly similar items from dissimilar ones. Vector databases use approximate nearest neighbor (ANN) algorithms to manage this, but the trade-off is that results are approximate, not exact.
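The concentration effect can be observed with a small experiment. The sketch below assumes uniformly random points, which is a worst case: real embeddings have learned structure and usually behave better than this. The helper name `relative_contrast` is hypothetical. It reports how much farther the farthest point is than the nearest one, relative to the nearest distance.

```python
import math
import random

def relative_contrast(dim, n_points=200, seed=0):
    """(farthest - nearest) / nearest distance from one random query to random points."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))
    return (max(dists) - min(dists)) / min(dists)

# Higher dimensions give a smaller contrast: nearest and farthest look alike.
for dim in (2, 100, 1536):
    print(dim, round(relative_contrast(dim), 3))
```

A contrast close to zero means the nearest neighbor is barely closer than the farthest one, which is the situation the paragraph above describes.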

Dimensionality reduction techniques like Principal Component Analysis (PCA) compress high-dimensional vectors into lower-dimensional representations while preserving as much structure as possible. This improves search efficiency, but also changes the security properties of the embeddings.

The practical security implication is this: because distance measures are less reliable at high dimensions, it can be harder to reconstruct the original content from a stolen embedding vector, although this does not rule out the inversion attacks described in Section 02. It also means that two different pieces of content can have similar embeddings, which makes it possible to craft a poisoned document that is retrieved alongside a target document.

Practitioner note: Dimensionality reduction and quantization (compressing vectors to smaller number types) are common in production to reduce storage and search costs. Both affect security. Compressed or quantized embeddings are harder to invert, which is a security benefit. But they also reduce retrieval accuracy, which can mask poisoned content that sits just above the similarity threshold. Know which trade-off you are accepting.
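To make the quantization trade-off concrete, here is a minimal sketch of scalar int8 quantization, assuming one scale factor per vector. Production systems use a range of schemes (product quantization, per-dimension scales), so treat this as one simple variant. The function names are hypothetical; the input values are the first six "security" values shown in Section 02.

```python
def quantize_int8(vec):
    """Map floats to integers in [-127, 127] using one scale per vector."""
    scale = max(abs(x) for x in vec) / 127 or 1.0  # fall back to 1.0 for an all-zero vector
    return [round(x / scale) for x in vec], scale

def dequantize(codes, scale):
    """Approximate reconstruction of the original floats."""
    return [c * scale for c in codes]

vec = [0.0279, -0.0034, 0.0465, 0.0060, 0.0455, 0.0009]
codes, scale = quantize_int8(vec)
restored = dequantize(codes, scale)

# The rounding step discards information: each value can be off by up to scale / 2.
max_error = max(abs(a - b) for a, b in zip(vec, restored))
print(codes)
print(max_error)
```

The discarded precision is what makes inversion harder and, at the same time, what blurs the difference between a legitimate match and a crafted near-match.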

Section 04 · Architecture

Index structures: how vectors are organised and why it matters for security

At the core of every vector database is an index. The index determines how vectors are stored, how similarity search is performed, and how fast results are returned. Most vector databases use approximate nearest neighbor (ANN) algorithms rather than exact search because exact search across millions of high-dimensional vectors is computationally impractical.

The choice of index affects more than performance. Poorly configured indexes can leak information through similarity patterns and expose systems to query-based attacks. Security teams auditing vector databases need to understand what index type is in use and what its specific failure modes are.

HNSW

Hierarchical Navigable Small World

Builds a multi-layer graph where each layer connects similar vectors. Searches start at the top layer (coarse) and narrow down to the bottom layer (fine). Delivers excellent speed and accuracy. Used by default in Pinecone, Qdrant, and Weaviate.

Security consideration: The graph structure stores relationships between vectors. An attacker with repeated query access can map the graph and infer what content clusters exist, even without seeing the raw embeddings.

IVF

Inverted File Index

Clusters vectors into groups using k-means. Searches only the clusters closest to the query rather than the full dataset. Good for very large datasets where search speed matters more than perfect recall. Used in FAISS.

Security consideration: Cluster boundaries can be probed. If an attacker can determine which cluster a query falls into, they learn something about the distribution of your stored data.
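A minimal sketch of the IVF idea, assuming the cluster centroids have already been learned. The k-means training step is skipped and the two centroids are hard-coded for illustration. This is not FAISS's implementation, and the function names are hypothetical.

```python
import math

def nearest_centroids(vec, centroids, n):
    """Indices of the n centroids closest to vec (Euclidean distance)."""
    order = sorted(range(len(centroids)), key=lambda i: math.dist(vec, centroids[i]))
    return order[:n]

def build_ivf(vectors, centroids):
    """Assign every vector to the inverted list of its nearest centroid."""
    lists = {i: [] for i in range(len(centroids))}
    for vec_id, vec in vectors.items():
        lists[nearest_centroids(vec, centroids, 1)[0]].append(vec_id)
    return lists

def ivf_search(query, vectors, centroids, lists, nprobe=1):
    """Scan only the nprobe closest clusters and return the best matching id."""
    candidates = [vid for c in nearest_centroids(query, centroids, nprobe) for vid in lists[c]]
    return min(candidates, key=lambda vid: math.dist(query, vectors[vid]), default=None)

centroids = [[0.0, 0.0], [10.0, 10.0]]  # assumed output of a prior k-means run
vectors = {"a": [0.5, 0.2], "b": [1.0, 1.5], "c": [9.0, 9.5], "d": [11.0, 10.0]}
lists = build_ivf(vectors, centroids)

print(lists)
print(ivf_search([8.5, 9.0], vectors, centroids, lists))
```

Note that the query never touches vectors "a" or "b". That skipped work is the performance gain, and the cluster a query is routed to is the signal an attacker can probe.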

LSH

Locality Sensitive Hashing

Hashes vectors so that similar vectors are likely to land in the same hash bucket. Fast and scalable but lower accuracy than HNSW. Useful for web-scale applications where approximate results are acceptable.

Security consideration: Hash collisions can cause unrelated documents to appear in the same retrieval result, potentially surfacing content the user should not see.
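One common LSH family for cosine similarity is random hyperplane hashing, sketched below. This is an illustrative assumption about which LSH scheme is in use; the section above does not name one. The function names and toy vectors are hypothetical.

```python
import random

def make_hyperplanes(dim, n_bits, seed=0):
    """Random hyperplanes through the origin, one per hash bit."""
    rng = random.Random(seed)
    return [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_bits)]

def lsh_hash(vec, hyperplanes):
    """One bit per hyperplane: which side of the plane the vector falls on."""
    return tuple(int(sum(v * h for v, h in zip(vec, plane)) >= 0) for plane in hyperplanes)

planes = make_hyperplanes(dim=4, n_bits=8)
a = [0.9, 0.1, 0.3, 0.0]
scaled = [2 * x for x in a]   # same direction as a, so same bucket
opposite = [-x for x in a]    # opposite direction, so every bit flips

buckets = {}
for doc_id, vec in {"a": a, "scaled": scaled, "opposite": opposite}.items():
    buckets.setdefault(lsh_hash(vec, planes), []).append(doc_id)

print(buckets)
```

Using fewer bits makes buckets larger and collisions between unrelated vectors more likely, which is the accuracy and exposure trade-off noted in the security consideration.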

Flat Index

Exact Search (Brute Force)

Compares every query vector against every stored vector. Perfectly accurate but slow at scale. Only practical for small datasets or when accuracy is critical and the dataset is under a few hundred thousand vectors.

Security consideration: Exact search confirms with certainty what is or is not in the database. In a threat model where an attacker has query access, this is a more powerful information leakage tool than approximate search.

Index security principle: Any index that enables efficient similarity search also enables efficient adversarial probing. The same properties that make a good index for retrieval make it a good tool for an attacker who wants to map the contents of your database. Encryption at the vector level (not just storage encryption) is the only way to close this gap. VectaX's Similarity-Preserving Search keeps the index usable while making the vectors themselves cryptographically opaque.

Section 05 · Architecture

Storage layers: where your data actually lives

Vector databases store more than just embeddings. They hold metadata, document references, and identifiers that connect vectors back to source content. These elements are spread across different storage layers, each with its own performance profile and security requirements.

Most security reviews focus on the vector store as a single entity. The reality is that sensitive data is distributed across at least three distinct layers, each of which can be a security gap if left unaddressed.

In-Memory Storage

Holds the active index and frequently accessed vectors in RAM for fast retrieval. All data in memory is unencrypted by default. If a process has memory access (through a misconfiguration or memory inspection attack), it sees raw vectors.

Risk: cross-tenant leaks in shared environments, process memory exposure

Persistent Disk Storage

Stores the serialised index and embedding data on disk or cloud object storage for durability. Encryption at rest (typically AES-256) protects this layer if the storage medium is stolen or accessed without authorisation. Many vector databases do not enable this by default.

Risk: unencrypted disk files, cloud storage misconfiguration, backup exposure

Metadata Storage

Stores document identifiers, source file paths, user references, classification labels, timestamps, and other contextual information alongside embeddings. This is the most sensitive layer because metadata explicitly maps embeddings back to source documents and users. Leaking metadata tells an attacker exactly what is in your database, even without the embedding values.

Risk: highest sensitivity, often stored in plaintext, links embeddings to real documents and users

VectaX format-preserving encryption (FPE) specifically addresses the metadata layer. FPE encrypts sensitive metadata fields like document IDs, customer identifiers, and classification labels while preserving their format and searchability. This means a compromised storage layer does not expose the mapping between embeddings and source content.

Section 06 · Architecture

How a RAG pipeline works

RAG stands for Retrieval-Augmented Generation. The idea is straightforward: instead of relying only on what the model learned during training, you retrieve relevant documents at query time and inject them into the prompt as context. The model generates its response based on both its training and the retrieved content.

This is why RAG systems can answer questions about company policies, private documents, real-time data, and anything else that was not in the model's training data. The retrieval step brings in the relevant context. The generation step uses that context to produce the answer.

The RAG pipeline: document to response

Source Documents

PDFs, wikis, databases, APIs

Ingest control

Chunking

Split into retrievable segments

Low risk

Embed

Convert to vectors via embedding model

Model supply chain

Vector Store

Store embeddings in vector DB

High risk

Retrieve

Similarity search on query vector

Injection surface

Generate

LLM responds using retrieved context

Output exposure

Every step in this pipeline is a potential attack surface. Source document ingestion can be abused to introduce poisoned content. The embedding model can be a supply chain risk if it comes from an untrusted source. The vector store holds sensitive data in plaintext in most default configurations. The retrieval step is the primary injection surface: retrieved content goes directly into the LLM's context window. And the generation step can leak sensitive retrieved content through the response.

The security mental model for RAG is not "protect the model." It is "protect every step of the pipeline." This is what the remaining modules in this path cover.
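The pipeline stages can be sketched end to end with toy components. Everything here is a stand-in: `embed` uses plain word counts instead of an embedding model, the store is a Python list instead of a vector database, and the final LLM call is omitted, so the sketch stops at the assembled prompt. All function names are hypothetical.

```python
import math
import re
from collections import Counter

def chunk(text, max_words=12):
    """Split a document into segments of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]

def embed(text):
    """Stand-in embedding: word counts. A real pipeline calls an embedding model."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query, store, k=1):
    """Return the text of the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(store, key=lambda item: cosine(q, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

def build_prompt(query, context_chunks):
    """Retrieved text is pasted into the prompt verbatim: this is the injection surface."""
    context = "\n".join(context_chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

documents = [
    "Password resets require approval from the security team.",
    "The cafeteria serves lunch between noon and two.",
]
store = [(c, embed(c)) for doc in documents for c in chunk(doc)]

question = "Who approves password resets?"
prompt = build_prompt(question, retrieve(question, store))
print(prompt)
```

The point to notice is `build_prompt`: whatever the retrieval step returns goes into the model's context unchanged, so a poisoned chunk in the store becomes part of the prompt.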

Section 07 · Stack Components

The full RAG technology stack

A production RAG system is not just documents and a vector database. It has eight distinct layers, each with its own security properties and failure modes. Security teams auditing RAG systems need to cover all eight. Most assessments stop at the vector database and miss the rest.

Data Pipelines

Ingest raw data from PDFs, databases, APIs, and data lakes into the system.

Risk: poisoned source documents, unvalidated inputs

Embedding Models

Convert documents and queries into vector representations.

Examples: OpenAI, Cohere, Hugging Face. Risk: supply chain, model substitution

Vector Database

Stores embeddings and serves similarity search queries.

Examples: Pinecone, ChromaDB, pgvector, FAISS. Risk: plaintext storage, no RBAC

Orchestration

Manages workflow between data, retrieval, and LLM.

Examples: LangChain, LlamaIndex. Risk: vulnerable dependencies, untrusted plugins

APIs and Plug-ins

External integrations that extend what the RAG system can do.

Risk: third-party data exfiltration, unvalidated external calls

Apps and Front Ends

User interfaces where queries are entered and responses displayed.

Risk: prompt injection via user inputs, response data exposure

LLM Cache ⚠

Stores recent LLM outputs for faster retrieval on repeated queries.

Examples: Redis, GPTCache. Risk: sensitive retrieved content persists in cache. See Section 08.

Frequently missed in security audits

Policy and Guardrails

Controls what content the system can retrieve, process, and return.

Example: Mirror Security AgentIQ, which handles runtime guardrails for AI agents.

Section 08 · Overlooked Attack Surface

The LLM cache: a security surface most teams skip

LLM caches store the outputs of recent queries so that identical or near-identical queries can be answered instantly without running a full retrieval and generation cycle. Tools like Redis and GPTCache are commonly used. In high-traffic systems, this can significantly reduce latency and API costs.

The security problem is straightforward: the cache stores LLM outputs that often contain sensitive retrieved content. If user A asks "What is our salary band for senior engineers?" and that query hits the vector store and retrieves a sensitive HR document, the response (including the retrieved content) may be stored in cache. If user B sends a sufficiently similar query, they may receive the cached response, which contains sensitive information they do not have authorisation to see.

Cache bypass: a real access control gap

Most RAG RBAC implementations control what a user can retrieve from the vector database. Very few extend those controls to the LLM cache layer. An attacker with lower-privileged access can craft queries designed to hit cached responses from higher-privileged users. The vector database access control is bypassed entirely because the system never consults it again. The cached response is served directly.

Three things every RAG security review should check on the LLM cache:

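1. Cache key design. If the cache key is based purely on the query string, nothing in the key records who asked or what they were authorised to see, so any user who submits a matching query receives the cached response. Exact-match keys also treat slightly different phrasings of the same question as different entries, which means sensitive responses get cached multiple times under different keys. Caches that match near-identical queries widen the exposure further: an attacker can iterate through variations until one lands close enough to a cached sensitive response.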

2. Cache entry scope. Is the cache global (shared across all users) or per-user? A global cache in a multi-tenant system leaks responses across tenant boundaries. Per-user caches are safer but more expensive to maintain.

3. Cache entry lifetime. How long do cached entries persist? A cached response containing sensitive data that was valid at time T may no longer be valid at time T+30 days if the underlying document has been updated or the user's permissions have changed.
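The three checks above can be combined in one small design, sketched here as an in-memory cache. It assumes the caller's permission scope can be expressed as a set of role names; the class name `ScopedCache` and the scope values are hypothetical, and a real deployment would apply the same key design to whichever cache backend is in use.

```python
import hashlib
import time

class ScopedCache:
    """Response cache keyed by query AND the caller's permission scope, with a TTL."""

    def __init__(self, ttl_seconds=300):
        self.ttl = ttl_seconds
        self._entries = {}

    @staticmethod
    def _key(query, scope):
        # Normalise case and whitespace, then bind the key to the permission scope.
        normalized = " ".join(query.lower().split())
        material = "|".join(sorted(scope)) + "\n" + normalized
        return hashlib.sha256(material.encode("utf-8")).hexdigest()

    def get(self, query, scope, now=None):
        now = time.time() if now is None else now
        entry = self._entries.get(self._key(query, scope))
        if entry is None or now - entry[1] > self.ttl:
            return None  # miss or expired: caller must run retrieval with its own permissions
        return entry[0]

    def put(self, query, scope, response, now=None):
        now = time.time() if now is None else now
        self._entries[self._key(query, scope)] = (response, now)

cache = ScopedCache(ttl_seconds=60)
cache.put("What is our salary band?", {"hr"}, "Band details...", now=0)

print(cache.get("what is our  salary band?", {"hr"}, now=10))           # same scope: hit
print(cache.get("What is our salary band?", {"engineering"}, now=10))   # other scope: miss
print(cache.get("What is our salary band?", {"hr"}, now=100))           # past TTL: miss
```

Binding the key to the scope means a lower-privileged user can never be served an entry created under a higher-privileged scope, at the cost of a lower hit rate. A short TTL limits how long a response outlives a permission or document change.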

Section 09 · Controls

Access control: who can query, insert, and retrieve

Vector databases influence what information AI models see and how they behave. Weak access control means any component with query access can retrieve any document, regardless of classification or ownership. In a multi-tenant environment this is a data leakage path. In an agentic environment it is a privilege escalation path.

There are two main access control models relevant to vector databases, and they work best in combination.

RBAC

Role-Based Access Control

Assigns permissions based on predefined roles. Simple to manage at scale. Works well for stable, well-defined access patterns. In vector databases, RBAC typically controls who can insert embeddings, run similarity queries, manage indexes, or access metadata.

Example: A Data Scientist role can insert and query. An Analyst role can only query. An Admin role can manage indexes and delete data. VectaX enforces this at role, group, and department level simultaneously.

ABAC

Attribute-Based Access Control

Evaluates access based on dynamic attributes: user identity, service type, request source, time of day, data classification, environment. More flexible than RBAC and better suited to AI systems where agents and automated pipelines operate dynamically.

Example: Allow query access only if the request originates from the internal network AND the requesting service is on the approved list AND the query targets a specific namespace. ABAC adds precision but requires careful policy design to avoid unexpected access paths.
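A minimal sketch of the two models working in combination, using the roles and attribute conditions from the examples above. The permission names, the service name, and the request fields are hypothetical; this is not VectaX's API or any specific database's policy engine.

```python
# Role names follow the RBAC example above; permission strings are assumed.
ROLE_PERMISSIONS = {
    "data_scientist": {"insert", "query"},
    "analyst": {"query"},
    "admin": {"manage_index", "delete"},
}

APPROVED_SERVICES = {"chatbot-backend"}  # hypothetical approved-service list

def rbac_allows(role, action):
    """RBAC: is the action in the role's fixed permission set?"""
    return action in ROLE_PERMISSIONS.get(role, set())

def abac_allows(request):
    """ABAC: internal network AND approved service AND permitted namespace."""
    return (
        request.get("network") == "internal"
        and request.get("service") in APPROVED_SERVICES
        and request.get("namespace") in request.get("allowed_namespaces", ())
    )

def is_allowed(role, action, request):
    """Both checks must pass; unknown roles and missing attributes are denied."""
    return rbac_allows(role, action) and abac_allows(request)

request = {
    "network": "internal",
    "service": "chatbot-backend",
    "namespace": "finance",
    "allowed_namespaces": ["finance"],
}
print(is_allowed("analyst", "query", request))
print(is_allowed("analyst", "insert", request))
print(is_allowed("analyst", "query", {**request, "network": "public"}))
```

The design choice worth noting is the default: anything not explicitly granted is denied, which avoids the unexpected access paths that loosely written ABAC policies can open.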

API Key Hygiene

API keys are the most common authentication mechanism for vector databases, and they are also one of the most common causes of unauthorised access. The problem is usually not that teams use API keys. It is how they use them.

Risk: Hardcoded keys in source code. A key committed to a repository (even a private one) is a key waiting to be leaked. Secrets scanners catch most of these but not all. Rotate immediately if you find one.

Risk: Long-lived keys with no expiry. A key that never expires stays valid indefinitely after a breach. Use short-lived tokens where the vector database supports it, or enforce rotation schedules.

Risk: Over-scoped keys. A key that grants admin access when only read access is needed expands the blast radius of a compromise. Apply least-privilege: if a service only queries, its key should not be able to insert, delete, or modify indexes.

Risk: Shared keys across services. If multiple services share one key, revoking that key after a breach affects all of them. Issue separate keys per service so you can revoke selectively.

Fix: Monitor key usage patterns. Sudden spikes in query volume from a single key, queries at unusual hours, or queries that scan many different namespaces are all signals of abuse. Set alerts.
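Two of the monitoring signals above, query volume and namespace scanning, can be sketched as a simple check over one time window of query logs. The event format, thresholds, key names, and the function name `flag_suspicious_keys` are all assumptions for illustration; real thresholds depend on each service's normal traffic.

```python
from collections import defaultdict

def flag_suspicious_keys(events, max_queries=100, max_namespaces=3):
    """events: iterable of (api_key_id, namespace) pairs from one time window."""
    counts = defaultdict(int)
    namespaces = defaultdict(set)
    for key_id, namespace in events:
        counts[key_id] += 1
        namespaces[key_id].add(namespace)

    flagged = {}
    for key_id in counts:
        reasons = []
        if counts[key_id] > max_queries:
            reasons.append("query volume spike")
        if len(namespaces[key_id]) > max_namespaces:
            reasons.append("namespace scanning")
        if reasons:
            flagged[key_id] = reasons
    return flagged

# Hypothetical log window: one well-behaved key, one key touching ten namespaces.
events = [("svc-chatbot", "support")] * 20
events += [("svc-batch", f"tenant-{i}") for i in range(10)]

print(flag_suspicious_keys(events))
```

This only works if each service has its own key, which is one more reason to avoid shared keys: a shared key blends several traffic patterns into one and hides the anomaly.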

Least privilege in practice: An application that only retrieves context for a chatbot should not have permission to insert embeddings, delete documents, or manage indexes. An agent that writes summaries back to a store should not be able to read documents outside its assigned namespace. Grant only what the task actually requires and nothing more.

Section 10 · Controls

Encryption layers: at rest, in transit, and in use

Encryption for vector databases operates at three distinct layers. Most organisations implement the first two. Very few implement the third, which is why data breaches at the vector layer are still possible even in systems that appear well-secured.

1. Encryption in Transit

Protects data moving between your application, the embedding model API, and the vector database. Implemented via TLS (older SSL protocol versions are deprecated and should be disabled). Without this, an attacker on the same network can intercept queries, embeddings, or retrieved context in plaintext.

Standard. Most managed vector databases enforce TLS. Self-hosted deployments must configure it explicitly. Do not skip certificate verification.

2. Encryption at Rest

Protects embedding data stored on disk or in cloud storage. Typically AES-256. If the physical storage is stolen or the cloud bucket is misconfigured, encryption at rest prevents direct extraction of the data.

Common but not universal. Many self-hosted vector databases (ChromaDB, FAISS) do not encrypt at rest by default. You must enable it at the storage or OS layer. This does not protect against a compromised application with valid credentials.

3. Encryption in Use

Protects embeddings while they are being processed, including during similarity search. This is what Fully Homomorphic Encryption (FHE) enables. The vector database can perform similarity calculations on encrypted vectors without ever decrypting them. An attacker who fully compromises the database server still cannot read the embedding values.

Rare. VectaX from Mirror Security implements this using similarity-preserving FHE, the only approach that keeps vectors encrypted through the entire compute path, including the index and the similarity search itself.
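For the in-transit layer, the advice not to skip certificate verification can be shown with Python's standard `ssl` module. This is a minimal client-side sketch: the helper name `make_client_tls_context` and the optional `ca_file` parameter (for a private certificate authority in a self-hosted deployment) are assumptions, and how the context is passed on depends on the client library in use.

```python
import ssl

def make_client_tls_context(ca_file=None):
    """TLS context for outbound connections with certificate checks left on."""
    # create_default_context() enables certificate and hostname verification.
    context = ssl.create_default_context(cafile=ca_file)
    # Refuse protocol versions older than TLS 1.2.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context

context = make_client_tls_context()
print(context.verify_mode == ssl.CERT_REQUIRED)
print(context.check_hostname)
```

A context like this can be handed to standard-library clients such as `urllib.request.urlopen(url, context=context)`. The pattern to avoid is the common shortcut of disabling verification to make a self-signed development certificate work, then shipping that setting to production.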

Key Management

Encryption is only as strong as the key management around it. If encryption keys are stored alongside the data they protect, an attacker who accesses the storage layer gets both the data and the keys. Effective key management separates key storage from data storage and enforces rotation, auditing, and access controls on keys independently.

In practice this means using a dedicated key management service: AWS KMS, Google Cloud KMS, Azure Key Vault, or any provider that supports the KMIP standard. Keys should be rotated on a schedule, and key usage should be logged so you can detect if a key is used in unexpected ways.

Section 11 · Architecture

Deployment patterns and their security trade-offs

Where you deploy a vector database changes your attack surface. The same vector database product can have very different security properties depending on whether it runs as a managed cloud service, inside a private cloud, or on-premises. This is not a minor operational detail. It determines who is responsible for each security control and where the gaps are most likely to appear.

Managed Cloud

Provider handles infrastructure, patching, scaling, and baseline security. Convenient and fast to deploy. Examples: Pinecone, Weaviate Cloud, Zilliz Cloud.

Lower ops burden, trust shared with provider

Private Cloud

Deployed in your own cloud VPC or private environment. More control over network isolation, encryption configuration, and access policies. Examples: Self-hosted Qdrant, Weaviate, or pgvector on AWS/GCP/Azure.

More control, more responsibility for configuration

On-Premises

Full control over hardware, network, and software stack. Maximum data sovereignty. Required for highly regulated environments like defence, healthcare with strict data residency, or financial services with in-country requirements.

Maximum control, full security responsibility

Three security questions that every deployment decision must answer:

Network exposure. Is the vector database endpoint publicly reachable, or only accessible from within a private network? A public endpoint without proper authentication exposes the entire retrieval layer to the internet. This is one of the most common misconfiguration categories in vector database deployments.

Tenant isolation. In a shared deployment (multiple teams or customers using the same vector database instance), are namespace boundaries enforced at the database level or only at the application level? Application-level isolation fails if a bug or injection attack bypasses the application logic.

Monitoring coverage. Can you detect misuse of the deployment? A vector database with no query logging or anomaly detection can be exfiltrated over weeks through low-volume systematic queries without triggering any alerts.

Most common deployment mistake: Teams deploy a development-grade configuration (ChromaDB or FAISS, no authentication, no encryption, no logging) into a production environment because it worked in testing. The development configuration is designed for single-user local use. It has no access controls, stores everything in plaintext, and logs nothing. Do not ship it to production without adding every security layer that the development defaults skip.
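The three deployment questions, plus the development defaults described above, can be turned into a pre-production checklist. The sketch below assumes a deployment is described by a plain dictionary. The configuration keys are hypothetical and do not correspond to any real product's settings.

```python
def audit_deployment(config):
    """Return a list of findings for a deployment described by a config dict."""
    findings = []
    if config.get("public_endpoint") and not config.get("auth_enabled"):
        findings.append("public endpoint without authentication")
    if not config.get("tls_enabled"):
        findings.append("no encryption in transit")
    if not config.get("encryption_at_rest"):
        findings.append("no encryption at rest")
    if config.get("tenant_isolation") != "database":
        findings.append("tenant isolation not enforced at the database level")
    if not config.get("query_logging"):
        findings.append("no query logging")
    return findings

# A development-style configuration: missing keys are treated as "not enabled".
dev_config = {"public_endpoint": True, "tenant_isolation": "application"}

# A hardened configuration under the same assumed keys.
prod_config = {
    "public_endpoint": False,
    "auth_enabled": True,
    "tls_enabled": True,
    "encryption_at_rest": True,
    "tenant_isolation": "database",
    "query_logging": True,
}

for finding in audit_deployment(dev_config):
    print("FAIL:", finding)
print(audit_deployment(prod_config))
```

Treating a missing setting as a failure is deliberate: development defaults tend to omit controls rather than disable them explicitly.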

Section 12 · The Core Security Gap

The plaintext problem: why traditional RAG is not secure by default

Every component of a standard RAG pipeline operates on data in plaintext. Documents are ingested in plaintext. Embeddings are generated from plaintext and stored as plaintext vectors. Queries are sent to the embedding model in plaintext. Retrieved documents are injected into the LLM prompt in plaintext. The LLM response contains plaintext summaries of retrieved content.

This means that any component in the pipeline that is compromised exposes the underlying data. A misconfigured vector database leaks every embedding in it. A compromised orchestration layer sees every query and every retrieved document. A cache with weak access control exposes historical responses.

Traditional RAG (Default)

✕ Embeddings stored in plaintext
✕ No access control over retrieval
✕ Sensitive data exposed during processing
✕ Limited audit trails on retrieval
✕ Compliance risks under GDPR, HIPAA, and the EU AI Act

Secure RAG with VectaX

✓ Embeddings encrypted, still searchable
✓ RBAC at role, group, and department level
✓ Format-preserving encryption on metadata
✓ Full audit logging on retrieval events
✓ Compliance-ready architecture

VectaX from Mirror Security addresses the plaintext problem using Similarity-Preserving Search. This is the key property that makes encrypted vector search possible: the encrypted embeddings maintain enough mathematical structure that similarity comparisons still work, even though the actual values are encrypted. You can run a similarity search against encrypted vectors without ever decrypting them.

The access control gap is addressed through VectaX's built-in RBAC, which enforces permissions at role, group, and department levels at query time, not just at ingestion time. An analyst in the finance department retrieving from a shared vector store cannot retrieve documents that belong to the legal department's namespace, even if they know exactly how to phrase the query.

Module 5 covers the cryptographic foundations of this in detail: how Fully Homomorphic Encryption (FHE) enables encrypted inference, what format-preserving encryption (FPE) does to metadata, and how these combine into a complete secure RAG architecture.
