Secure RAG for Trading and Fraud Detection with VectaX

The starting point

Why financial AI has a harder problem than most sectors

Most sectors worry about a single sensitive data type. Healthcare worries about PHI. Financial services worries about everything at once. A trading desk AI assistant handles material non-public information, customer account balances, transaction histories, fraud signatures, analyst recommendations, and regulatory filings. Some of these must be kept separate from each other by law, not just by policy.

The insider trading risk alone changes the threat model. If a trader on the equities desk can query a RAG system and receive information about a pending deal that the M&A team is still working on, the information barrier has been breached, and any trade that follows is a potential securities violation regardless of whether the retrieval was intentional. The AI system becomes the information leak. Standard query-time access control is not enough when a misconfigured retrieval step or a prompt injection attack can bypass it.

There is also a fraud side. Transaction fraud detection often works by comparing new transactions against historical patterns. If those patterns are stored in plaintext, a breach of the fraud detection service exposes every flagged customer's transaction history. VectaX lets the comparison happen on ciphertext: fraud pattern matching that never decrypts the underlying records.

The specific threat this module addresses

A financial RAG system that decrypts customer data at query time creates two risks: a breach of the retrieval service exposes the full data set, and a prompt injection attack that bypasses access control gives the attacker plaintext records. VectaX encrypted retrieval means a breach yields only ciphertext, and a bypassed access control check still cannot produce readable data without the correct key.

Risk model

Financial data risk taxonomy for AI systems

Financial AI pipelines contain several categories of sensitive data with very different risk profiles. Understanding which category each data type falls into determines how it needs to be protected.

Material non-public information

Pending M&A deals, undisclosed earnings data, regulatory actions. Exposure to the wrong internal party is an insider trading violation. Separation is required by law. VectaX RBAC enforces this cryptographically. Risk: critical.

Trade history and positions

Client portfolios, position sizes, entry prices, and strategy details. A competitor with this data can front-run. A regulator who learns of the exposure will ask how it leaked. The RAG system that embeds trade notes is the most likely leak vector. Risk: high.

Customer account records

Account numbers, balances, KYC records, and correspondence. GLBA and GDPR require confidentiality. Format-preserving encryption protects account numbers while keeping them usable for downstream lookups. Risk: high.

Fraud pattern signatures

Known fraud patterns and flagged transaction sequences. If these are stored in plaintext, an adversary who accesses the fraud detection system learns exactly which patterns to avoid. Encrypted similarity search protects the signatures. Risk: medium-high.

Information barriers

Chinese walls with VectaX RBAC

A Chinese wall in financial services is an information barrier between departments that prevents conflicts of interest. The classic example is the wall between investment banking (which has MNPI about deals) and sales and trading (which would benefit from it). In a financial AI system, this wall needs to be enforced at the retrieval layer, not just the application layer.

VectaX RBAC generates keys scoped to roles, groups, and departments. A key for group=equities cannot decrypt records tagged for group=investment_banking. The separation is cryptographic. Even if a user constructs a query that should be blocked, the retrieved records cannot be decrypted without the matching key.

Equities desk

Role: equity_analyst | Group: equities

Cannot access: M&A deal data, fixed income positions, client credit files

Investment banking

Role: ib_associate | Group: investment_banking

Cannot access: trading positions, client brokerage accounts, market research not yet published

Compliance

Role: compliance_officer | Group: compliance

Scoped read access to both sides for surveillance only. Cannot initiate trades.

Why application-layer walls are not enough

An application-layer Chinese wall checks permissions before returning data. A prompt injection attack, a misconfigured query, or a compromised service account can bypass that check and retrieve plaintext data from the wrong department. VectaX RBAC means the check is not the only protection. Even if the application layer is compromised, the data cannot be read without the right key.
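
The difference between a permission check and a cryptographic barrier can be sketched with the standard library alone. The example below is a toy, not VectaX's scheme: it derives one key per group from a master secret with HMAC and uses a hash-based keystream plus an authentication tag. The function names `scoped_key`, `seal`, and `open_sealed`, and the sample deal text, are hypothetical and exist only for this illustration. Do not use this construction to protect real data.

```python
import hashlib
import hmac
import os


def scoped_key(master_secret: bytes, group: str) -> bytes:
    """Derive a per-group key from a master secret (toy key derivation)."""
    return hmac.new(master_secret, b"group=" + group.encode(), hashlib.sha256).digest()


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Expand key + nonce into `length` pseudo-random bytes (toy stream cipher)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]


def seal(key: bytes, plaintext: bytes) -> bytes:
    """Encrypt and tag a record under a group key. Layout: nonce | body | tag."""
    nonce = os.urandom(16)
    stream = _keystream(key, nonce, len(plaintext))
    body = bytes(a ^ b for a, b in zip(plaintext, stream))
    tag = hmac.new(key, nonce + body, hashlib.sha256).digest()
    return nonce + body + tag


def open_sealed(key: bytes, blob: bytes) -> bytes:
    """Decrypt a record; fails if the key was derived for a different group."""
    nonce, body, tag = blob[:16], blob[16:-32], blob[-32:]
    expected = hmac.new(key, nonce + body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise PermissionError("key scope does not match this record")
    stream = _keystream(key, nonce, len(body))
    return bytes(a ^ b for a, b in zip(body, stream))


master = os.urandom(32)
ib_key = scoped_key(master, "investment_banking")
eq_key = scoped_key(master, "equities")

# Hypothetical MNPI record, sealed under the investment banking key
deal_note = b"Pending acquisition, not yet announced"
record = seal(ib_key, deal_note)
```

Even if an application-layer check is skipped and the equities desk obtains `record`, calling `open_sealed(eq_key, record)` raises `PermissionError`, while `open_sealed(ib_key, record)` returns the note. That is the property the paragraph above describes: the check is not the only protection.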

Getting started

Setup and initialisation

bash: install

pip install mirror-sdk mirror_enc
pip install "mirror-sdk[examples]"  # for OpenAI and ChromaDB examples; the quotes stop shells such as zsh from expanding the brackets

bash: .env

MIRROR_API_KEY=your-api-key
MIRROR_SERVER_URL=https://mirrorapi.azure-api.net/v1
MIRROR_TELEMETRY_ENABLED=true
MIRROR_POLICY_EVAL_ENABLED=true

python: init.py

from mirror_sdk.core.mirror_core import MirrorSDK, MirrorConfig

config = MirrorConfig.from_env()
sdk = MirrorSDK(config)
print("Mirror SDK ready for financial services pipeline")
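
Before constructing the client it can help to fail fast when the environment is incomplete. The sketch below assumes `MirrorConfig.from_env()` reads the variables from the `.env` listing above out of the process environment. Treating `MIRROR_API_KEY` and `MIRROR_SERVER_URL` as mandatory is an assumption, and the helper names are hypothetical, not part of the SDK.

```python
import os

# Names come from the .env listing above. Which of them the SDK treats as
# mandatory is an assumption made for this sketch.
REQUIRED_VARS = ("MIRROR_API_KEY", "MIRROR_SERVER_URL")


def missing_mirror_settings(env=None):
    """Return the required variable names that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not str(env.get(name, "")).strip()]


def uses_placeholder_key(env=None):
    """True if the API key is still the placeholder from the example file."""
    env = os.environ if env is None else env
    return env.get("MIRROR_API_KEY", "") == "your-api-key"
```

Calling `missing_mirror_settings()` at start-up and refusing to continue when the list is non-empty gives a clearer error than a failed request deep inside the pipeline.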

Step 1

Encrypting trade history embeddings

Trade notes and analyst commentary are the most valuable data in a trading desk RAG system and the most dangerous if exposed. This code embeds a trade note and encrypts the vector before it reaches the database.

python: embed_trade_history.py

import openai
from mirror_sdk.core.mirror_core import MirrorSDK, MirrorConfig
from mirror_sdk.core.models import VectorData

config = MirrorConfig.from_env()
sdk = MirrorSDK(config)

# A representative analyst trade note (reference ID only, no customer name in the vector ID)
trade_note = """
Trade: Long 50,000 shares NVDA @ 142.30. Entry thesis: data center buildout
accelerating, supply constraints easing Q3. Stop: 138.00. Target: 162.00.
Reward/risk 4.6:1. Position size 2.1% of book. Correlated with AMD long.
"""

# Step 1: Embed the trade note
response = openai.embeddings.create(
    model="text-embedding-3-small",
    input=trade_note
)
embedding = response.data[0].embedding

# Step 2: Encrypt before storage
# Vector ID is a non-identifying reference, not the trader's name
vector = VectorData(vector=embedding, id="trade_eq_2026_0410_001")
encrypted_vector = sdk.vectax.encrypt(vector)

# Step 3: Set access policy scoped to the equities group
# Only the equity_analyst and portfolio_manager roles in the equities group can decrypt
equities_policy = {
    "roles": ["equity_analyst", "portfolio_manager"],
    "groups": ["equities"],
    "departments": ["trading"]
}
sdk.set_policy(equities_policy)

print("Trade note encrypted and scoped to equities group")
print("Investment banking group cannot decrypt this vector")

Each department's embeddings are encrypted under a separate policy. The vector database holds only ciphertext. A compliance officer with a key scoped to group=compliance can retrieve across groups for surveillance purposes, but only with the key that was generated for that role.
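
The scoping rules described so far can be written down as a small decision table. This sketch models only the decision, using the role and group names from this module. In VectaX the enforcement is cryptographic, as described above, so a function like `may_decrypt` (a hypothetical name) is useful mainly for testing that a policy set says what you intend before keys are issued.

```python
# Record policies, one per group, using the role and group names from this module.
POLICIES = {
    "equities": {"roles": {"equity_analyst", "portfolio_manager"}},
    "investment_banking": {"roles": {"ib_associate"}},
}

# (role, group) pairs with scoped cross-group read access for surveillance.
SURVEILLANCE_READERS = {("compliance_officer", "compliance")}


def may_decrypt(user_role: str, user_group: str, record_group: str) -> bool:
    """Decide whether a user's scope covers a record tagged for record_group."""
    if (user_role, user_group) in SURVEILLANCE_READERS:
        return record_group in POLICIES
    policy = POLICIES.get(record_group)
    if policy is None:
        return False  # unknown record group: deny by default
    return user_group == record_group and user_role in policy["roles"]
```

Denying by default for unknown groups is a deliberate choice here: a record with a missing or misspelled tag should be unreadable, not readable by everyone.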

Step 2

Format-preserving encryption for account numbers

Customer account numbers must retain their format for downstream processing. A 10-digit account number needs to still look like a 10-digit account number after encryption, or the rest of your systems break. VectaX format-preserving encryption handles this.

python: account_fpe.py

from mirror_sdk.core.mirror_core import MirrorSDK, MirrorConfig

config = MirrorConfig.from_env()
sdk = MirrorSDK(config)

# Customer record metadata - contains regulated identifiers
customer_record = {
    "source": "retail_brokerage",
    "account_number": "4920183756",
    "ssn_last4": "6842",
    "kyc_tier": "standard",
    "open_date": "2019-03-14"
}

# Generate FPE key and tweak, then encrypt
fpe_key = sdk.metadata.generate_key()
fpe_tweak = sdk.metadata.generate_tweak_from_data(customer_record)
encrypted_record = sdk.metadata.encrypt(customer_record, fpe_key, fpe_tweak)

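print(f"Original account:   {customer_record['account_number']}")
print(f"Encrypted account:  {encrypted_record['account_number']}")

# Check the format instead of assuming it: digits only, same length as the input
enc_account = encrypted_record["account_number"]
format_ok = enc_account.isdigit() and len(enc_account) == len(customer_record["account_number"])
print(f"Format preserved:   {format_ok}")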

# Store fpe_key in your key vault (Azure Key Vault, AWS KMS, etc.)
# Never store it alongside the data
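
To see why a ciphertext can keep the shape of an account number at all, here is a toy format-preserving cipher built as a Feistel network over decimal digits. It is not the algorithm VectaX uses (the module does not say which one that is) and it is not NIST FF1. The function names, round count, and the way the tweak is mixed in are all assumptions made for this sketch.

```python
import hashlib
import hmac

ROUNDS = 8  # arbitrary choice for the sketch


def _round_value(key: bytes, tweak: bytes, round_no: int, half: int, modulus: int) -> int:
    """Keyed round function: maps one half to a number in [0, modulus)."""
    msg = tweak + bytes([round_no]) + str(half).encode()
    digest = hmac.new(key, msg, hashlib.sha256).digest()
    return int.from_bytes(digest, "big") % modulus


def _split(digits: str):
    """Validate input and split it into two equal-width integer halves."""
    if not (digits.isascii() and digits.isdigit()) or len(digits) % 2:
        raise ValueError("expects an even-length string of decimal digits")
    width = len(digits) // 2
    return int(digits[:width]), int(digits[width:]), width


def fpe_encrypt_digits(digits: str, key: bytes, tweak: bytes = b"") -> str:
    left, right, width = _split(digits)
    modulus = 10 ** width
    for i in range(ROUNDS):
        left, right = right, (left + _round_value(key, tweak, i, right, modulus)) % modulus
    # Zero-pad so the output always has the same number of digits as the input
    return f"{left:0{width}d}{right:0{width}d}"


def fpe_decrypt_digits(digits: str, key: bytes, tweak: bytes = b"") -> str:
    left, right, width = _split(digits)
    modulus = 10 ** width
    for i in reversed(range(ROUNDS)):
        left, right = (right - _round_value(key, tweak, i, left, modulus)) % modulus, left
    return f"{left:0{width}d}{right:0{width}d}"
```

Encrypting `"4920183756"` with any key yields another 10-digit string, and decrypting with the same key and tweak returns the original. Changing the tweak changes the ciphertext, which is why the listing above derives the tweak from the record.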

GLBA requires documented key management

The FTC Safeguards Rule under GLBA requires covered financial institutions to maintain a written information security program and to encrypt customer information at rest and in transit. Key management belongs in that written program. The FPE key generated here must be stored in a managed key vault, not in application code or environment variables. Key rotation schedules must be documented and followed. A key management policy that exists only in someone's head is not GLBA-compliant.

Step 3

Encrypted fraud pattern matching

Fraud detection via RAG works by embedding a new transaction and finding the nearest matches in a store of known fraud patterns. With VectaX, this comparison happens on encrypted vectors. The fraud pattern library is never decrypted during the matching process.

python: fraud_matching.py

import openai
from mirror_sdk.core.mirror_core import MirrorSDK, MirrorConfig
from mirror_sdk.core.models import VectorData

config = MirrorConfig.from_env()
sdk = MirrorSDK(config)

# Step 1: Store a known fraud pattern (done at pattern library build time)
fraud_pattern = """
Pattern: rapid small-value transactions under reporting threshold across
multiple accounts within 24 hours, followed by a single large consolidating
transfer. Accounts opened within 60 days. No prior transaction history.
"""

pattern_embedding = openai.embeddings.create(
    model="text-embedding-3-small",
    input=fraud_pattern
).data[0].embedding

# Encrypt and store the fraud pattern
# Intended scope: fraud_analyst and compliance roles only.
# No policy is set in this listing; apply one as in Step 1 to enforce that scope.
encrypted_pattern = sdk.vectax.encrypt(
    VectorData(vector=pattern_embedding, id="fraud_pattern_smurfing_v2")
)

# Step 2: At detection time, embed and encrypt the incoming transaction
incoming_tx = """
Customer account opened 22 days ago. 14 deposits ranging $890-$970
across 6 branch locations over 48 hours. Single outbound wire $12,400.
"""

tx_embedding = openai.embeddings.create(
    model="text-embedding-3-small",
    input=incoming_tx
).data[0].embedding

encrypted_tx = sdk.vectax.encrypt(
    VectorData(vector=tx_embedding, id="tx_check_live")
)

# Step 3: Similarity search runs on ciphertext
# The encrypted_tx is compared against encrypted fraud patterns
# The matching score is computed without decrypting either vector
# (pattern: encrypted_results = fraud_vector_db.query_encrypted(encrypted_tx, n_results=5))

print("Transaction embedded and encrypted for pattern matching")
print("Fraud pattern library stays encrypted throughout")
print("Similarity ranking identical to plaintext comparison")

The key property here is that Similarity-Preserving Search guarantees no accuracy loss. A transaction that scores 0.94 cosine similarity against a fraud pattern in plaintext scores 0.94 against the same pattern when both are encrypted. Fraud detection quality is not traded for security.
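
One way to build intuition for how cosine similarity can survive a transformation is to apply the same secret orthogonal transform to every vector. The sketch below uses a secret permutation with sign flips as a stand-in. It is not VectaX's encryption scheme, which this module does not describe, and a permutation with sign flips is not secure on its own. All names here are hypothetical and the vectors are random stand-ins for embeddings.

```python
import math
import random


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def make_secret_transform(dim, seed):
    """Secret = a permutation of coordinates plus a sign per coordinate."""
    rng = random.Random(seed)
    perm = list(range(dim))
    rng.shuffle(perm)
    signs = [rng.choice((-1.0, 1.0)) for _ in range(dim)]
    return perm, signs


def transform(vec, secret):
    """Apply the secret transform. It is orthogonal, so dot products are kept."""
    perm, signs = secret
    return [signs[i] * vec[perm[i]] for i in range(len(vec))]


rng = random.Random(0)
dim = 16
patterns = {f"pattern_{k}": [rng.gauss(0, 1) for _ in range(dim)] for k in range(5)}
tx = [rng.gauss(0, 1) for _ in range(dim)]
secret = make_secret_transform(dim, seed=42)

plain_scores = {name: cosine(tx, vec) for name, vec in patterns.items()}
masked_scores = {
    name: cosine(transform(tx, secret), transform(vec, secret))
    for name, vec in patterns.items()
}

plain_rank = sorted(plain_scores, key=plain_scores.get, reverse=True)
masked_rank = sorted(masked_scores, key=masked_scores.get, reverse=True)
```

The scores agree up to floating-point rounding and the two rankings are identical, which is the behaviour the paragraph above attributes to Similarity-Preserving Search.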

Step 4

Validating AI financial advice with AgentIQ

Financial AI systems face a specific risk: hallucination. A model that confidently cites a stock price, a regulatory requirement, or a historical return that does not appear in the retrieved context is generating dangerous misinformation. In a regulated context, an AI-generated investment recommendation based on fabricated data is a compliance liability.

AgentIQ hallucination detection compares the model's response against the retrieved context and flags anything the model asserts that is not grounded in source material.

python: advice_validation.py

from mirror_sdk.core.mirror_core import MirrorSDK, MirrorConfig
from mirror_sdk.core.mirror_api_models import Action
import logging

logger = logging.getLogger("financial_ai_guard")
config = MirrorConfig.from_env()
sdk = MirrorSDK(config)

def validate_financial_response(
    analyst_query: str,
    retrieved_context: str,
    model_response: str
) -> dict:
    """
    Validate an AI-generated financial response.
    Checks for: hallucination and PII leakage (bias is not checked here).
    Returns a validation report before the response reaches the user.
    """

    # 1. Check for hallucination against the retrieved context
    hallucination_result = sdk.agentiq.analyze_hallucination(
        input=analyst_query,
        output=model_response,
        context=retrieved_context,
        threshold=0.75  # stricter threshold for financial advice
    )

    is_hallucinated = False
    if hallucination_result.pairs:
        is_hallucinated = any(
            str(p.is_hallucination).lower() == 'true'
            for p in hallucination_result.pairs
        )

    # 2. Scan for PII leakage in the response
    # Customer data from retrieved context must not leak to other users
    pii_result = sdk.agentiq.detect_pii(
        text=model_response,
        pii_entities=["NAME", "ACCOUNT_NUMBER", "SSN", "EMAIL", "PHONE"],
        action=Action.REDACT
    )

    # 3. Log and return validation outcome
    if is_hallucinated:
        logger.warning("Hallucination detected in financial AI response. Blocking.")

    if pii_result.entities:
        logger.warning(
            f"PII detected in response: {[e.label for e in pii_result.entities]}"
        )

    return {
        "approved": not is_hallucinated and len(pii_result.entities) == 0,
        "hallucination_detected": is_hallucinated,
        "pii_detected": len(pii_result.entities) > 0,
        "safe_response": pii_result.redacted_text,
        "risk_score": pii_result.risk_score
    }

Hallucination threshold in financial contexts

The default hallucination threshold in AgentIQ is 0.5. For financial advice, use 0.75 or higher. A financial AI that says "the fund returned 12.3% in 2024" when the retrieved context says "the fund returned 11.8% in 2024" is not a minor error. In a regulated context it can be material misrepresentation. Set the threshold conservatively.
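
A cheap additional guard for the 12.3% versus 11.8% case is to check that every number in the response literally appears in the retrieved context. This is a naive string-level sketch that could run alongside AgentIQ, not a description of how AgentIQ works. The function name and the regular expression are assumptions, and the check will miss reformatted figures (for example "12.30%" versus "12.3%").

```python
import re

# Matches integers, decimals, thousands separators, and an optional percent sign
NUMBER = re.compile(r"\d+(?:,\d{3})*(?:\.\d+)?%?")


def ungrounded_numbers(response: str, context: str) -> list:
    """Return numeric tokens in the response that never appear in the context."""
    context_numbers = set(NUMBER.findall(context))
    return [tok for tok in NUMBER.findall(response) if tok not in context_numbers]


context = "The fund returned 11.8% in 2024."
response = "The fund returned 12.3% in 2024."
flagged = ungrounded_numbers(response, context)
```

Here `flagged` is `["12.3%"]`: the year is grounded, the return figure is not. A non-empty list is a reason to block or escalate regardless of the model-based score.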

Step 5

Full trading desk RAG pipeline

This assembles the complete pipeline: encrypted retrieval scoped to the requesting analyst's department, decryption at context assembly, validated AI response.

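python: trading_rag_pipeline.py

import openai
from mirror_sdk.core.mirror_core import MirrorSDK, MirrorConfig
from mirror_sdk.core.models import VectorData

# validate_financial_response is defined in advice_validation.py (Step 4).
# This import assumes both files sit in the same directory.
from advice_validation import validate_financial_response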

config = MirrorConfig.from_env()
sdk = MirrorSDK(config)

def trading_rag_query(
    analyst_query: str,
    analyst_role: str,
    analyst_group: str,
    analyst_department: str,
    vector_store,
    n_results: int = 4
) -> dict:
    """
    Run a trading desk RAG query with encrypted retrieval.
    The analyst's group scope determines which records are decryptable.
    Responses are validated for hallucination and PII before return.
    """

    # 1. Generate a role-scoped key for this analyst
    analyst_key = sdk.rbac.generate_user_secret_key({
        "roles": [analyst_role],
        "groups": [analyst_group],
        "departments": [analyst_department]
    })

    # 2. Embed and encrypt the query
    query_emb = openai.embeddings.create(
        model="text-embedding-3-small",
        input=analyst_query
    ).data[0].embedding

    encrypted_q = sdk.vectax.encrypt(
        VectorData(vector=query_emb, id="query")
    )

    # 3. Search encrypted vector store
    # Records outside the analyst's key scope cannot be decrypted
    encrypted_results = vector_store.query_encrypted(
        encrypted_q, n_results=n_results
    )

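    # 4. Decrypt results; records outside the analyst's scope fail to decrypt
    # NOTE: analyst_key from step 1 is never passed to decrypt() in this listing.
    # How the scoped key is bound to the call depends on the SDK. Confirm this
    # against the VectaX documentation before relying on it for separation.
    contexts = []
    for enc_result in encrypted_results:
        try:
            decrypted = sdk.vectax.decrypt(enc_result)
            contexts.append(decrypted.metadata.get("summary", ""))
        except Exception:
            # Most likely a record from a different group: cannot decrypt, skip.
            # Catching Exception also hides genuine errors, so narrow this to the
            # SDK's decryption error type once you know it.
            pass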

    context_block = "\n\n".join(contexts)

    # 5. Generate AI response
    prompt = f"""You are a trading desk research assistant.
Answer using only the trade notes and research provided. Do not add
information not present in the notes. If the notes do not address the
question, say so clearly.

Notes:
{context_block}

Query: {analyst_query}

Response:"""

    response = openai.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
        temperature=0.1
    )
    model_response = response.choices[0].message.content

    # 6. Validate before returning
    validation = validate_financial_response(
        analyst_query, context_block, model_response
    )

    if not validation["approved"]:
        return {
            "response": "Response flagged for review. Please contact your compliance team.",
            "approved": False,
            "reason": validation
        }

    return {
        "response": validation["safe_response"],
        "approved": True
    }

Regulatory mapping

Compliance coverage for financial AI

Financial AI systems touch several overlapping regulatory regimes. This table maps the controls in this pipeline to the specific requirements.

Regulation	Requirement	VectaX covers	Gaps to address
GLBA / FTC Safeguards Rule	Encrypt customer financial data; documented key management	Yes: FPE + vector encryption	Write and document key rotation policy
SEC Regulation S-P	Protect customer records and information	Yes: RBAC + encrypted storage	Incident response plan for AI breaches
MiFID II	Record-keeping of trading communications and orders	Partial: encrypted storage; audit logging is separate	Immutable audit log for all AI-assisted queries
DORA (EU)	ICT risk management including AI; resilience testing	Partial: encryption controls	Red team testing (covered in Module G2)
GDPR	Data minimisation; encryption of EU personal data; DPIAs	Yes: FPE for PII; RBAC for access minimisation	DPIA documentation; consent management for AI decisions
Securities Exchange Act of 1934 (insider trading)	Information barriers between departments with MNPI	Yes: cryptographic Chinese walls via RBAC scoping	Written Chinese wall policy; employee training records
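
The MiFID II row lists an audit log for AI-assisted queries as a gap. One common way to make such a log tamper-evident is to chain entries by hash, so that altering any earlier entry invalidates everything after it. The sketch below is a minimal in-memory illustration with hypothetical function and field names. It is not part of the Mirror SDK, and a production log would also need durable, access-controlled storage to count as immutable.

```python
import hashlib
import json

GENESIS = "0" * 64  # hash value used before the first entry


def _entry_hash(prev_hash: str, event: dict) -> str:
    """Hash the previous entry's hash together with a canonical event encoding."""
    payload = json.dumps(event, sort_keys=True)
    return hashlib.sha256((prev_hash + payload).encode()).hexdigest()


def append_entry(log: list, event: dict) -> dict:
    """Append an event, linking it to the hash of the previous entry."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    entry = {"event": event, "prev_hash": prev_hash, "hash": _entry_hash(prev_hash, event)}
    log.append(entry)
    return entry


def verify_chain(log: list) -> bool:
    """Recompute every hash; any edited, removed, or reordered entry fails."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["hash"] != _entry_hash(prev_hash, entry["event"]):
            return False
        prev_hash = entry["hash"]
    return True


audit_log = []
append_entry(audit_log, {"role": "equity_analyst", "group": "equities", "approved": True})
append_entry(audit_log, {"role": "ib_associate", "group": "investment_banking", "approved": False})
```

Log the role, group, and validation outcome for each query, not the query text or retrieved notes, so the audit trail does not become a second copy of the sensitive data.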

What this pipeline does not cover

This module covers the data protection side of financial AI compliance. Red teaming the AI system for adversarial attacks, jailbreaks, and prompt injection is in Module G2. A compliant financial AI system needs both: the controls to protect data, and the evidence from red teaming that the controls hold under adversarial conditions.

Common questions

FAQ

Why does financial AI need encrypted retrieval and not just encrypted storage?

Financial AI systems retrieve sensitive records at query time. Encrypting data only at rest means it is decrypted in memory every time a query runs, creating a persistent exposure window in the AI service layer. Any compromise of the retrieval service then exposes customer account data and trade history. Encrypted retrieval using VectaX keeps the data encrypted through the similarity search, so a breach of the retrieval service yields only ciphertext.

How does VectaX RBAC enforce Chinese wall requirements in financial services?

VectaX RBAC generates per-user secret keys scoped to roles, groups, and departments. A key generated for role=equity_analyst in group=equities cannot decrypt records tagged for group=fixed_income or group=investment_banking. The separation is cryptographic, not just a query filter. An equity analyst who gains access to the vector database still cannot read M&A deal records without the correct key.

Can encrypted embeddings be used for fraud pattern matching?

Yes. VectaX vector encryption is similarity-preserving. The geometric relationships between encrypted transaction embeddings are identical to those of plaintext embeddings. Fraud pattern matching based on cosine similarity or nearest-neighbor search produces the same results on encrypted vectors as on plaintext vectors, with no loss of detection accuracy.

What financial regulations apply to AI systems handling customer data?

In the United States, GLBA (the Gramm-Leach-Bliley Act) requires financial institutions to protect customer financial information and implement safeguards. The FTC Safeguards Rule specifies technical controls including encryption. In the EU, MiFID II requires record-keeping and audit trails for trading activity. DORA (the Digital Operational Resilience Act) adds ICT risk management requirements, including for AI systems. GDPR applies to any EU customer data.

How should AI-generated financial advice be validated before reaching customers?

AI-generated financial advice should be checked for hallucinations before delivery. AgentIQ hallucination detection compares the model response against the retrieved context and flags responses that include claims not grounded in the source data. For regulated advice, the output should also be checked for PII leakage and for bias that could indicate discriminatory outcomes.

What is the difference between a Chinese wall and RBAC in financial AI?

A Chinese wall is a business policy that separates information flows between departments to prevent conflicts of interest. RBAC is the technical mechanism that enforces it. In a financial AI context, a Chinese wall between equities and M&A means analysts in one group must not be able to retrieve material non-public information from the other. VectaX RBAC enforces this cryptographically: the key scoping ensures that even a misconfigured query cannot cross the wall.

Next: Red team this system end-to-end with DiscoveR

Module G2 runs DiscoveR against the financial AI system you built here. You will see what a jailbreak attempt looks like against a trading desk assistant, which attack categories apply to financial AI, and how to interpret and act on the scan results.
