What is RAG poisoning and how does it work as an attack?

RAG poisoning is an attack where an adversary inserts documents into the retrieval database that contain hidden instructions. When a user query triggers retrieval of the poisoned document, the model reads those instructions as part of its context and may execute them, overriding its intended behaviour. The attack is indirect: the user submits a legitimate query, but the poisoned document redirects the model's actions. RAG poisoning is particularly dangerous because it bypasses input-layer injection detection: the user's query is clean, and the injection arrives through the trusted retrieval path. Detection requires monitoring the retrieved content, not just the user input.

AI as an Attack Tool: Adversarial AI and AI-Powered Threats | Track 3E

Q: How has AI changed the economics of cyberattacks?

AI has shifted cyberattacks from skill-constrained to capital-constrained. Previously, a sophisticated spear-phishing campaign required experienced human researchers who could investigate a target, craft personalised messages, and identify the right moment to send. AI removes the human research bottleneck: a language model can generate thousands of personalised phishing emails in seconds by combining publicly available information about targets with effective social engineering patterns. The same shift applies to malware generation, vulnerability research, and exploit development. What took a team of skilled attackers weeks now takes a well-resourced attacker with AI access hours or days.

Q: What are the most dangerous AI-powered attack techniques today?

Five categories are most operationally significant now. First, AI-scaled spear phishing: LLMs generate highly personalised phishing messages from public OSINT data at a scale that defeats traditional detection (generic phishing filters trained on mass-market templates). Second, AI malware generation: LLMs assist in writing polymorphic code that changes its signature on each execution, defeating signature-based detection. Third, automated vulnerability research: AI analyses code repositories to identify patterns similar to known vulnerability classes, accelerating the time from code discovery to working exploit. Fourth, real-time deepfake impersonation: voice cloning in live audio calls enables business email compromise at the phone-call level. Fifth, LLM jailbreak automation: automated systems that fuzz your AI deployment with thousands of prompts to find bypass techniques faster than a human red team.

Section 01

The economics shift

Before AI, most sophisticated attacks were skill-constrained. A credible spear-phishing campaign required a researcher who could investigate a target, write convincingly in their voice, identify the right pretext, and time the delivery. Ransomware required developers. Vulnerability research required people who understood both code and exploitation technique.

AI removed the skill bottleneck. What changed is not what attacks are possible but how many can be run simultaneously and how little expertise is required to run them. The ceiling on attacker scale was human attention. AI removes that ceiling.

The practical consequence: the volume of sophisticated-looking attacks has increased dramatically. Defences designed around the assumption that sophisticated attacks are rare and resource-intensive are now calibrated for the wrong threat model.

Before AI: skill-constrained

Spear phishing required human research per target

Malware variants needed experienced developers

Vulnerability research required security expertise

Social engineering required skilled human actors

Scale was limited by the number of skilled people

Campaign preparation took days to weeks

Defences: train on generic patterns, rely on rarity

After AI: capital-constrained

LLMs generate personalised spear phishing at scale from OSINT

LLMs generate polymorphic malware variants on demand

AI scans code repositories for vulnerability patterns automatically

Voice cloning enables real-time deepfake impersonation

Scale is limited only by compute budget and API access

Campaign preparation takes minutes to hours

Defences: pattern-based detection fails on novel generated content

The Mirror Security 2025 Year in Review captured this shift precisely: "The model crown rotates so fast it barely matters. Meanwhile, the real problem stayed the same: how do you use AI on sensitive data without leaking it to the provider, the operator, or the platform?" The same LLMs that power productivity tools power the attacks against them. Defences must match the capability of the tools being used against them.

📋 Mirror Blog · Mirror Security: 2025 Year in Review

Section 02

Six AI attack categories

AI-powered attacks fall into six categories that are operationally significant today. Each has different targets, different detection challenges, and different defences. The first three (phishing, malware, vulnerability research) are AI augmenting traditional attack types. The last three (deepfakes, model extraction, RAG poisoning) are new attack classes that only exist because AI exists.

✉

AI phishing

LLMs generate personalised spear-phishing at scale, using OSINT to craft messages that reference real events, colleagues, and projects. Defeats template-based detection.

Massive scale

🐛

AI malware

LLMs generate polymorphic malware variants that change their signature on each execution. Defeats signature-based AV. Experienced developers no longer required.

High velocity

🔍

AI vuln research

AI analyses code repositories and identifies patterns similar to known CVE classes. Accelerates the time from code discovery to working proof-of-concept exploit.

High precision

🎥

Deepfakes

Voice cloning enables real-time audio deepfakes during phone calls. Video deepfakes enable CEO fraud. Attacks that once required weeks of footage now work from minutes of sample audio.

High impact

📈

Model extraction

Systematic API querying to steal model capabilities. Attackers harvest (prompt, response) pairs to train a student model. Documented at scale: 16M+ exchanges from 24K accounts (Anthropic, 2026).

Industrial scale

🎲

RAG poisoning

Inserting poisoned documents into the retrieval index. When a user query retrieves the document, hidden instructions redirect the model. Bypasses input-layer injection detection entirely.

Hard to detect

Section 03

AI-scaled phishing

Traditional phishing detection was calibrated to catch generic messages sent to large recipient lists. The same template with minor variations, sent to thousands of people. Detection filters were good at this because the attack economics required scale over personalisation: mass campaigns were cheaper than bespoke ones.

AI inverted this tradeoff. An LLM can generate a highly personalised spear-phishing email from publicly available information in seconds. For each target, it can research their employer, recent news about their company, their LinkedIn connections, and their public posts. It can then write a message that sounds like it came from a known colleague, references a real recent event, and uses the target's communication style.

AI spear phishing workflow (previously required a human researcher per target)

1

OSINT collection (automated)

Scrape LinkedIn profile, company website, press releases, social media posts. Extract: role, team, recent projects, reporting structure, communication style.

AI-automated

2

Pretext construction (LLM)

Generate a plausible pretext: a colleague's name, a real recent event at the company, a believable reason for urgency. Cross-reference the target's reporting structure to choose the right sender persona.

AI-automated

3

Message generation (LLM)

Generate the phishing message body using the collected context. Match the communication style found in the target's public posts. Include specific details that make the message feel genuine.

AI-automated

4

Delivery timing optimisation

Schedule delivery based on the target's time zone and likely working hours. Avoid weekends and holidays. Send during the first hour of the working day when cognitive load is lower.

Previously manual

5

Scale: repeat for 10,000 targets simultaneously

The entire workflow above runs in parallel for thousands of targets. The economics that once made personalised spear phishing rare now make it the default attack strategy.

AI-automated

What traditional detection catches

Generic templates sent to large recipient lists

Repeated identical or near-identical messages

Malicious links already in threat intel feeds

Known sender domains on blocklists

Obvious grammar errors and translation artifacts

What AI phishing bypasses

Unique message body per target: no template match

Novel links not yet in threat intel feeds

Newly registered sending domains, no blocklist entry

Native-quality writing in the target's language

Personalised details that defeat "this seems off" detection

The defence shift for AI phishing is organisational, not technical. Technical detection of AI-generated phishing is hard because the messages are well-written, novel, and personalised. The effective defences are: hardware security keys for authentication (so credential theft does not enable access), network-level isolation of high-value email accounts, and training on specific AI phishing patterns rather than generic phishing indicators. DiscoveR cannot detect inbound AI phishing, but it reveals whether your AI applications are being used as a platform to generate such attacks against others.

Section 04

AI malware generation

Malware development has traditionally required programming skill and knowledge of the target environment. LLMs have reduced the expertise barrier significantly. An attacker with access to a capable coding model can request code that performs specific malicious actions, iterate on it to evade detection, and generate many variants to maximise the chance that at least one passes through security controls.

Polymorphic generation is the most practically significant capability. Signature-based antivirus and EDR detection relies on known patterns in malware code. If the attacker can generate functionally identical malware with different code structure, variable names, and calling conventions each time, each new sample appears novel to signature-based detection. LLMs are good at this kind of structural variation while preserving function.

Evasion assistance is also significant. LLMs can help an attacker understand why a specific malware sample was flagged, suggest modifications to avoid the detection pattern, and generate testing harnesses to verify the modification worked. This accelerates the iteration cycle from hours to minutes.

The capability ceiling is still meaningful: frontier LLMs include safety measures that limit direct malware-as-a-service usage. But jailbreaks against these safety measures are documented and actively maintained by attacker communities. The DiscoveR jailbreak category tests your own AI deployment against the same techniques used to jailbreak safety measures on general-purpose models.

AI-assisted malware development is not hypothetical. Security researchers have documented LLM-assisted malware samples in the wild since 2023. The FBI and CISA have issued advisories specifically addressing LLM-assisted attack campaigns. The specific concern is not that LLMs write entirely novel malware from scratch, but that they dramatically lower the skill required to adapt existing malware families to new targets and environments.

Section 05

AI-assisted vulnerability research

Vulnerability research is one of the most direct applications of AI to offensive capability. AI can analyse large codebases much faster than humans and can be trained or prompted to look for patterns similar to known vulnerability classes. This does not replace the insight of an experienced security researcher, but it dramatically compresses the time from code exposure to identified vulnerability candidate.

The practical workflow: an AI-assisted vulnerability scanner ingests the target codebase, applies pattern matching based on known vulnerability classes (buffer overflows, injection points, authentication bypasses, race conditions), ranks candidates by confidence and exploitability, and produces a shortlist for human review. A human researcher then validates the candidates and develops working exploits. The AI handles the search; the human handles the exploitation logic.

For AI-specific systems, this creates a specific threat: automated scanning of your AI deployment's attack surface. DiscoveR does exactly this from the defensive side, 60 plus attack modes scanning for AI-specific vulnerabilities. Attackers are building equivalent offensive tools. The question for defenders is whether their defensive scanning runs before or after the attacker's offensive scan finds the same vulnerability.

DiscoveR is the defensive equivalent of AI-assisted vulnerability research. It fingerprints your AI system, selects the attack strategies most likely to succeed based on your system type, and runs 2,500 plus structured probes across 11 categories. Running DiscoveR regularly means you find your vulnerabilities before an attacker's equivalent tool does.

Section 06

Deepfakes and impersonation

Deepfake technology has moved from a research curiosity to a practical attack tool. The most operationally significant capability today is voice cloning for real-time audio impersonation during phone calls. Unlike video deepfakes that require careful generation and review, real-time voice cloning can be run live, allowing an attacker to impersonate a CEO, CFO, or IT administrator during an actual phone call.

The business email compromise (BEC) attack pattern has evolved to include voice. The classic BEC uses email: an attacker spoofs the CFO's email to instruct a finance employee to make a wire transfer. The AI-augmented version adds a follow-up phone call from a cloned voice of the CFO to confirm the instruction verbally. The finance employee receives both a written and verbal instruction from what appears to be the CFO. The combination is much more convincing than either alone.

Synthetic persona creation is another operationally significant capability. AI can generate realistic profiles including photos, posting histories, and consistent voice for use in targeted social engineering. An attacker building trust with a target over several weeks no longer needs a human agent maintaining the relationship. The synthetic persona handles routine communications; a human steps in only for the critical moment when the actual attack occurs.

The defences are primarily process-based rather than technical: out-of-band verification for high-value transactions, code words established in advance for critical authorisations, and video call requirements for sensitive decisions (video deepfakes are still harder than audio for real-time use). AgentIQ does not directly defend against real-world voice deepfakes, but it does flag deepfake-like injection attempts when voice-transcribed content enters an AI system as input.

Section 07

Model extraction

Model extraction attacks (also called distillation attacks when targeting frontier model capabilities) are covered in depth in E2 (monitoring) from the detection angle. In this section, we look at them from the attacker's perspective to understand why they work and what makes them hard to stop.

The goal is to build a student model that approximates the capabilities of the target model by training on (prompt, response) pairs harvested from the target's API. At sufficient scale, this produces a model that approaches the frontier model's performance on target domains, built at a fraction of the research and compute cost.

The attack is documented at industrial scale. Anthropic's February 2026 disclosure described over 16 million exchanges harvested from roughly 24,000 fake accounts, with one proxy network managing more than 20,000 simultaneous fraudulent accounts mixing extraction traffic with legitimate requests to avoid detection.

What makes it hard to stop: a single distillation query is indistinguishable from a legitimate research query. Detection requires population-level statistics across accounts and over time (covered in E2). The VectaX FHE stack provides a technical countermeasure that operates at a different layer: instead of trying to detect the attacker, it makes the harvested output toxic for training. The noise injected at the latent level accumulates across a harvested corpus, degrading any student model trained on it.

16M+

exchanges harvested from Anthropic, February 2026

24K

fake accounts used in one documented campaign

20K+

simultaneous fraudulent accounts in one proxy network

3

named AI laboratories in Anthropic's disclosure

📋 Mirror Blog · The Distillation Problem: Make the Harvest Worthless

Section 08

RAG poisoning and indirect injection

RAG (Retrieval-Augmented Generation) systems retrieve relevant documents from a vector database to add context to model queries. This creates an attack surface that does not exist in non-RAG deployments: the content of the retrieved documents is part of the model's context window and can contain instructions that redirect the model's behaviour.

RAG poisoning involves inserting documents into the retrieval database that contain hidden instructions alongside legitimate-looking content. When a user query causes the poisoned document to be retrieved, the hidden instructions become part of the model's context. The model may then follow those instructions instead of or alongside the user's actual request.

Indirect prompt injection is the broader class that includes any attack where the injection arrives through content the model processes rather than through the user's direct input. Email content processed by an AI assistant, web pages read by an AI browser agent, PDF documents processed by an AI document analyser: all are injection surfaces if the model processes their content as context.

RAG poisoning attack flow

💬

User query

Clean input

🔍

Vector search

Normal

☣

Retrieved doc (poisoned)

Contains instructions

🤖

Model sees context

Injection in window

⚠

Redirected output

Attacker-controlled

The user's query is clean. The injection does not appear in the user input. Input-layer prompt injection detection does not see it. Detection requires monitoring the retrieved content and the model's chain-of-thought for signs of redirection.

The VectaX retrieval audit log records which documents were retrieved for each query. If an incident investigation reveals that poisoning occurred, the audit log identifies which document was the source. AgentIQ's chain security validation detects when the model's reasoning chain has shifted away from the intended task scope, which is a runtime signal that indirect injection may have occurred even before the poisoned document is identified.

Section 09

Jailbreak automation

Jailbreaking an AI system means finding an input that causes the model to bypass its safety guardrails and produce output it is designed to refuse. Jailbreaks were initially manual efforts requiring creative prompt engineering. They are now systematically automated.

Automated jailbreak campaigns work by fuzzing the target AI system with thousands of prompt variations, using feedback from partial successes to guide the next generation of prompts, and sharing successful techniques across attacker communities. This is the same evolutionary search process used in software fuzzing, applied to language models.

The arms race is real: a jailbreak technique that is published on Monday is often patched by the model provider by Friday. But new techniques appear constantly. A model that was secure against all known jailbreaks last week may be vulnerable to a new technique discovered this week. This is why continuous adversarial testing, not a one-time evaluation, is required to maintain confidence in model safety.

NIST's evaluation of a named frontier model found it responded to 94% of malicious requests under common jailbreaking techniques, compared to 8% for US frontier reference models. The difference was not model capability. The difference was that safety alignment can degrade through fine-tuning and through distillation if safety properties are not explicitly preserved.

Section 10

DiscoveR: test your exposure

DiscoveR is Mirror Security's adversarial testing platform. It tests your actual AI deployment, not a theoretical model. That means it tests your tools, your RAG pipeline, your agent workflows, and your policies together. Attackers target your deployed system, not the model in isolation. DiscoveR's testing scope matches the attacker's target scope.

From the Mirror Security 2025 Year in Review: "DiscoveR tests your actual deployment, tools, RAG, policies, agents, apps, and all, because that's what attackers target. It fingerprints the system first, then runs adversarial campaigns across jailbreaks, prompt injection, RAG poisoning, tool abuse, model extraction, and membership inference."

60+

attack modes across all categories

2,500+

adversarial prompts in the test library

11

attack categories covered

100%

actual deployment tested, not the model in isolation

01

Jailbreaks

Attempts to bypass safety guardrails using role-play, instruction override, and other known techniques

02

Prompt injection

Direct injection attempts through user input, targeting system prompt override and scope violation

03

RAG poisoning

Tests whether retrieved content can redirect model behaviour through indirect injection patterns

04

Tool abuse

Attempts to invoke tools outside their intended scope or to chain tools in unintended ways

05

Model extraction

Structured extraction attempts targeting reasoning capabilities and training data patterns

06

Membership inference

Tests whether queries can reveal whether specific data was in the model's training set

07

Data exfiltration

Attempts to extract content from the model's context window or retrieval index

08

Bias and toxicity

Tests for outputs that violate fairness requirements or produce harmful content under adversarial prompting

09

Hallucination under attack

Tests whether adversarial inputs systematically increase factual error rates in model outputs

10

Policy evasion

Attempts to bypass deployed policy controls without triggering the guardrail classification

11

Agent hijacking

Tests whether agent workflows can be redirected to take out-of-scope actions through injection or delegation manipulation

📋 Mirror Blog · Mirror Security: 2025 Year in Review

Section 11

AgentIQ: runtime defence

DiscoveR identifies your vulnerabilities. AgentIQ defends against them at runtime. The two products address different moments in the attack lifecycle: DiscoveR operates before and between attacks to find the surface; AgentIQ operates during every request to enforce against the current attack.

From the Mirror Security 2025 Year in Review: "AgentIQ is policy enforcement for agents: signed actions, attestable execution, and 100 plus controls that gate tools and data. At the center is a deny-by-default policy engine with 100 plus deployable policies and domain packs for finance, healthcare, enterprise IT, and privacy compliance. Decisions come back as allow/deny/monitor, with risk scoring and human-readable rationale. Inline defences run fast enough for the hot path at approximately 50ms."

Against each AI-powered attack type, AgentIQ applies a specific defence:

For jailbreaks: AgentIQ classifies each output for safety violations before it reaches the user or triggers agent actions. If the output contains content that violates the deployed policy, it is blocked with rationale before the user sees it.

For prompt injection: AgentIQ monitors chain-of-thought traces. If the reasoning chain shifts to justify actions outside the authorised scope, that chain security violation is flagged. This catches injections that succeed in redirecting the model's intent before the resulting action reaches the downstream system.

For tool abuse: AgentIQ gates every tool call against the policy engine before it executes. A policy that says a customer support agent may only call the refunds API is enforced at every tool invocation, even if the model's reasoning has been compromised by an injection.

For RAG poisoning: AgentIQ's chain security validation checks whether the model's reasoning is consistent with the authorised task. A poisoned document that redirects the reasoning will produce a chain security violation signal before the redirected action reaches downstream systems.

Section 12

Defence mapping

Each AI-powered attack category requires a different defensive response. The mapping below shows which Mirror Security product is the primary defence for each attack type and what it does specifically.

AI-scaled phishing (inbound)

Outbound: DiscoveR tests whether your AI application can be weaponised to generate phishing content for others. AgentIQ policies block phishing-pattern output generation. VectaX encrypted inference prevents attackers from learning what context your model was given to generate such content.

DiscoveRAgentIQVectaX

Jailbreak attacks

DiscoveR identifies which jailbreak techniques succeed against your deployment. AgentIQ's output classification catches jailbreak-driven policy violations at the output layer before they reach users. Running both ensures: DiscoveR finds the vulnerability, AgentIQ enforces against it during active attacks.

DiscoveRAgentIQ

Prompt injection (direct)

AgentIQ detects injection-driven scope violations in chain-of-thought traces and flags injection signals in each output. AgentID enforces that any out-of-scope action the injection attempts to trigger is blocked at the gateway before it reaches downstream systems. AgentIQ detects the attempt; AgentID stops the consequence.

AgentIQAgentID

RAG poisoning (indirect injection)

VectaX encrypted retrieval audit log records which documents were retrieved. AgentIQ chain security validation detects when retrieved content has redirected the reasoning chain. DiscoveR tests RAG poisoning resistance in your deployment. All three together: VectaX audits what was retrieved, AgentIQ detects the redirect, DiscoveR validates your overall RAG poisoning defence.

VectaXAgentIQDiscoveR

Model extraction and distillation

VectaX FHE stack injects training-hostile noise at the latent level making harvested outputs degrade any student model trained on them. E2 monitoring layer detects extraction campaigns through population-level query analysis. Together: monitoring catches the campaign, VectaX makes the harvest worthless regardless.

VectaX FHE stackE2 monitoring

Tool abuse by agents

AgentIQ gates all tool calls against the policy engine before execution. AgentID capability-scoped tokens bound what tools an agent can call. A tool that is not in the token scope cannot be called even if the agent's reasoning has been compromised. DiscoveR tests tool abuse scenarios in your specific deployment.

AgentIQAgentIDDiscoveR

Data exfiltration via AI queries

VectaX FHE-encrypted inference ensures that even if the model is compromised or jailbroken to output context window content, the inference infrastructure never had plaintext to expose. AgentIQ detects PII in outputs before they reach the user. AgentID audit log records every data access for forensics if exfiltration occurs.

VectaXAgentIQAgentID

Section 13

Track 3E complete

The five modules of Track 3E form a complete security operations posture for AI deployments. Each module addresses a distinct layer of the problem. Together they cover the full cycle from architecture through monitoring through incident response through compliance through the adversarial threat landscape.

E1

Zero Trust Architecture for AI

The inference gap, verifiable inference, agent identity, capability tokens, and the four AI trust planes. The security model that everything else builds on.

Architecture

E2

Security Monitoring and Anomaly Detection

Five monitoring layers, population-level distillation detection, privacy-preserving logging, MITRE ATLAS mapping, AgentIQ and DiscoveR as monitoring tools.

Monitoring

E3

AI Incident Response

Four incident type playbooks, forensics artifacts, containment with token revocation, the DiscoveR remediation cycle, communication template, and post-incident review.

Response

E4

Compliance in Practice

NIST AI RMF, ISO 42001, EU AI Act, GDPR Articles 22/25/35, and a master compliance map showing which Mirror product generates which evidence for which framework.

Compliance

E5

AI as an Attack Tool

How AI changed attack economics, six AI attack categories, DiscoveR's 60+ attack modes, AgentIQ runtime defence, and defence mapping across all attack types.

Threat landscape

🎉

Track 3E: Security Operations for AI · Complete

You have finished Track 3E

From zero trust architecture through adversarial AI. You now have a working understanding of how to secure AI deployments operationally: the architecture to build on, the monitoring to detect threats, the playbooks to respond, the compliance evidence to produce, and the adversarial threat landscape to defend against. Mirror Security's platform covers every layer.

Section 14

Frequently asked questions

How has AI changed the economics of cyberattacks?

AI shifted cyberattacks from skill-constrained to capital-constrained. Previously, sophisticated attacks required experienced human operators for each target. AI removes this bottleneck: LLMs can generate personalised spear-phishing from OSINT in seconds, generate polymorphic malware variants on demand, and scan codebases for vulnerability patterns automatically. The ceiling on attack scale was human attention. AI removes that ceiling. The volume of sophisticated-looking attacks has increased dramatically, and defences designed around the assumption that sophisticated attacks are rare are now calibrated for the wrong threat model.

What are the most dangerous AI-powered attack techniques today?

Five categories are most operationally significant now. AI-scaled spear phishing: personalised messages at scale defeating generic detection. AI malware generation: polymorphic variants defeating signature-based detection. Automated vulnerability research: AI scans code for CVE-class patterns, accelerating time to exploit. Real-time deepfake impersonation: voice cloning enables live phone call impersonation for CEO fraud and BEC. Automated jailbreak fuzzing: systematic prompt fuzzing finds bypass techniques faster than human red teams. All five are in active use by threat actors today.

What is RAG poisoning and how does it work?

RAG poisoning inserts documents into a retrieval database that contain hidden instructions alongside legitimate content. When a user query retrieves the poisoned document, the hidden instructions become part of the model's context window. The model may then follow those instructions, overriding its intended behaviour. The attack bypasses input-layer injection detection because the user's query is clean. Detection requires monitoring retrieved content and checking whether the model's chain-of-thought has been redirected. AgentIQ's chain security validation and the VectaX retrieval audit log together provide the detection and forensics capability for this attack.

How does DiscoveR test for AI-powered attack techniques?

DiscoveR runs structured adversarial campaigns against your actual deployed AI system using 60 plus attack modes and 2,500 plus prompts across 11 categories. The categories include jailbreaks, prompt injection, RAG poisoning, tool abuse, model extraction, membership inference, data exfiltration, bias and toxicity, hallucination under attack, policy evasion, and agent hijacking. DiscoveR fingerprints your system first, then selects strategies most likely to succeed based on your system type. A hierarchical judging system reduces false positives. Results show exactly what broke, how it broke, and what layer needs to change: policy, retrieval, tool permissions, or the model itself.

How does AgentIQ defend against AI-powered attacks at runtime?

AgentIQ enforces a deny-by-default policy engine with 100 plus deployable policies at approximately 50 milliseconds inline. For jailbreaks: classifies each output for safety violations before reaching users. For prompt injection: monitors chain-of-thought to detect when reasoning has been redirected away from the authorised task. For tool abuse: gates every tool call against the policy engine before execution. For RAG poisoning: chain security validation detects when retrieved content redirected the reasoning. Decisions are returned as allow, deny, or monitor with human-readable rationale and risk scores. Domain packs for finance, healthcare, enterprise IT, and privacy compliance ship ready to deploy.

AI as an Attack Tool

The economics shift

Six AI attack categories

AI-scaled phishing

AI malware generation

AI-assisted vulnerability research

Deepfakes and impersonation

Model extraction

RAG poisoning and indirect injection

Jailbreak automation

Run the same automated jailbreak campaigns against your system that attackers run

DiscoveR: test your exposure

AgentIQ: runtime defence

100 plus policies. Deny-by-default. 50ms inline enforcement.

Defence mapping

Track 3E complete

Frequently asked questions

Not trust. Not promises. Guarantees through math, attestation, and enforcement.