Module 04: AI Threat Modeling - STRIDE for AI, MITRE ATLAS Applied, Practical WorkshopTrack 1 Module 04. AI threat modeling applies STRIDE and MITRE ATLAS to AI systems to identify what can go wrong before attackers find it. Why AI threat modeling differs from traditional: model is an attack artifact (weights can be poisoned or extracted), natural language inputs cannot be fully sanitised, training pipeline is a new attack surface with no traditional equivalent. Five steps: draw the system with correct components and data flows, identify trust boundaries (user input untrusted, retrieved documents partially untrusted, tool call outputs untrusted, model output driving agent actions must be validated before execution), apply STRIDE with AI-specific threats, map MITRE ATLAS tactics to components, score and prioritise threats by likelihood times impact. STRIDE for AI: Spoofing covers prompt injection causing model to act as different identity. Tampering covers data poisoning of training data or RAG index. Repudiation covers lack of signed audit trails for agent actions. Information Disclosure covers sensitive data leaking through model outputs from training data or context. Denial of Service covers unbounded resource consumption through adversarial prompts. Elevation of Privilege covers prompt injection causing agent to take actions beyond authorised scope. Trust boundaries in AI systems: between user and application, between application and retrieval index, between model and tool layer, between agent and external APIs. Worked example: customer service LLM with RAG and tool access. Components: user browser, application API, prompt construction, vector database retrieval, LLM inference, tool layer (CRM read, refunds API). Trust boundaries: user to API is untrusted, indexed documents are partially trusted, model output to tool call must be validated, refunds API access is high privilege. Threat register: prompt injection causing scope violation (critical), RAG document poisoning redirecting model (critical), refunds API abuse through injection (critical), system prompt extraction (high), PII leakage from training data (high), embedding inversion from vector database (medium). Likelihood times impact scoring. MITRE ATLAS tactic mapping: AML.TA0002 Initial Access via API access, AML.TA0003 ML Model Access via inference endpoint, AML.TA0004 Execution via crafted prompt, AML.TA0010 Exfiltration via inference API, AML.TA0013 Model Evasion via jailbreak. DiscoveR validates threat model by running structured adversarial probes across same categories identified. AgentIQ policy selection driven by highest-priority threats from the model: deny-by-default tool access for refunds, chain-of-thought monitoring for injection detection.PT42MBeginnertrueen2026-04-08Mirror Academy
Module 04 of 5 · Track 1: AI Security Fundamentals
Find what breaks before attackers do. On paper first.
AI Threat Modeling
Threat modeling is how you work out what an attacker would target before you deploy. This module teaches you to apply STRIDE to LLM and agent systems, use MITRE ATLAS to structure your adversary tactics, and walk through a full worked example from data flow diagram to a prioritised threat register. The output tells you which controls to build first.
Traditional threat modeling, whether you use STRIDE, PASTA, or LINDDUN, was designed for software systems with deterministic inputs and outputs. You map data flows, identify trust boundaries, enumerate threats at each boundary, and rank them. The methodology works. But AI systems have three characteristics that traditional threat modeling does not address well.
The model itself is an attack artifact. In traditional software, the code is the artifact and it can be patched when a vulnerability is found. In AI systems, the model weights are also an artifact. Weights can be poisoned during training, backdoored invisibly, or extracted through systematic API queries. There is no "patch" equivalent for a compromised model: you retrain, which costs time and money.
Natural language inputs cannot be fully sanitised. SQL injection is fixed by parameterising queries. The fix is complete and the threat class is closed. Prompt injection has no equivalent fix because the model must read natural language to function and cannot reliably separate instructions from data in that natural language. Every improvement is a partial control, not a closure.
The training pipeline is a new attack surface. Traditional software has a build pipeline. If an attacker can compromise the build, that is serious but well-understood. AI systems have a training pipeline where the data that flows through it shapes the model's behaviour forever. Poisoning training data changes the model's behaviour in ways that can be invisible at evaluation time and only trigger on specific inputs.
These three differences do not break the STRIDE methodology. They require adding AI-specific threat examples to each STRIDE category and adding new components and trust boundaries that traditional data flow diagrams do not include.
You already know the vocabulary. Modules 01 to 03 gave you the AI attack surface, the six attack categories, MITRE ATLAS, and OWASP Top 10 for LLMs. This module shows you how to organise all of that into a structured threat model for a specific system you are building or defending.
Section 02
The five-step process
AI threat modeling follows the same five steps as traditional threat modeling, but each step has AI-specific content. The process is designed to produce one output: a prioritised list of threats with associated controls, so you know what to build first.
1
Draw the system
Map every component that processes, stores, or transmits data. For AI systems this includes: the model inference endpoint, the vector database and embedding pipeline (if RAG), the prompt construction layer, the tool layer (every API the model can call), and the memory or context store. Do not omit components because they seem internal or safe.
Output: data flow diagram with all AI components
2
Identify trust boundaries
Draw a line wherever the trust level changes. User input is always untrusted. Retrieved documents from a RAG index are partially trusted if external content is indexed. Model output that drives tool calls must be treated as potentially hostile. External API responses are untrusted. Each trust boundary is a candidate attack surface.
Output: annotated diagram with trust boundaries marked
3
Apply STRIDE at each trust boundary
For each trust boundary, ask all six STRIDE questions with AI-specific examples. Most threats in AI systems appear at the user-to-model boundary (prompt injection), the retrieval-to-context boundary (RAG poisoning), and the model-to-tool boundary (tool abuse via injection).
Output: raw threat list per boundary
4
Map threats to MITRE ATLAS tactics
For each threat, identify the MITRE ATLAS tactic it maps to. This connects your threat list to the broader security community vocabulary, makes it easier to find relevant detection logic, and lets you identify which MITRE ATT&CK techniques in your existing SIEM might overlap.
Output: threat list with ATLAS tactic codes
5
Score and prioritise
Score each threat on likelihood and impact. Multiply to get a priority score. Likelihood factors: does your system have the exposed surface? Is this attack technique actively used in the wild? Impact factors: what is the worst case if this succeeds? Customer data? Regulatory breach? Revenue loss? Prioritise the top five threats for immediate control selection.
Output: prioritised threat register with control recommendations
Section 03
Step 1: draw the system
A data flow diagram for an AI system has more components than a traditional web application. The diagram below shows the components for a typical LLM application with RAG and tool access. Every component is a potential attack surface. Every arrow is a data flow that crosses a trust boundary or stays within one.
The components most commonly left off AI data flow diagrams are the prompt construction layer (where the system prompt and retrieved context are assembled before the model sees them) and the model-to-tool gateway (where model output is translated into actual API calls). Both are critical for threat modeling because both are points where an injection attack can pivot from causing the model to say wrong things to causing the model to do wrong things.
AI system data flow diagram: LLM with RAG and tool access
Untrusted zone
User browser / client
Natural language input. No validation possible at this layer. Everything here is untrusted by definition.
▼ Trust boundary: user to application ▼
Application zone
API gateway + auth
Authentication, rate limiting, input logging
Application zone
Prompt construction
System prompt + retrieved context + user query assembled here. Key injection surface.
Application zone
Vector DB retrieval
Embedding search. Retrieved docs may contain attacker-controlled content.
▼ Trust boundary: application to model ▼
Model zone (inference)
LLM inference endpoint
The model. Output is plausible text but not guaranteed safe. Output that drives tool calls must be validated before execution.
▼ Trust boundary: model output to tool execution ▼
Tool zone
Tool gateway (AgentIQ)
Validates tool calls against policy before execution. Deny-by-default.
External (untrusted)
CRM read API
Read-only customer data. External API responses are untrusted.
External (untrusted)
Refunds API
High-privilege write action. Requires explicit authorisation per call.
The training pipeline is not shown above because it operates before deployment. For a complete threat model, draw a second diagram for the training pipeline: where training data comes from, who can write to it, how the model is trained and evaluated, and how the trained weights are transferred to production. Data poisoning threats live in that diagram.
Section 04
Step 2: trust boundaries
A trust boundary is a line where data moves from one zone of trust to another. Threats arise at trust boundaries because that is where data passes between different owners, different trust levels, and different validation rules. If you miss a trust boundary, you miss the threats that cross it.
AI systems have trust boundaries that traditional web applications do not have. The four most important ones:
User to application. User input is always untrusted. This is the same as in traditional web applications. The difference is what "malicious input" looks like: in traditional apps, you scan for SQL fragments, HTML tags, and shell metacharacters. In AI apps, malicious input is natural language that redirects the model's behaviour. There is no equivalent of a regex that catches all prompt injection.
Retrieval to context window. Documents retrieved from a vector database and placed into the model's context are only as trusted as the source that added them to the index. If the index includes content from user uploads, web crawls, or third-party sources, that content is partially untrusted. An attacker who can add a document to the index can inject instructions into the model's context without the user sending anything malicious.
Model output to tool execution. The model's text output is plausible but not guaranteed to be safe to execute. When model output is parsed as a tool call and executed against real APIs, that execution crosses a critical trust boundary. The model's reasoning may have been compromised by an injection in the context. Validating tool calls against an explicit policy before execution is the primary defence here.
Agent to external APIs. All external API responses must be treated as untrusted. An API that an agent calls might be compromised or might return content that contains further injection instructions. An agent that processes API responses without validation is exposed to indirect injection from the API layer.
Section 05
Step 3: STRIDE for AI
STRIDE is a threat taxonomy, not a methodology. It gives you six categories to ask about for each data flow. Applied to AI systems, each category has specific AI threat examples that traditional STRIDE lists do not include.
S
Spoofing
Traditional: attacker claims to be a different user or system
In AI: prompt injection causes the model to act as a different persona, role, or instruction source than the developer intended. A jailbreak that makes the model "act as DAN" is spoofing the developer's identity.
OWASP: LLM01. ATLAS: AML.TA0004 Execution
T
Tampering
Traditional: attacker modifies data in transit or at rest
In AI: data poisoning of the training set or RAG retrieval index changes what the model believes to be true. A backdoored model weight is tampering with the model artifact itself.
OWASP: LLM04, LLM08. ATLAS: AML.TA0014 Model Poisoning
R
Repudiation
Traditional: attacker denies having performed an action
In AI: agents that take actions without signed audit trails cannot prove who authorised each action. Agentic systems need cryptographic attestation of tool calls to prevent repudiation of high-value actions.
OWASP: LLM06. ATLAS: AML.TA0004 Execution
I
Information Disclosure
Traditional: attacker reads data they should not see
In AI: training data memorised by the model leaks through outputs. System prompt contents are extracted through creative prompting. Context window contents including other users' data are exposed through injection.
In AI: adversarially crafted prompts consume maximum tokens, driving up compute cost and degrading availability. "Denial of wallet" attacks target the API billing. Model extraction at scale is a DoS on the model provider's revenue.
OWASP: LLM10. ATLAS: AML.TA0011 Impact
E
Elevation of Privilege
Traditional: attacker gains access to higher-privilege functions
In AI: prompt injection redirects an agent to call APIs outside the user's authorised scope. An injected instruction that makes the agent call the refunds API without the user requesting a refund is privilege escalation through the model layer.
OWASP: LLM06. ATLAS: AML.TA0013 Model Evasion
Section 06
Step 4: MITRE ATLAS mapping
After generating your raw threat list from STRIDE, map each threat to a MITRE ATLAS tactic. This does two things. It tells you whether the threat is part of a documented adversary pattern with known detection logic. And it gives you a common vocabulary to use with your security operations team, so your threat model connects directly to what they monitor in the SIEM.
The eight ATLAS tactics most commonly relevant to deployed LLM applications are listed below with the system component they most often target.
AML.TA0002
Initial Access
Gaining the first foothold. For LLM applications, this is often legitimate API access used for extraction campaigns or fraudulent account creation for distillation attacks.
API gateway
AML.TA0003
ML Model Access
Gaining the ability to interact with the model directly. Covers public API access, access to embedding endpoints, and access to the inference infrastructure.
LLM inference endpoint
AML.TA0004
Execution
Running malicious payloads. For LLMs, this is crafted prompts that execute against the model. For agents, it includes triggering unintended tool calls through injected instructions.
Prompt construction, tool gateway
AML.TA0008
Collection
Gathering target data. For LLM applications, this is systematic extraction of training data through output queries or collection of (prompt, response) pairs for distillation.
LLM inference endpoint
AML.TA0010
Exfiltration
Getting collected data out. For LLMs, the inference API itself is the exfiltration channel: training data and context window contents leave through model outputs.
API gateway, model output
AML.TA0012
ML Supply Chain Compromise
Attacking third-party components. Covers compromised pre-trained weights from model registries, poisoned fine-tuning datasets, and malicious plugins or tool integrations.
Model weights, RAG index, tool integrations
AML.TA0013
Model Evasion
Crafting inputs that cause unexpected outputs. Covers jailbreaks for LLMs, indirect injection through retrieved content, and prompts that bypass safety classifiers.
Prompt construction, vector DB retrieval
AML.TA0014
Model Poisoning
Modifying the model or training data to change behaviour. Covers both training data poisoning before training and direct manipulation of the RAG index after deployment.
Training pipeline, vector DB
Section 07
Worked example: LLM with RAG and tools
The system: a customer service chatbot for an e-commerce company. It has access to a knowledge base via RAG (product documentation, return policies). It can call two tools: a CRM read API to look up the customer's order history, and a refunds API to initiate refunds up to a configured limit. Users authenticate with a session token but the model itself does not have per-user authorisation logic.
This is a real deployment pattern. Many production LLM applications have similar components. The threat model below applies steps 1 to 4 to this specific system before producing the threat register in the next section.
Trust boundary analysis for this system:
The user-to-application boundary is the main injection surface. Malicious user input can attempt direct prompt injection. The retrieval-to-context boundary is a RAG poisoning surface: if the knowledge base is ever updated with user-submitted content (such as product reviews or support tickets), those are untrusted documents in the index. The model-to-tool boundary is the highest-risk boundary: the refunds API can cause real financial harm if an injection convinces the model to call it without a legitimate user request. The CRM API is a data exfiltration surface: an injection could cause the model to return other customers' order data.
System components relevant to threat modeling: user session, API gateway with authentication, prompt construction layer assembling system prompt plus retrieved docs plus user query, vector database containing product and policy documentation, LLM inference, tool gateway (the AgentIQ layer), CRM read API, refunds API.
Mirror Security · DiscoveR
Run this exact threat model against your deployment
DiscoveR tests your actual deployed system with the same threat categories identified in this worked example: prompt injection, RAG poisoning, tool abuse, data exfiltration, and system prompt extraction. It fingerprints your system first, then selects the strategies most likely to succeed based on what it finds. Results show which theoretical threats are actually exploitable.
The threat register is the output of the threat modeling process. It takes every threat identified in steps 2 to 4 and scores each one on likelihood and impact. Multiply the two scores to get a priority number. Address the highest-priority threats first. The register below shows the threats identified for the customer service LLM worked example.
Scoring is 1 to 5 for both dimensions. Likelihood: 1 = requires advanced attacker, 5 = any user can trigger with minimal effort. Impact: 1 = inconvenience, 5 = financial harm, regulatory breach, or reputational damage.
STRIDE / OWASP
Threat
Score L x I
Priority
E / LLM01, LLM06
Prompt injection triggers refunds API
User crafts a message that convinces the model to call the refunds API without a legitimate refund request. Financial harm. Maps to ATLAS AML.TA0004 Execution.
25
5 x 5
Critical
T / LLM08
RAG index poisoning redirects model behaviour
Attacker inserts a malicious document into the knowledge base. Retrieved during a legitimate query, it injects instructions into the model's context. Maps to ATLAS AML.TA0014 Model Poisoning.
20
4 x 5
Critical
I / LLM02
CRM data exfiltration via injection
An injection causes the model to return another customer's order data. CRM read access is legitimate but the injection redirects it to the wrong scope. Maps to ATLAS AML.TA0010 Exfiltration.
20
5 x 4
Critical
I / LLM07
System prompt extraction
User extracts the system prompt contents through creative prompting. Reveals guardrail logic and operational configuration. Maps to ATLAS AML.TA0010 Exfiltration.
15
5 x 3
High
I / LLM02
Training data leakage through outputs
Model reproduces PII or confidential data from training set through targeted queries. More likely if the model was fine-tuned on internal customer data. Maps to ATLAS AML.TA0010.
12
3 x 4
High
I / LLM08
Embedding inversion from vector DB
Attacker with direct vector DB access reconstructs document contents from embeddings. Requires infrastructure access, so lower likelihood. Maps to ATLAS AML.TA0010.
8
2 x 4
Medium
Section 09
From threat model to controls
The threat register tells you what to fix and in what order. The top three threats in the worked example are all scored Critical. The controls for each:
Prompt injection triggers refunds API (score 25). The primary control is enforcing deny-by-default tool access at the tool gateway. The refunds API should only be callable when the authenticated user has initiated a refund request through an explicit UI action, not through natural language alone. AgentIQ's policy engine can enforce this: a policy that requires a verified user intent signal before the refunds tool call is permitted.
RAG index poisoning (score 20). Two controls. First, restrict who can add content to the knowledge base index. If user-submitted content is never in the index, the poisoning surface does not exist. Second, validate retrieved documents before adding them to the context by running them through a content classification step. DiscoveR's RAG poisoning category tests whether your current index is vulnerable.
CRM data exfiltration (score 20). AgentIQ's output classification checks whether model outputs contain PII patterns from the CRM before the response reaches the user. AgentID's capability-scoped tokens ensure the CRM read tool is bound to the authenticated user's scope, so the model cannot request a different customer's data regardless of what the injection says.
The pattern is consistent: for each high-priority threat, the control is either a policy at the tool gateway (AgentIQ), a restriction on what data enters the context (data governance + RAG hygiene), or an output check that catches leakage before it leaves the system (AgentIQ output classification).
Section 10
DiscoveR validates the model
A threat model is a hypothesis. It says: we believe these threats are likely and these controls will mitigate them. DiscoveR is how you test whether the hypothesis is correct.
After completing your threat model, the prioritised threat register becomes the input to a DiscoveR scan. The top threats in the register map directly to DiscoveR attack categories: prompt injection maps to the injection and jailbreak categories, RAG poisoning maps to the RAG poisoning category, tool abuse maps to the tool abuse category, CRM exfiltration maps to the data exfiltration category.
The DiscoveR scan result tells you three things. First, which theoretical threats from the model are actually exploitable in your current deployment. Second, whether the controls you implemented since the last scan have reduced your exposure. Third, whether there are exploitable threats your threat model did not identify, which tells you where your model needs updating.
This is the feedback loop: threat model identifies threats, controls are implemented, DiscoveR validates whether the controls work, findings update the threat model. Run this cycle before each major deployment, after significant changes to the system, and on a regular schedule (quarterly is the minimum most teams should aim for in production).
The threat model and the DiscoveR scan are complementary, not redundant. The threat model covers the full attack surface including the training pipeline, supply chain risks, and architectural risks that a runtime test cannot reach. DiscoveR covers the exploitability of the deployed system that the threat model can only theorise about. You need both.
Mirror Security · AgentIQ
Turn your threat model into AgentIQ policies
The highest-priority threats from your threat register map directly to AgentIQ policy types. Tool abuse threats become deny-by-default tool policies. Injection threats become chain-of-thought monitoring policies. Data exfiltration threats become output classification policies. The threat model tells you which policies to deploy. AgentIQ enforces them at 50ms.
Module 05 closes Track 1 with AI governance and compliance. It maps NIST AI RMF, ISO 42001, and the EU AI Act to practical obligations and explains how a threat model like the one in this module produces evidence those frameworks require. After Track 1, choose the path that matches what you are building.
What is AI threat modeling and how does it differ from traditional threat modeling?
AI threat modeling uses the same five-step process as traditional threat modeling: draw the system, identify trust boundaries, enumerate threats, score them, select controls. The difference is that AI systems have three attack surfaces traditional software does not. The model weights are an attack artifact that can be poisoned or extracted. Natural language inputs cannot be sanitised the way code inputs can. The training pipeline is a new attack surface where poisoning changes the model's behaviour permanently. These differences require adding AI-specific threat examples to each STRIDE category and including additional components like the vector database, prompt construction layer, and tool gateway in the data flow diagram.
How do you apply STRIDE to an LLM application?
Apply STRIDE at each trust boundary in your AI system data flow diagram. Spoofing: prompt injection that causes the model to act as a different identity. Tampering: data poisoning of training data or the RAG index. Repudiation: agents taking actions without signed audit trails. Information Disclosure: training data leakage through model outputs, system prompt extraction, context window exposure. Denial of Service: adversarial prompts consuming maximum tokens, denial-of-wallet attacks. Elevation of Privilege: prompt injection that causes an agent to call APIs outside the user's authorised scope. Each maps to OWASP LLM risks and MITRE ATLAS tactics, which are listed in Section 05 of this module.
What is a trust boundary in an AI system?
A trust boundary in an AI system is a line where data moves from one zone of trust to another and must be treated as potentially hostile. The four most important trust boundaries in LLM applications are: user to application (user input is always untrusted), retrieval index to context window (retrieved documents may contain attacker-controlled content if external content is indexed), model output to tool execution (model output driving tool calls must be validated before execution), and agent to external APIs (all external API responses are untrusted). Threats arise at trust boundaries. A threat model that incorrectly identifies trust boundaries will miss the most important attack surfaces.
How does DiscoveR validate an AI threat model?
A threat model is a hypothesis about what will break. DiscoveR tests whether that hypothesis is correct by running structured adversarial probes across the same categories identified in the threat register: prompt injection, RAG poisoning, tool abuse, data exfiltration, jailbreaks. The result tells you which theoretical threats are actually exploitable in your current deployment, whether controls implemented since the last scan have reduced exposure, and whether there are exploitable threats the threat model did not identify. The feedback loop is: threat model identifies threats, controls are implemented, DiscoveR validates whether controls work, findings update the threat model.
Mirror Security · Threat model to controls
Your threat model's top priorities drive AgentIQ policy selection.
Tool abuse threats become deny-by-default policies. Injection threats become chain-of-thought monitoring. DiscoveR validates whether the controls you deployed actually reduce your exposure. One platform, closed loop from threat model to verified defence.