What is a trust boundary in an AI system and why does it matter for threat modeling?

A trust boundary in an AI system is a line between two zones where the level of trust changes. Data flowing across a trust boundary must be treated as potentially hostile. In a RAG system, trust boundaries include: between the user and the application (user input is untrusted), between the application and the retrieval index (documents in the index may be partially untrusted if external content is indexed), between the model and the tool layer (model output that drives tool calls must be validated before execution), and between the agent and external APIs it calls (all external API responses are untrusted). Threats arise at trust boundaries because that is where data moves between different trust zones. A threat model that does not correctly identify trust boundaries will miss the most important attack surfaces.

AI Threat Modeling: STRIDE for AI, MITRE ATLAS Applied, Practical Workshop | Track 1

Q: What is AI threat modeling and how does it differ from traditional threat modeling?

AI threat modeling is the process of systematically identifying what can go wrong in an AI system by mapping the system's components, data flows, and trust boundaries and then applying known attack patterns to find likely threats. Traditional threat modeling uses the same process but was designed for software systems with deterministic inputs and outputs. AI threat modeling adds three new elements. First, the model itself is an attack artifact: its weights can be poisoned, extracted, or backdoored. Second, natural language inputs cannot be sanitised the way code inputs can. Third, the training pipeline is a new attack surface with no equivalent in traditional software. The core STRIDE methodology adapts well, but each STRIDE category needs AI-specific threat examples to be useful.

Q: How do you apply STRIDE to an LLM application?

STRIDE stands for Spoofing, Tampering, Repudiation, Information Disclosure, Denial of Service, and Elevation of Privilege. Applied to LLMs: Spoofing covers prompt injection that causes the model to act as a different identity or role than intended. Tampering covers data poisoning of training data or the RAG retrieval index. Repudiation covers the lack of signed or verifiable audit trails for model actions, especially in agentic systems. Information Disclosure covers sensitive information leaking through model outputs from training data or context window. Denial of Service covers unbounded resource consumption through adversarial prompts or API flooding. Elevation of Privilege covers prompt injection that causes an agent to take actions beyond the user's authorised scope. Each of these maps to OWASP LLM risks and MITRE ATLAS tactics.

Section 01

Why AI threat modeling differs

Traditional threat modeling, whether you use STRIDE, PASTA, or LINDDUN, was designed for software systems with deterministic inputs and outputs. You map data flows, identify trust boundaries, enumerate threats at each boundary, and rank them. The methodology works. But AI systems have three characteristics that traditional threat modeling does not address well.

The model itself is an attack artifact. In traditional software, the code is the artifact and it can be patched when a vulnerability is found. In AI systems, the model weights are also an artifact. Weights can be poisoned during training, backdoored invisibly, or extracted through systematic API queries. There is no "patch" equivalent for a compromised model: you retrain, which costs time and money.

Natural language inputs cannot be fully sanitised. SQL injection is fixed by parameterising queries. The fix is complete and the threat class is closed. Prompt injection has no equivalent fix because the model must read natural language to function and cannot reliably separate instructions from data in that natural language. Every improvement is a partial control, not a closure.

The training pipeline is a new attack surface. Traditional software has a build pipeline. If an attacker can compromise the build, that is serious but well-understood. AI systems have a training pipeline where the data that flows through it shapes the model's behaviour forever. Poisoning training data changes the model's behaviour in ways that can be invisible at evaluation time and only trigger on specific inputs.

These three differences do not break the STRIDE methodology. They require adding AI-specific threat examples to each STRIDE category and adding new components and trust boundaries that traditional data flow diagrams do not include.

You already know the vocabulary. Modules 01 to 03 gave you the AI attack surface, the six attack categories, MITRE ATLAS, and OWASP Top 10 for LLMs. This module shows you how to organise all of that into a structured threat model for a specific system you are building or defending.

Section 02

The five-step process

AI threat modeling follows the same five steps as traditional threat modeling, but each step has AI-specific content. The process is designed to produce one output: a prioritised list of threats with associated controls, so you know what to build first.

1

Draw the system

Map every component that processes, stores, or transmits data. For AI systems this includes: the model inference endpoint, the vector database and embedding pipeline (if RAG), the prompt construction layer, the tool layer (every API the model can call), and the memory or context store. Do not omit components because they seem internal or safe.

Output: data flow diagram with all AI components

2

Identify trust boundaries

Draw a line wherever the trust level changes. User input is always untrusted. Retrieved documents from a RAG index are partially trusted if external content is indexed. Model output that drives tool calls must be treated as potentially hostile. External API responses are untrusted. Each trust boundary is a candidate attack surface.

Output: annotated diagram with trust boundaries marked

3

Apply STRIDE at each trust boundary

For each trust boundary, ask all six STRIDE questions with AI-specific examples. Most threats in AI systems appear at the user-to-model boundary (prompt injection), the retrieval-to-context boundary (RAG poisoning), and the model-to-tool boundary (tool abuse via injection).

Output: raw threat list per boundary

4

Map threats to MITRE ATLAS tactics

For each threat, identify the MITRE ATLAS tactic it maps to. This connects your threat list to the broader security community vocabulary, makes it easier to find relevant detection logic, and lets you identify which MITRE ATT&CK techniques in your existing SIEM might overlap.

Output: threat list with ATLAS tactic codes

5

Score and prioritise

Score each threat on likelihood and impact. Multiply to get a priority score. Likelihood factors: does your system have the exposed surface? Is this attack technique actively used in the wild? Impact factors: what is the worst case if this succeeds? Customer data? Regulatory breach? Revenue loss? Prioritise the top five threats for immediate control selection.

Output: prioritised threat register with control recommendations

Section 03

Step 1: draw the system

A data flow diagram for an AI system has more components than a traditional web application. The diagram below shows the components for a typical LLM application with RAG and tool access. Every component is a potential attack surface. Every arrow is a data flow that crosses a trust boundary or stays within one.

The components most commonly left off AI data flow diagrams are the prompt construction layer (where the system prompt and retrieved context are assembled before the model sees them) and the model-to-tool gateway (where model output is translated into actual API calls). Both are critical for threat modeling because both are points where an injection attack can pivot from causing the model to say wrong things to causing the model to do wrong things.

AI system data flow diagram: LLM with RAG and tool access

Untrusted zone

User browser / client

Natural language input. No validation possible at this layer. Everything here is untrusted by definition.

▼ Trust boundary: user to application ▼

Application zone

API gateway + auth

Authentication, rate limiting, input logging

Application zone

Prompt construction

System prompt + retrieved context + user query assembled here. Key injection surface.

Application zone

Vector DB retrieval

Embedding search. Retrieved docs may contain attacker-controlled content.

▼ Trust boundary: application to model ▼

Model zone (inference)

LLM inference endpoint

The model. Output is plausible text but not guaranteed safe. Output that drives tool calls must be validated before execution.

▼ Trust boundary: model output to tool execution ▼

Tool zone

Tool gateway (AgentIQ)

Validates tool calls against policy before execution. Deny-by-default.

External (untrusted)

CRM read API

Read-only customer data. External API responses are untrusted.

External (untrusted)

Refunds API

High-privilege write action. Requires explicit authorisation per call.

The training pipeline is not shown above because it operates before deployment. For a complete threat model, draw a second diagram for the training pipeline: where training data comes from, who can write to it, how the model is trained and evaluated, and how the trained weights are transferred to production. Data poisoning threats live in that diagram.

Section 04

Step 2: trust boundaries

A trust boundary is a line where data moves from one zone of trust to another. Threats arise at trust boundaries because that is where data passes between different owners, different trust levels, and different validation rules. If you miss a trust boundary, you miss the threats that cross it.

AI systems have trust boundaries that traditional web applications do not have. The four most important ones:

User to application. User input is always untrusted. This is the same as in traditional web applications. The difference is what "malicious input" looks like: in traditional apps, you scan for SQL fragments, HTML tags, and shell metacharacters. In AI apps, malicious input is natural language that redirects the model's behaviour. There is no equivalent of a regex that catches all prompt injection.

Retrieval to context window. Documents retrieved from a vector database and placed into the model's context are only as trusted as the source that added them to the index. If the index includes content from user uploads, web crawls, or third-party sources, that content is partially untrusted. An attacker who can add a document to the index can inject instructions into the model's context without the user sending anything malicious.

Model output to tool execution. The model's text output is plausible but not guaranteed to be safe to execute. When model output is parsed as a tool call and executed against real APIs, that execution crosses a critical trust boundary. The model's reasoning may have been compromised by an injection in the context. Validating tool calls against an explicit policy before execution is the primary defence here.

Agent to external APIs. All external API responses must be treated as untrusted. An API that an agent calls might be compromised or might return content that contains further injection instructions. An agent that processes API responses without validation is exposed to indirect injection from the API layer.

Section 05

Step 3: STRIDE for AI

STRIDE is a threat taxonomy, not a methodology. It gives you six categories to ask about for each data flow. Applied to AI systems, each category has specific AI threat examples that traditional STRIDE lists do not include.

S

Spoofing

Traditional: attacker claims to be a different user or system

In AI: prompt injection causes the model to act as a different persona, role, or instruction source than the developer intended. A jailbreak that makes the model "act as DAN" is spoofing the developer's identity.

OWASP: LLM01. ATLAS: AML.TA0004 Execution

T

Tampering

Traditional: attacker modifies data in transit or at rest

In AI: data poisoning of the training set or RAG retrieval index changes what the model believes to be true. A backdoored model weight is tampering with the model artifact itself.

OWASP: LLM04, LLM08. ATLAS: AML.TA0014 Model Poisoning

R

Repudiation

Traditional: attacker denies having performed an action

In AI: agents that take actions without signed audit trails cannot prove who authorised each action. Agentic systems need cryptographic attestation of tool calls to prevent repudiation of high-value actions.

OWASP: LLM06. ATLAS: AML.TA0004 Execution

I

Information Disclosure

Traditional: attacker reads data they should not see

In AI: training data memorised by the model leaks through outputs. System prompt contents are extracted through creative prompting. Context window contents including other users' data are exposed through injection.

OWASP: LLM02, LLM07. ATLAS: AML.TA0010 Exfiltration

D

Denial of Service

Traditional: attacker makes a service unavailable

In AI: adversarially crafted prompts consume maximum tokens, driving up compute cost and degrading availability. "Denial of wallet" attacks target the API billing. Model extraction at scale is a DoS on the model provider's revenue.

OWASP: LLM10. ATLAS: AML.TA0011 Impact

E

Elevation of Privilege

Traditional: attacker gains access to higher-privilege functions

In AI: prompt injection redirects an agent to call APIs outside the user's authorised scope. An injected instruction that makes the agent call the refunds API without the user requesting a refund is privilege escalation through the model layer.

OWASP: LLM06. ATLAS: AML.TA0013 Model Evasion

Section 06

Step 4: MITRE ATLAS mapping

After generating your raw threat list from STRIDE, map each threat to a MITRE ATLAS tactic. This does two things. It tells you whether the threat is part of a documented adversary pattern with known detection logic. And it gives you a common vocabulary to use with your security operations team, so your threat model connects directly to what they monitor in the SIEM.

The eight ATLAS tactics most commonly relevant to deployed LLM applications are listed below with the system component they most often target.

AML.TA0002

Initial Access

Gaining the first foothold. For LLM applications, this is often legitimate API access used for extraction campaigns or fraudulent account creation for distillation attacks.

API gateway

AML.TA0003

ML Model Access

Gaining the ability to interact with the model directly. Covers public API access, access to embedding endpoints, and access to the inference infrastructure.

LLM inference endpoint

AML.TA0004

Execution

Running malicious payloads. For LLMs, this is crafted prompts that execute against the model. For agents, it includes triggering unintended tool calls through injected instructions.

Prompt construction, tool gateway

AML.TA0008

Collection

Gathering target data. For LLM applications, this is systematic extraction of training data through output queries or collection of (prompt, response) pairs for distillation.

LLM inference endpoint

AML.TA0010

Exfiltration

Getting collected data out. For LLMs, the inference API itself is the exfiltration channel: training data and context window contents leave through model outputs.

API gateway, model output

AML.TA0012

ML Supply Chain Compromise

Attacking third-party components. Covers compromised pre-trained weights from model registries, poisoned fine-tuning datasets, and malicious plugins or tool integrations.

Model weights, RAG index, tool integrations

AML.TA0013

Model Evasion

Crafting inputs that cause unexpected outputs. Covers jailbreaks for LLMs, indirect injection through retrieved content, and prompts that bypass safety classifiers.

Prompt construction, vector DB retrieval

AML.TA0014

Model Poisoning

Modifying the model or training data to change behaviour. Covers both training data poisoning before training and direct manipulation of the RAG index after deployment.

Training pipeline, vector DB

Section 07

Worked example: LLM with RAG and tools

The system: a customer service chatbot for an e-commerce company. It has access to a knowledge base via RAG (product documentation, return policies). It can call two tools: a CRM read API to look up the customer's order history, and a refunds API to initiate refunds up to a configured limit. Users authenticate with a session token but the model itself does not have per-user authorisation logic.

This is a real deployment pattern. Many production LLM applications have similar components. The threat model below applies steps 1 to 4 to this specific system before producing the threat register in the next section.

Trust boundary analysis for this system:

The user-to-application boundary is the main injection surface. Malicious user input can attempt direct prompt injection. The retrieval-to-context boundary is a RAG poisoning surface: if the knowledge base is ever updated with user-submitted content (such as product reviews or support tickets), those are untrusted documents in the index. The model-to-tool boundary is the highest-risk boundary: the refunds API can cause real financial harm if an injection convinces the model to call it without a legitimate user request. The CRM API is a data exfiltration surface: an injection could cause the model to return other customers' order data.

System components relevant to threat modeling: user session, API gateway with authentication, prompt construction layer assembling system prompt plus retrieved docs plus user query, vector database containing product and policy documentation, LLM inference, tool gateway (the AgentIQ layer), CRM read API, refunds API.

Section 08

Step 5: threat register and scoring

The threat register is the output of the threat modeling process. It takes every threat identified in steps 2 to 4 and scores each one on likelihood and impact. Multiply the two scores to get a priority number. Address the highest-priority threats first. The register below shows the threats identified for the customer service LLM worked example.

Scoring is 1 to 5 for both dimensions. Likelihood: 1 = requires advanced attacker, 5 = any user can trigger with minimal effort. Impact: 1 = inconvenience, 5 = financial harm, regulatory breach, or reputational damage.

STRIDE / OWASP	Threat	Score L x I	Priority
E / LLM01, LLM06	Prompt injection triggers refunds API User crafts a message that convinces the model to call the refunds API without a legitimate refund request. Financial harm. Maps to ATLAS AML.TA0004 Execution.	25 5 x 5	Critical
T / LLM08	RAG index poisoning redirects model behaviour Attacker inserts a malicious document into the knowledge base. Retrieved during a legitimate query, it injects instructions into the model's context. Maps to ATLAS AML.TA0014 Model Poisoning.	20 4 x 5	Critical
I / LLM02	CRM data exfiltration via injection An injection causes the model to return another customer's order data. CRM read access is legitimate but the injection redirects it to the wrong scope. Maps to ATLAS AML.TA0010 Exfiltration.	20 5 x 4	Critical
I / LLM07	System prompt extraction User extracts the system prompt contents through creative prompting. Reveals guardrail logic and operational configuration. Maps to ATLAS AML.TA0010 Exfiltration.	15 5 x 3	High
I / LLM02	Training data leakage through outputs Model reproduces PII or confidential data from training set through targeted queries. More likely if the model was fine-tuned on internal customer data. Maps to ATLAS AML.TA0010.	12 3 x 4	High
I / LLM08	Embedding inversion from vector DB Attacker with direct vector DB access reconstructs document contents from embeddings. Requires infrastructure access, so lower likelihood. Maps to ATLAS AML.TA0010.	8 2 x 4	Medium

Section 09

From threat model to controls

The threat register tells you what to fix and in what order. The top three threats in the worked example are all scored Critical. The controls for each:

Prompt injection triggers refunds API (score 25). The primary control is enforcing deny-by-default tool access at the tool gateway. The refunds API should only be callable when the authenticated user has initiated a refund request through an explicit UI action, not through natural language alone. AgentIQ's policy engine can enforce this: a policy that requires a verified user intent signal before the refunds tool call is permitted.

RAG index poisoning (score 20). Two controls. First, restrict who can add content to the knowledge base index. If user-submitted content is never in the index, the poisoning surface does not exist. Second, validate retrieved documents before adding them to the context by running them through a content classification step. DiscoveR's RAG poisoning category tests whether your current index is vulnerable.

CRM data exfiltration (score 20). AgentIQ's output classification checks whether model outputs contain PII patterns from the CRM before the response reaches the user. AgentID's capability-scoped tokens ensure the CRM read tool is bound to the authenticated user's scope, so the model cannot request a different customer's data regardless of what the injection says.

The pattern is consistent: for each high-priority threat, the control is either a policy at the tool gateway (AgentIQ), a restriction on what data enters the context (data governance + RAG hygiene), or an output check that catches leakage before it leaves the system (AgentIQ output classification).

Section 10

DiscoveR validates the model

A threat model is a hypothesis. It says: we believe these threats are likely and these controls will mitigate them. DiscoveR is how you test whether the hypothesis is correct.

After completing your threat model, the prioritised threat register becomes the input to a DiscoveR scan. The top threats in the register map directly to DiscoveR attack categories: prompt injection maps to the injection and jailbreak categories, RAG poisoning maps to the RAG poisoning category, tool abuse maps to the tool abuse category, CRM exfiltration maps to the data exfiltration category.

The DiscoveR scan result tells you three things. First, which theoretical threats from the model are actually exploitable in your current deployment. Second, whether the controls you implemented since the last scan have reduced your exposure. Third, whether there are exploitable threats your threat model did not identify, which tells you where your model needs updating.

This is the feedback loop: threat model identifies threats, controls are implemented, DiscoveR validates whether the controls work, findings update the threat model. Run this cycle before each major deployment, after significant changes to the system, and on a regular schedule (quarterly is the minimum most teams should aim for in production).

The threat model and the DiscoveR scan are complementary, not redundant. The threat model covers the full attack surface including the training pipeline, supply chain risks, and architectural risks that a runtime test cannot reach. DiscoveR covers the exploitability of the deployed system that the threat model can only theorise about. You need both.

Section 11

What to study next

Module 05 closes Track 1 with AI governance and compliance. It maps NIST AI RMF, ISO 42001, and the EU AI Act to practical obligations and explains how a threat model like the one in this module produces evidence those frameworks require. After Track 1, choose the path that matches what you are building.

Section 12

Frequently asked questions

What is AI threat modeling and how does it differ from traditional threat modeling?

AI threat modeling uses the same five-step process as traditional threat modeling: draw the system, identify trust boundaries, enumerate threats, score them, select controls. The difference is that AI systems have three attack surfaces traditional software does not. The model weights are an attack artifact that can be poisoned or extracted. Natural language inputs cannot be sanitised the way code inputs can. The training pipeline is a new attack surface where poisoning changes the model's behaviour permanently. These differences require adding AI-specific threat examples to each STRIDE category and including additional components like the vector database, prompt construction layer, and tool gateway in the data flow diagram.

How do you apply STRIDE to an LLM application?

Apply STRIDE at each trust boundary in your AI system data flow diagram. Spoofing: prompt injection that causes the model to act as a different identity. Tampering: data poisoning of training data or the RAG index. Repudiation: agents taking actions without signed audit trails. Information Disclosure: training data leakage through model outputs, system prompt extraction, context window exposure. Denial of Service: adversarial prompts consuming maximum tokens, denial-of-wallet attacks. Elevation of Privilege: prompt injection that causes an agent to call APIs outside the user's authorised scope. Each maps to OWASP LLM risks and MITRE ATLAS tactics, which are listed in Section 05 of this module.

What is a trust boundary in an AI system?

A trust boundary in an AI system is a line where data moves from one zone of trust to another and must be treated as potentially hostile. The four most important trust boundaries in LLM applications are: user to application (user input is always untrusted), retrieval index to context window (retrieved documents may contain attacker-controlled content if external content is indexed), model output to tool execution (model output driving tool calls must be validated before execution), and agent to external APIs (all external API responses are untrusted). Threats arise at trust boundaries. A threat model that incorrectly identifies trust boundaries will miss the most important attack surfaces.

How does DiscoveR validate an AI threat model?

A threat model is a hypothesis about what will break. DiscoveR tests whether that hypothesis is correct by running structured adversarial probes across the same categories identified in the threat register: prompt injection, RAG poisoning, tool abuse, data exfiltration, jailbreaks. The result tells you which theoretical threats are actually exploitable in your current deployment, whether controls implemented since the last scan have reduced exposure, and whether there are exploitable threats the threat model did not identify. The feedback loop is: threat model identifies threats, controls are implemented, DiscoveR validates whether controls work, findings update the threat model.

AI Threat Modeling

Why AI threat modeling differs

The five-step process

Step 1: draw the system

Step 2: trust boundaries

Step 3: STRIDE for AI

Step 4: MITRE ATLAS mapping

Worked example: LLM with RAG and tools

Run this exact threat model against your deployment

Step 5: threat register and scoring

From threat model to controls

DiscoveR validates the model

Turn your threat model into AgentIQ policies

What to study next

Frequently asked questions

Your threat model's top priorities drive AgentIQ policy selection.