What is prompt injection?

Prompt injection is an attack where hidden or malicious instructions embedded in content override an AI model's intended behavior. It ranks #1 in the OWASP Top 10 for LLM Applications (LLM01) and was found in over 73% of production AI deployments tested in 2025. There are two types: direct injection (from the user) and indirect injection (hidden in documents, web pages, or data retrieved by the AI). Defense requires input validation, least-privilege permissions, instruction hierarchy separation, and continuous red teaming. Mirror Academy's AI Agent Security and GenAI Vulnerability Management paths cover prompt injection defense in depth.

What is hallucination in AI?

Hallucination in AI is when a model generates plausible-sounding but factually incorrect or entirely fabricated content. The model has no internal mechanism to verify what it knows versus what it invents. It is a major reliability concern in production AI systems, particularly in legal, medical, and research contexts.

What is a large language model (LLM)?

A large language model (LLM) is a neural network trained on massive amounts of text data to understand and generate human language. Examples include GPT-4, Claude, Gemini, and Llama. LLMs are the core engine behind most AI applications and the primary focus of the OWASP Top 10 for LLM Applications.

What is fine-tuning in AI?

Fine-tuning is adapting a pre-trained foundation model to a specific task using additional training on a smaller, targeted dataset. It is faster and cheaper than training from scratch. Fine-tuning on untrusted data introduces data poisoning risk (OWASP LLM04).

What is data poisoning?

Data poisoning is the manipulation of training data, fine-tuning data, or RAG knowledge bases to alter AI model behavior in subtle or targeted ways. It is listed as OWASP LLM04. Research shows poisoning can be effective with as little as 0.001% corrupted data. It is sometimes loosely called "model poisoning"; the OWASP term is "Data and Model Poisoning."

What is an embedding in machine learning?

An embedding is a dense numerical vector representing the semantic meaning of text, images, or other data. Similar concepts produce vectors close together in embedding space. Embeddings are the foundation of semantic search and vector databases used in RAG systems, and are covered under OWASP LLM08 (Vector and Embedding Weaknesses).

What is RLHF?

RLHF (Reinforcement Learning from Human Feedback) is a training method where human raters evaluate model outputs and those preferences are used to fine-tune behavior through reinforcement learning. It is the primary technique used to make LLMs helpful, harmless, and honest.

What is agentic AI?

Agentic AI refers to AI systems capable of multi-step planning, tool use, and independent task execution without requiring human input at each step. Agents can browse the web, write and execute code, send emails, and interact with external services. This substantially increases the attack surface compared to single-turn LLM interactions.

What is a foundation model?

A foundation model is a large AI model trained on broad, diverse data that can be adapted to many downstream tasks through fine-tuning or prompting. Examples include GPT-4, Claude, Llama, and Gemini. Supply chain security risks originate at the foundation model level, covered under OWASP LLM03.

What is in-context learning?

In-context learning is a model's ability to adapt its behavior based on examples provided in the prompt, without any updates to model weights. It is also called few-shot prompting when multiple examples are given.

What is the context window of an LLM?

The context window is the maximum number of tokens an LLM can process in a single inference call. It determines how much conversation history, retrieved content, or document text the model can consider at once. Larger context windows increase indirect prompt injection risk in RAG systems.

What is red teaming in AI?

Red teaming in AI security is adversarial testing to identify vulnerabilities in AI systems. It includes prompt injection testing, jailbreak attempts, data extraction probes, and agentic misuse scenarios. AI red teaming is increasingly automated with tools like Mirror Security's DiscoveR.

What is alignment in AI?

Alignment in AI is ensuring AI systems behave according to human values and intentions. Technical alignment means the model does what it is instructed to do. Value alignment means the model's goals reflect broader human interests.

What is a diffusion model?

A diffusion model is a generative AI model that learns to produce data by reversing a process of gradually adding noise. It is the architecture behind most modern AI image generators including Stable Diffusion, DALL-E, and Midjourney.

What is overfitting in machine learning?

Overfitting is when a machine learning model memorizes training data so closely that it performs well on training examples but poorly on new data. It happens when a model is too complex relative to the training data volume.

What is transfer learning?

Transfer learning is applying knowledge learned during training on one task to improve performance on a different task. Pre-training on broad data and then fine-tuning on specific data is the dominant paradigm for modern AI.

AI & Security Glossary: 85 Terms Defined

Q: What is RAG in AI?

RAG (Retrieval-Augmented Generation) is an architecture that combines retrieval from an external knowledge source with language model generation. The model queries a vector database for relevant documents and uses them as context when generating a response. RAG reduces hallucination and grounds answers in real data, but introduces indirect prompt injection risk when the knowledge base contains untrusted content.

Q: What is a transformer model?

A transformer is a neural network architecture using self-attention mechanisms to process input sequences in parallel. Introduced in the 2017 paper "Attention Is All You Need," it is the foundation of nearly all modern large language models, vision transformers, and multimodal AI systems.

Q: What is jailbreaking an AI?

Jailbreaking is using adversarial prompts to bypass an AI model's safety constraints and extract outputs the model was trained not to produce. It is a specialized form of prompt injection targeting safety training rather than system instructions. Jailbreak attacks are increasingly automated at scale, and tools like DiscoveR from Mirror Security test AI systems for them.

A

8 terms

Concept

Accuracy

The proportion of correct predictions out of total predictions. A useful metric for balanced datasets; misleading when class distribution is skewed.
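A minimal sketch of why accuracy misleads on skewed classes. The dataset and the always-benign classifier are made up for illustration.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Hypothetical skewed dataset: 95 benign samples (0) and 5 malicious ones (1).
y_true = [0] * 95 + [1] * 5
# A useless classifier that always predicts "benign".
always_benign = [0] * 100

print(accuracy(y_true, always_benign))  # 0.95, yet every malicious sample is missed
```

The score looks strong even though the classifier never detects the minority class, which is why precision, recall, and F1 are usually reported alongside it.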

Concept

AGI (Artificial General Intelligence)

Hypothetical AI matching human-level reasoning across all cognitive domains. No current system achieves this; it remains a long-term research goal.

Concept

AI Agent

A system that uses a model plus tools, memory, and decision logic to perform multi-step tasks autonomously. Agents can call APIs, read files, and execute code, which dramatically expands the attack surface.

Covered in depth in the AI Agent Security path.

Concept

Agentic AI

AI capable of multi-step planning, tool use, and independent task execution without human involvement at each step. Substantially increases the risk surface compared to single-turn LLM use.

Defense

Alignment

Ensuring AI systems behave according to human values and intentions. Covers both technical alignment (the model does what it's told) and value alignment (the model's goals reflect human interests).

Concept

ASI (Artificial Superintelligence)

Theoretical AI surpassing human intelligence in all areas. Entirely hypothetical; no current or near-term system approaches this.

Concept

Attention Mechanism

A technique that lets models weigh the importance of different parts of the input when producing each output token. The core innovation behind the Transformer architecture.
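As a rough illustration, scaled dot-product attention for a single query can be sketched in plain Python. The vectors below are invented, and real models use learned projections, multiple heads, and batched tensor operations that this sketch leaves out.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(query, keys, values):
    """Scaled dot-product attention for one query vector."""
    d = len(query)
    # Similarity of the query to every key, scaled by sqrt(dimension).
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    # Output is the weighted average of the value vectors.
    dim = len(values[0])
    output = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, output


# Illustrative vectors: the query aligns with the first key.
weights, output = attention([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], [[10.0, 0.0], [0.0, 10.0]])
print(weights, output)
```

The first input receives the larger weight because its key is more similar to the query, so the output leans toward the first value vector.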

Concept

Autoregressive Model

A model that generates output one token at a time, with each token conditioned on all previous tokens. Most LLMs are autoregressive.
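The generation loop can be sketched with a toy stand-in for the model. `toy_next_token` and its lookup table are hypothetical; a real LLM would return a probability distribution over a vocabulary rather than a fixed answer.

```python
def toy_next_token(prefix):
    """Hypothetical stand-in for a model: chooses the next token from the full prefix."""
    table = {
        ("the",): "cat",
        ("the", "cat"): "sat",
        ("the", "cat", "sat"): "<eos>",
    }
    return table.get(tuple(prefix), "<eos>")


def generate(prompt, max_tokens=10):
    """Append one token at a time, feeding the growing sequence back in."""
    tokens = list(prompt)
    for _ in range(max_tokens):
        nxt = toy_next_token(tokens)
        if nxt == "<eos>":  # stop when the model emits end-of-sequence
            break
        tokens.append(nxt)
    return tokens


print(generate(["the"]))  # ['the', 'cat', 'sat']
```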

B

5 terms

Concept

Backpropagation

An algorithm for computing gradients by propagating errors backward through the network. The mechanism by which neural networks learn from mistakes.

Training

Batch Size

The number of training samples processed before the model updates its parameters. Larger batches give more stable gradient estimates; smaller batches update the model more often but with noisier gradients.

Evaluation

Benchmark

A standardized test for comparing model performance. Common examples: MMLU (knowledge), HumanEval (code), HellaSwag (commonsense reasoning). Benchmarks can be gamed, so treat leaderboard numbers with some skepticism.

Concept

Bias

Systematic errors or unfair outcomes in model outputs due to flawed training data, design choices, or optimization objectives. Includes demographic bias, representation bias, and measurement bias.

Evaluation

BLEU Score

A metric for evaluating the quality of generated text against reference translations or outputs. Measures n-gram overlap. Widely used in NLP but criticized for not capturing semantic quality.
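The n-gram overlap at the heart of BLEU can be sketched as a clipped n-gram precision. This is only one ingredient: full BLEU also combines several n-gram orders and applies a brevity penalty, both omitted here. The sentences are illustrative.

```python
from collections import Counter


def ngrams(tokens, n):
    """All contiguous n-token windows in the sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def clipped_precision(candidate, reference, n):
    """Share of candidate n-grams that also occur in the reference (counts clipped)."""
    cand = Counter(ngrams(candidate, n))
    ref = Counter(ngrams(reference, n))
    if not cand:
        return 0.0
    overlap = sum(min(count, ref[gram]) for gram, count in cand.items())
    return overlap / sum(cand.values())


candidate = "the cat sat on the mat".split()
reference = "the cat is on the mat".split()
print(clipped_precision(candidate, reference, 1))  # 5 of 6 unigrams match
print(clipped_precision(candidate, reference, 2))  # 3 of 5 bigrams match
```

Because the score counts surface overlap only, a paraphrase with the same meaning but different wording would score poorly, which is the criticism noted above.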

C

4 terms

Concept

Chain-of-Thought (CoT)

A prompting technique that asks the model to reason step by step before giving a final answer. Significantly improves performance on multi-step reasoning tasks.

Concept

Computer Vision

AI that interprets and analyzes visual information from images and video. Encompasses object detection, image segmentation, OCR, and video understanding.

Defense

Constitutional AI

A training approach developed by Anthropic that uses a set of principles to guide model behavior. The model critiques and revises its own outputs against the constitution during training.

Concept

Context Window

The maximum number of tokens a model can process in a single inference. Determines how much text, conversation history, or retrieved content a model can consider at once. Larger windows increase indirect injection risk in RAG systems.
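A minimal sketch of fitting conversation history into a fixed token budget. Counting tokens by whitespace splitting is an assumption made for illustration; a real system would count with the model's own tokenizer.

```python
def fit_to_context(messages, max_tokens, count_tokens=lambda text: len(text.split())):
    """Keep the most recent messages whose combined token count fits the budget."""
    kept = []
    used = 0
    # Walk backward from the newest message and stop at the first one that does not fit.
    for message in reversed(messages):
        cost = count_tokens(message)
        if used + cost > max_tokens:
            break
        kept.append(message)
        used += cost
    return list(reversed(kept))


history = ["a b c", "d e", "f g h i"]
print(fit_to_context(history, max_tokens=6))  # ['d e', 'f g h i']
```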

D

6 terms

Attack

Data Leakage

The unintended exposure of sensitive or proprietary information through model outputs, logs, or debugging traces. Can occur even when underlying infrastructure is secure.

OWASP LLM02. Covered in GenAI Vulnerability Management.

Attack

Data Poisoning

Manipulation of training data, fine-tuning data, or RAG knowledge bases to alter model behavior in subtle or targeted ways. Sometimes loosely called "model poisoning"; the OWASP term is "Data and Model Poisoning."

OWASP LLM04. Effective with as little as 0.001% corrupted data.

Defense

Data Privacy

Protecting personal information used in AI training and inference. Covers consent, data minimization, access control, and compliance with GDPR, HIPAA, and similar regulations.

Concept

Deep Learning (DL)

A branch of machine learning using multi-layered neural networks to learn hierarchical representations from large datasets. The foundation of modern LLMs, vision models, and speech systems.

Concept

Diffusion Model

A generative model that learns to reverse a noise-adding process to create data. The architecture behind most modern image generators (Stable Diffusion, DALL-E, Midjourney).

Infrastructure

Distillation

Training a smaller model to mimic the behavior of a larger one, preserving much of the performance at a fraction of the compute cost. Widely used to create deployable, efficient models.

E

5 terms

Infrastructure

Edge AI

Running AI models locally on devices (phones, sensors, edge servers) rather than cloud servers. Reduces latency and data exposure, but limits model size and capability.

Concept

Embedding

A dense vector representation capturing the semantic meaning of text, images, or other data. The basis of semantic search and vector databases. Embeddings from sensitive content can leak information if the vector store is compromised.

OWASP LLM08. Covered in Vector DB & RAG Security.
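How "close together in embedding space" is measured can be sketched with cosine similarity. The three-dimensional vectors below are made up; real embeddings come from a trained model and typically have hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical embeddings, invented for illustration only.
embeddings = {
    "cat": [0.9, 0.8, 0.1],
    "kitten": [0.85, 0.75, 0.2],
    "invoice": [0.1, 0.2, 0.95],
}


def most_similar(term):
    """Return the other stored term whose vector is closest to the given term's."""
    others = [t for t in embeddings if t != term]
    return max(others, key=lambda t: cosine_similarity(embeddings[term], embeddings[t]))


print(most_similar("cat"))  # kitten
```

A vector database performs the same kind of nearest-neighbor lookup, usually with approximate indexes so that it scales to millions of vectors.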

Concept

Emergent Abilities

Capabilities that appear in large models but are not present in smaller versions. Examples: multi-step reasoning, code generation, in-context learning. Their unpredictability is a safety and security concern.

Training

Epoch

One complete pass through the entire training dataset. Training typically runs for multiple epochs; too many can lead to overfitting.

Defense

Explainable AI (XAI)

Methods and techniques that make AI decision-making interpretable to humans. Important for compliance, auditing, and identifying when a model is behaving unexpectedly.

F

4 terms

Evaluation

F1 Score

The harmonic mean of precision and recall. A balanced metric useful when false positives and false negatives both matter, such as in threat detection systems.
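A short worked sketch of precision, recall, and F1 computed from confusion counts. The counts describe a hypothetical threat detector.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical detector: 8 real threats flagged, 2 false alarms, 8 threats missed.
p, r, f1 = precision_recall_f1(tp=8, fp=2, fn=8)
print(round(p, 3), round(r, 3), round(f1, 3))  # 0.8 0.5 0.615
```

The harmonic mean pulls F1 toward the weaker of the two numbers, so a detector cannot score well by excelling at only one of them.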

Concept

Few-Shot Learning

Learning from a very small number of examples (typically 1 to 10), provided directly in the prompt. Avoids full retraining while steering model behavior.

Concept

Fine-Tuning

Adapting a pre-trained foundation model to a specific task using additional training on targeted data. Introduces risk of training data poisoning if fine-tuning data is untrusted.

Concept

Foundation Model

A large model trained on broad, diverse data that can be adapted to many downstream tasks. GPT-4, Claude, Llama, and Gemini are all foundation models. Supply chain risks originate here.

OWASP LLM03. Unvetted foundation models are a supply chain risk.

G

5 terms

Concept

Generative AI (GenAI)

AI systems that produce new content (text, images, code, audio, or video) based on patterns learned from training data. Includes LLMs, diffusion models, and multimodal systems. The primary focus of OWASP Top 10 for LLMs.

Concept

Gradient Descent

An optimization algorithm that iteratively adjusts model parameters in the direction that reduces the loss function. The core mechanism of neural network training.
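The update rule can be sketched on a one-variable function. The quadratic loss, learning rate, and step count are illustrative choices, not values from any particular training setup.

```python
def gradient_descent(grad, x0, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient to reduce the loss."""
    x = x0
    for _ in range(steps):
        x = x - learning_rate * grad(x)
    return x


# Illustrative loss f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
minimum = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
print(minimum)  # converges close to 3
```

Neural network training applies the same rule to millions or billions of parameters at once, with gradients supplied by backpropagation.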

Evaluation

Ground Truth

A verified correct answer or label used for training or evaluation. The quality of ground truth data directly determines the quality of a trained model.

Infrastructure

GPU (Graphics Processing Unit)

Hardware that accelerates the parallel computations required for AI training and inference. NVIDIA GPUs dominate the market; availability and cost are major constraints in AI deployment.

Defense

Guardrails

Policies, filters, checks, and controls that guide or restrict what an AI system can see, decide, or output. Can be input filters, output classifiers, tool-use restrictions, or behavioral policies.

AgentIQ provides runtime guardrails for AI agents.

H

2 terms

Concept

Hallucination

A model generating plausible but factually incorrect or fabricated content. A major reliability concern in production AI systems, particularly those used for research, legal, or medical decisions.

Training

Hyperparameters

Configuration settings fixed before training begins, such as learning rate, batch size, number of layers, and dropout rate. Tuning them correctly is often the difference between a useful model and a poor one.

I

3 terms

Concept

Image Segmentation

Partitioning an image into distinct regions or objects. Used in medical imaging, autonomous vehicles, and surveillance systems.

Concept

In-Context Learning

A model adapting its behavior based on examples provided in the prompt, without any weight updates. Also called few-shot prompting when multiple examples are given.

Concept

Inference

Using a trained model to generate predictions on new inputs. Inference endpoints are a key attack surface: unsecured endpoints can leak model outputs or be abused for excessive consumption (OWASP LLM10).

VectaX enables encrypted inference using FHE.

J

1 term

Attack

Jailbreaking

A technique to bypass a model's safety constraints and extract restricted outputs. A specialized form of prompt injection targeting safety training rather than system instructions. Increasingly automated at scale.

DiscoveR tests AI systems for jailbreak vulnerabilities automatically.

L

4 terms

Concept

Large Language Model (LLM)

A neural network trained on massive text data to understand and generate language. Examples: GPT-4, Claude, Gemini, Llama. The core engine behind most AI applications and the primary focus of OWASP Top 10 for LLMs.

Infrastructure

Latency

The time delay between sending a request and receiving a model response. Critical in real-time applications; FHE-encrypted inference introduces additional latency that must be managed.

Concept

Latent Space

A compressed internal representation space where similar concepts cluster together. Understanding latent space is key to interpreting why a model produces certain outputs.

Concept

Loss Function

A function that measures the gap between a model's predictions and the correct outputs. Guides optimization during training. The choice of loss function shapes what behavior a model learns to optimize for.

M

5 terms

Concept

Machine Learning (ML)

An AI subset where systems learn patterns from data rather than following explicitly programmed rules. The broader field containing deep learning, reinforcement learning, and classical statistical approaches.

Concept

MCP (Model Context Protocol)

An open standard for connecting AI models to external tools and data sources. Widely adopted for agentic systems. MCP servers expand what agents can access, which increases the need for strict tool-use permissioning.

Covered in the AI Agent Security path.

Concept

Mixture of Experts (MoE)

An architecture where different sub-networks (experts) specialize in different types of inputs, with a routing mechanism selecting which experts to activate. Used in Mixtral and reportedly in GPT-4, among others, to scale efficiently.

Defense

Model Card

Documentation describing a model's intended use, capabilities, limitations, evaluation results, and known risks. An important governance artifact for AI transparency and compliance.

Concept

Multimodal AI

Models that process and reason across multiple data types (text, image, audio, video). Expands the attack surface to include image-based prompt injection and cross-modal data leakage.

N

2 terms

Concept

Natural Language Processing (NLP)

The field of AI focused on understanding, interpreting, and generating human language. Encompasses text classification, sentiment analysis, machine translation, summarization, and modern LLMs.

Concept

Neural Network

A computational model using interconnected nodes (neurons) organized in layers to process information. The fundamental building block of deep learning and all modern AI systems.

O

3 terms

Concept

Object Detection

Identifying and locating objects within images. Used in security cameras, autonomous vehicles, medical diagnostics, and manufacturing quality control.

Concept

OCR (Optical Character Recognition)

Converting images of text into machine-readable text. Used to pre-process documents for AI systems. Malicious text in images can be an indirect prompt injection vector in multimodal systems.

Concept

Overfitting

A model memorizing training data patterns so closely that it fails to generalize to new inputs. Results in high training accuracy but poor real-world performance.

P

5 terms

Concept

Parameters

Learnable weights in a model that adjust during training. Model size is commonly described in parameter count (e.g., 7B, 70B). More parameters generally mean greater capability and greater resource requirements.

Evaluation

Perplexity

A metric measuring how well a language model predicts a text sample. Lower perplexity means better prediction. Used in language model evaluation, though not always correlated with downstream task quality.
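Perplexity is the exponential of the average negative log-probability the model assigns to each actual token. A minimal sketch, with made-up probabilities:

```python
import math


def perplexity(token_probs):
    """exp of the mean negative log-probability of the observed tokens."""
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)


# Illustrative probabilities a model assigned to each true next token.
uncertain = [0.25, 0.25, 0.25, 0.25]
confident = [0.9, 0.8, 0.95, 0.85]

print(perplexity(uncertain))  # about 4: as unsure as a uniform choice among 4 tokens
print(perplexity(confident))  # close to 1: the model predicted the text well
```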

Evaluation

Precision

Of all the positive predictions a model makes, the proportion that are actually correct. High precision means few false positives. Important in threat detection where false alerts have real costs.

Concept

Prompt

The input or instructions given to an AI model to guide its output. Includes system prompts (set by the developer), user queries, and retrieved context. The primary attack surface for LLM-based systems.

Attack

Prompt Injection

An attack where hidden or malicious instructions override the model's intended behavior. Ranked #1 in OWASP Top 10 for LLMs. Can be direct (from the user) or indirect (hidden in retrieved documents, web pages, or tool outputs).

OWASP LLM01. Present in 73%+ of production AI systems tested in 2025.

Q

1 term

Infrastructure

Quantization

Reducing model numerical precision (e.g., from 32-bit to 8-bit or 4-bit) to decrease model size and speed up inference. Enables running larger models on limited hardware with modest accuracy trade-offs.
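One simple scheme, symmetric 8-bit quantization with a single scale factor, can be sketched as follows. This is an illustrative variant; production toolchains typically use per-channel scales, zero points, or calibration data. The weights are invented.

```python
def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using one shared scale factor."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the stored integers."""
    return [q * scale for q in quantized]


weights = [0.12, -0.5, 0.33, 1.27]
quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)
print(quantized)
print(restored)
```

Each restored value differs from the original by at most about half the scale factor, which is the accuracy trade-off exchanged for storing one byte per weight instead of four.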

R

5 terms

Concept

RAG (Retrieval-Augmented Generation)

An architecture combining retrieval from external knowledge with generation for grounded outputs. The model queries a vector database for relevant documents and uses them as context. Widely used and widely vulnerable to indirect prompt injection via poisoned knowledge bases.

Covered in depth in the Vector DB & RAG Security path.

Evaluation

Recall

Of all the actual positive cases, the proportion correctly identified by the model. High recall means few false negatives. In security contexts, low recall means threats are missed.

Defense

Red Teaming

Adversarial testing to identify vulnerabilities and failure modes in AI systems. Includes prompt injection, jailbreak attempts, data extraction tests, and agentic misuse scenarios. Increasingly automated with tools like DiscoveR.

DiscoveR automates AI red teaming at scale.

Concept

Reinforcement Learning (RL)

Learning through trial and error, with a model optimizing actions based on reward signals. Used in game-playing AI, robotics, and as the foundation for RLHF in LLM alignment.

Concept

RLHF (Reinforcement Learning from Human Feedback)

A training method that uses human preference ratings to fine-tune model behavior. The primary technique used to make LLMs helpful, harmless, and honest. Susceptible to reward hacking if the human feedback process is gamed.

S

5 terms

Concept

Scaling Laws

Predictable mathematical relationships between model size, training data volume, compute budget, and model performance. Provide a basis for forecasting model capabilities before training completes.

Concept

Self-Supervised Learning

A method where the model generates its own training labels from input data (e.g., predicting the next word, or the masked word). The technique behind pre-training most modern LLMs.

Concept

Semantic Search

Search based on meaning and intent rather than keyword matching. Powered by embeddings. The foundation of RAG pipelines and vector database retrieval.

Concept

Supervised Learning

Training on labeled data where each input is paired with a correct output. The most common machine learning paradigm for classification and regression tasks.

Concept

Synthetic Data

Artificially generated data used for training, augmentation, or evaluation. Useful when real data is scarce or sensitive. Synthetic data quality directly affects model behavior if used in fine-tuning.

T

9 terms

Concept

Temperature

A parameter controlling output randomness. Higher values produce more varied, creative outputs; lower values produce more focused, deterministic ones. Relevant to security: high temperature can increase the chance of unexpected or policy-violating outputs.
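Temperature is commonly applied by dividing the model's raw scores (logits) by it before the softmax. A minimal sketch with made-up logits:

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Convert logits to probabilities after scaling by 1 / temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.1]  # illustrative scores for three candidate tokens
cold = softmax_with_temperature(logits, temperature=0.5)
hot = softmax_with_temperature(logits, temperature=2.0)
print(cold)  # probability mass concentrates on the top token
print(hot)   # probability mass spreads across the alternatives
```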

Infrastructure

Throughput

The number of requests a system can handle per unit of time. An important capacity planning metric for production AI deployments. Unbounded consumption attacks (OWASP LLM10) target throughput limits.

Concept

Token

The basic unit of text that a model processes. A token is roughly a word or subword. Token counts determine cost, context window usage, and inference speed. Tokens are also the granularity at which injection attacks operate.

Concept

Tokenization

Splitting input text into tokens for model processing. Different tokenizers handle the same text differently, which can be exploited to bypass input filters that operate on raw text rather than tokens.

Infrastructure

Tool Use

An AI's ability to invoke external functions, APIs, databases, or services to complete tasks. A major capability extension for agents and a key attack surface: tools with broad permissions are a high-value target for prompt injection.

OWASP LLM06. Covered in the AI Agent Security path.
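Least-privilege tool permissions can be sketched as an allowlist checked before any model-requested call is executed. The tool names and argument sets below are hypothetical, and a real deployment would add authentication, argument validation, and audit logging.

```python
# Hypothetical allowlist: tool name -> argument names the agent may supply.
ALLOWED_TOOLS = {
    "search_docs": {"query"},
    "get_weather": {"city"},
}


def authorize_tool_call(name, arguments):
    """Allow a call only if the tool is listed and uses no unexpected arguments."""
    allowed_args = ALLOWED_TOOLS.get(name)
    if allowed_args is None:
        return False  # tool was never granted to this agent
    return set(arguments) <= allowed_args


print(authorize_tool_call("search_docs", {"query": "refund policy"}))  # True
print(authorize_tool_call("send_email", {"to": "attacker@example.com"}))  # False
```

Denying by default means an injected instruction cannot reach a tool the agent was never granted, whatever text the model produces.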

Infrastructure

TPU (Tensor Processing Unit)

Google's custom AI accelerator chips, designed specifically for matrix operations in neural network training and inference. Offer performance advantages over GPUs for certain workloads.

Concept

Training Data

The dataset used to teach a model patterns and relationships. The quality, composition, and provenance of training data directly determines model behavior. A primary attack surface through data poisoning.

OWASP LLM04. Poisoning can be effective with as little as 0.001% corrupted data.

Concept

Transfer Learning

Applying knowledge learned in one task or domain to improve performance in another. Pre-training on broad data then fine-tuning on specific data is the dominant paradigm for modern AI development.

Concept

Transformer

An architecture using self-attention mechanisms to process input sequences in parallel. Introduced in the 2017 paper "Attention Is All You Need." The foundation of nearly all modern LLMs, vision transformers, and multimodal models.

U

2 terms

Concept

Underfitting

A model too simple to capture the underlying patterns in training data. Results in poor performance on both training and new data. The opposite problem to overfitting.

Concept

Unsupervised Learning

Training on unlabeled data to discover hidden patterns, structures, or groupings. Used in clustering, anomaly detection, and dimensionality reduction.

Z

1 term

Concept

Zero-Shot Learning

Performing tasks without any task-specific training examples, relying entirely on the model's general knowledge. A key feature of modern foundation models and the basis for most out-of-the-box LLM deployments.