The proportion of correct predictions out of total predictions. A useful metric for balanced datasets; misleading when class distribution is skewed.
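A minimal sketch of why this metric misleads on skewed data. The 95/5 class split and the "always predict the majority class" model are hypothetical, chosen only for illustration:

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

# Hypothetical skewed dataset: 95 negatives (0), 5 positives (1).
y_true = [0] * 95 + [1] * 5
# A useless model that always predicts the majority class.
y_pred = [0] * 100

skewed_accuracy = accuracy(y_true, y_pred)
positives_found = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
```

Here the score is high even though the model never identifies a single positive case.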
Hypothetical AI matching human-level reasoning across all cognitive domains. No current system achieves this; it remains a long-term research goal.
A system that uses a model plus tools, memory, and decision logic to perform multi-step tasks autonomously. Agents can call APIs, read files, and execute code, which dramatically expands the attack surface.
AI capable of multi-step planning, tool use, and independent task execution without human involvement at each step. Substantially increases the risk surface compared to single-turn LLM use.
Ensuring AI systems behave according to human values and intentions. Covers both technical alignment (the model does what it's told) and value alignment (the model's goals reflect human interests).
Theoretical AI surpassing human intelligence in all areas. Entirely hypothetical; no current or near-term system approaches this.
A technique that lets models weigh the importance of different parts of the input when producing each output token. The core mechanism of the Transformer architecture, which relies on attention in place of recurrence.
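The weighting described above can be sketched as scaled dot-product attention for a single query. This is a pure-Python illustration under simplifying assumptions (one query, no learned projections, no batching); the function names are hypothetical:

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, keys, values):
    """Return (weights, output) for one query over a set of key/value pairs."""
    d = len(query)
    # Similarity of the query to each key, scaled by sqrt(dimension).
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    # Output is the weighted average of the value vectors.
    dim = len(values[0])
    output = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, output

weights, output = attention(
    query=[1.0, 0.0],
    keys=[[1.0, 0.0], [0.0, 1.0]],
    values=[[10.0, 0.0], [0.0, 10.0]],
)
```

The key that aligns with the query receives the larger weight, so its value dominates the output.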
A model that generates output one token at a time, with each token conditioned on all previous tokens. Most LLMs are autoregressive.
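A toy sketch of the generation loop described above. The lookup-table "model" is a stand-in assumption: it receives the full token history, as a real model would, but only inspects the last token:

```python
# Hypothetical next-token table standing in for a trained model.
NEXT = {"<s>": "the", "the": "cat", "cat": "sat", "sat": "</s>"}

def toy_model(tokens):
    """Pick the next token given all previous tokens (uses only the last one here)."""
    return NEXT.get(tokens[-1], "</s>")

def generate(next_token_fn, max_tokens=10):
    """Generate one token at a time, feeding each output back in as input."""
    tokens = ["<s>"]
    for _ in range(max_tokens):
        tok = next_token_fn(tokens)
        if tok == "</s>":  # stop at the end-of-sequence marker
            break
        tokens.append(tok)
    return tokens[1:]  # drop the start marker

generated = generate(toy_model)
```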
An algorithm for computing gradients by propagating errors backward through the network. The mechanism by which neural networks learn from mistakes.
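As a hedged illustration of the backward pass, the sketch below applies the chain rule by hand to a tiny two-weight network. The network shape (`tanh` hidden unit, squared-error loss) and all names are assumptions made for the example:

```python
import math

def forward(w1, w2, x):
    """Tiny network: h = tanh(w1 * x), pred = w2 * h."""
    h = math.tanh(w1 * x)
    return h, w2 * h

def loss_and_grads(w1, w2, x, y):
    """Squared-error loss and its gradients, computed by propagating errors backward."""
    h, pred = forward(w1, w2, x)
    loss = (pred - y) ** 2
    d_pred = 2 * (pred - y)        # dL/dpred
    d_w2 = d_pred * h              # dL/dw2
    d_h = d_pred * w2              # dL/dh
    d_w1 = d_h * (1 - h ** 2) * x  # dL/dw1, using tanh'(z) = 1 - tanh(z)^2
    return loss, d_w1, d_w2

loss, d_w1, d_w2 = loss_and_grads(w1=0.5, w2=-1.5, x=2.0, y=1.0)
```

Comparing these gradients against finite differences of the loss is a common sanity check for a hand-written backward pass.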
The number of training samples processed before the model updates its parameters. Larger batches give more stable gradient estimates but need more memory; smaller batches update more often per epoch, at the cost of noisier gradients.
A standardized test for comparing model performance. Common examples: MMLU (knowledge), HumanEval (code), HellaSwag (commonsense reasoning). Benchmarks can be gamed, so treat leaderboard numbers with some skepticism.
Systematic errors or unfair outcomes in model outputs due to flawed training data, design choices, or optimization objectives. Includes demographic bias, representation bias, and measurement bias.
A metric for evaluating the quality of generated text against reference translations or outputs. Measures n-gram overlap. Widely used in NLP but criticized for not capturing semantic quality.
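The n-gram overlap idea can be sketched with clipped n-gram precision, one ingredient of this metric. This is deliberately not the full score: the brevity penalty and the geometric mean across n-gram orders are omitted, and the function names are hypothetical:

```python
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def modified_precision(candidate, reference, n=1):
    """Clipped n-gram precision: each candidate n-gram counts at most
    as often as it appears in the reference."""
    cand_counts = Counter(ngrams(candidate, n))
    ref_counts = Counter(ngrams(reference, n))
    clipped = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
    total = sum(cand_counts.values())
    return clipped / total if total else 0.0

candidate = "the the the the the the the".split()
reference = "the cat is on the mat".split()
unigram_precision = modified_precision(candidate, reference, n=1)
bigram_precision = modified_precision(candidate, reference, n=2)
```

Clipping keeps a degenerate candidate that repeats one common word from scoring well, but the example also hints at the criticism above: overlap says nothing about meaning.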
A prompting technique that asks the model to reason step by step before giving a final answer. Significantly improves performance on multi-step reasoning tasks.
AI that interprets and analyzes visual information from images and video. Encompasses object detection, image segmentation, OCR, and video understanding.
A training approach developed by Anthropic that uses a set of principles to guide model behavior. The model critiques and revises its own outputs against the constitution during training.
The maximum number of tokens a model can process in a single inference. Determines how much text, conversation history, or retrieved content a model can consider at once. Larger windows increase indirect injection risk in RAG systems.
The unintended exposure of sensitive or proprietary information through model outputs, logs, or debugging traces. OWASP LLM02. Can occur even when underlying infrastructure is secure.
Manipulation of training or fine-tuning data to alter model behavior in subtle or targeted ways. Also applies to RAG knowledge bases. OWASP LLM04. Previously mislabeled "model poisoning."
Protecting personal information used in AI training and inference. Covers consent, data minimization, access control, and compliance with GDPR, HIPAA, and similar regulations.
A branch of machine learning using multi-layered neural networks to learn hierarchical representations from large datasets. The foundation of modern LLMs, vision models, and speech systems.
A generative model that learns to reverse a noise-adding process to create data. The model family behind most modern image generators (Stable Diffusion, DALL-E, Midjourney).
Training a smaller model to mimic the behavior of a larger one, preserving much of the performance at a fraction of the compute cost. Widely used to create deployable, efficient models.
Running AI models locally on devices (phones, sensors, edge servers) rather than cloud servers. Reduces latency and data exposure, but limits model size and capability.
A dense vector representation capturing the semantic meaning of text, images, or other data. The basis of semantic search and vector databases. Embeddings from sensitive content can leak information if the vector store is compromised.
Capabilities that appear in large models but are not present in smaller versions. Examples: multi-step reasoning, code generation, in-context learning. Their unpredictability is a safety and security concern.
One complete pass through the entire training dataset. Training typically runs for multiple epochs; too many can lead to overfitting.
Methods and techniques that make AI decision-making interpretable to humans. Important for compliance, auditing, and identifying when a model is behaving unexpectedly.
The harmonic mean of precision and recall. A balanced metric useful when false positives and false negatives both matter, such as in threat detection systems.
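A short worked example, assuming hypothetical confusion-matrix counts from a threat detector (8 true positives, 2 false positives, 8 false negatives):

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and their harmonic mean from raw counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical counts: 8 threats caught, 2 false alerts, 8 threats missed.
precision, recall, f1 = precision_recall_f1(tp=8, fp=2, fn=8)
```

Because the harmonic mean is pulled toward the lower of the two values, strong precision cannot hide weak recall.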
Learning from a very small number of examples (typically 1 to 10), provided directly in the prompt. Avoids full retraining while steering model behavior.
Adapting a pre-trained foundation model to a specific task using additional training on targeted data. Introduces risk of training data poisoning if fine-tuning data is untrusted.
A large model trained on broad, diverse data that can be adapted to many downstream tasks. GPT-4, Claude, Llama, and Gemini are all foundation models. Supply chain risks originate here.
AI systems that produce new content (text, images, code, audio, or video) based on patterns learned from training data. Includes LLMs, diffusion models, and multimodal systems. The primary focus of OWASP Top 10 for LLMs.
An optimization algorithm that iteratively adjusts model parameters in the direction that reduces the loss function. The core mechanism of neural network training.
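A minimal sketch of the update rule on a one-parameter problem. The loss `(w - 3)^2`, the learning rate, and the step count are illustrative assumptions:

```python
def gradient_descent(grad_fn, w0, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient to reduce the loss."""
    w = w0
    for _ in range(steps):
        w -= learning_rate * grad_fn(w)
    return w

# Loss L(w) = (w - 3)^2 has gradient 2 * (w - 3) and its minimum at w = 3.
w_min = gradient_descent(lambda w: 2 * (w - 3), w0=0.0)
```

Real training applies the same rule to millions or billions of parameters at once, usually with gradients estimated from mini-batches.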
A verified correct answer or label used for training or evaluation. The quality of ground truth data directly determines the quality of a trained model.
Hardware that accelerates the parallel computations required for AI training and inference. NVIDIA GPUs dominate the market; availability and cost are major constraints in AI deployment.
Policies, filters, checks, and controls that guide or restrict what an AI system can see, decide, or output. Can be input filters, output classifiers, tool-use restrictions, or behavioral policies.
A model generating plausible but factually incorrect or fabricated content. A major reliability concern in production AI systems, particularly those used for research, legal, or medical decisions.
Configuration settings fixed before training begins, such as learning rate, batch size, number of layers, and dropout rate. Tuning them correctly is often the difference between a useful model and a poor one.
Partitioning an image into distinct regions or objects. Used in medical imaging, autonomous vehicles, and surveillance systems.
A model adapting its behavior based on examples provided in the prompt, without any weight updates. Also called few-shot prompting when multiple examples are given.
Using a trained model to generate predictions on new inputs. Inference endpoints are a key attack surface: unsecured endpoints can leak model outputs or be abused for excessive consumption (OWASP LLM10).
A technique to bypass a model's safety constraints and extract restricted outputs. A specialized form of prompt injection targeting safety training rather than system instructions. Increasingly automated at scale.
A neural network trained on massive text data to understand and generate language. Examples: GPT-4, Claude, Gemini, Llama. The core engine behind most AI applications and the primary focus of OWASP Top 10 for LLMs.
The time delay between sending a request and receiving a model response. Critical in real-time applications; FHE-encrypted inference introduces additional latency that must be managed.
A compressed internal representation space where similar concepts cluster together. Understanding latent space is key to interpreting why a model produces certain outputs.
A function that measures the gap between a model's predictions and the correct outputs. Guides optimization during training. The choice of loss function shapes what behavior a model learns to optimize for.
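Two common choices, sketched in plain Python with made-up inputs: mean squared error for regression and cross-entropy for classification. The helper names are hypothetical:

```python
import math

def mse(preds, targets):
    """Mean squared error: average squared gap between prediction and target."""
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(preds)

def cross_entropy(probs, target_index):
    """Negative log-probability the model assigned to the correct class."""
    return -math.log(probs[target_index])

regression_loss = mse([2.5, 0.0, 2.0], [3.0, -0.5, 2.0])

probs = [0.7, 0.2, 0.1]  # hypothetical predicted class probabilities
loss_if_class_0 = cross_entropy(probs, 0)  # model was confident and right
loss_if_class_2 = cross_entropy(probs, 2)  # model gave the true class low probability
```

Cross-entropy penalizes confident wrong answers much more heavily than squared error would, which is one way the choice of loss shapes what the model learns.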
An AI subset where systems learn patterns from data rather than following explicitly programmed rules. The broader field containing deep learning, reinforcement learning, and classical statistical approaches.
An open standard for connecting AI models to external tools and data sources. Widely adopted for agentic systems. MCP servers expand what agents can access, which increases the need for strict tool-use permissioning.
An architecture where different sub-networks (experts) specialize in different types of inputs, with a routing mechanism selecting which experts to activate. Used in Mixtral, and reportedly in GPT-4 (not officially confirmed), to scale efficiently.
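The routing step can be sketched as top-k gating. This is a simplified illustration: real systems compute the gate logits from the input with a learned router, whereas here they are passed in directly, and the experts are hypothetical scalar functions:

```python
import math

def top_k_route(gate_logits, k=2):
    """Select the k highest-scoring experts and softmax-normalize their weights."""
    top = sorted(range(len(gate_logits)), key=lambda i: gate_logits[i], reverse=True)[:k]
    m = max(gate_logits[i] for i in top)
    exps = {i: math.exp(gate_logits[i] - m) for i in top}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}

def moe_forward(x, experts, gate_logits, k=2):
    """Run only the selected experts and combine their outputs by gate weight."""
    weights = top_k_route(gate_logits, k)
    return sum(w * experts[i](x) for i, w in weights.items())

# Hypothetical experts: three simple functions standing in for sub-networks.
experts = [lambda x: x, lambda x: 2 * x, lambda x: -x]
routing = top_k_route([0.0, 0.0, -5.0], k=2)
moe_output = moe_forward(2.0, experts, gate_logits=[0.0, 0.0, -5.0], k=2)
```

Only the selected experts run for a given input, which is how total parameter count can grow without a matching growth in per-token compute.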
Documentation describing a model's intended use, capabilities, limitations, evaluation results, and known risks. An important governance artifact for AI transparency and compliance.
Models that process and reason across multiple data types (text, image, audio, video). Expands the attack surface to include image-based prompt injection and cross-modal data leakage.
The field of AI focused on understanding, interpreting, and generating human language. Encompasses text classification, sentiment analysis, machine translation, summarization, and modern LLMs.
A computational model using interconnected nodes (neurons) organized in layers to process information. The fundamental building block of deep learning and of most modern AI systems.
Identifying and locating objects within images. Used in security cameras, autonomous vehicles, medical diagnostics, and manufacturing quality control.
Converting images of text into machine-readable text. Used to pre-process documents for AI systems. Malicious text in images can be an indirect prompt injection vector in multimodal systems.
A model memorizing training data patterns so closely that it fails to generalize to new inputs. Results in high training accuracy but poor real-world performance.
Learnable weights in a model that adjust during training. Model size is commonly described in parameter count (e.g., 7B, 70B). More parameters generally mean greater capability and greater resource requirements.
A metric measuring how well a language model predicts a text sample. Lower perplexity means better prediction. Used in language model evaluation, though not always correlated with downstream task quality.
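A minimal sketch, assuming we already have the probability the model assigned to each actual token in the sample (the probability lists are made up). The metric is the exponential of the average negative log-probability:

```python
import math

def perplexity(token_probs):
    """exp of the mean negative log-probability of the observed tokens."""
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)

# A model that is as uncertain as a uniform guess over 4 options at every step.
uniform_ppl = perplexity([0.25, 0.25, 0.25, 0.25])
# A model that assigns high probability to the tokens that actually occurred.
confident_ppl = perplexity([0.9, 0.8, 0.95, 0.85])
```

A value of 4 can be read as "as uncertain as choosing uniformly among 4 tokens at each step".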
Of all the positive predictions a model makes, the proportion that are actually correct. High precision means few false positives. Important in threat detection where false alerts have real costs.
The input or instructions given to an AI model to guide its output. Includes system prompts (set by the developer), user queries, and retrieved context. The primary attack surface for LLM-based systems.
An attack where hidden or malicious instructions override the model's intended behavior. Ranked #1 in OWASP Top 10 for LLMs. Can be direct (from the user) or indirect (hidden in retrieved documents, web pages, or tool outputs).
Reducing model numerical precision (e.g., from 32-bit to 8-bit or 4-bit) to decrease model size and speed up inference. Enables running larger models on limited hardware with modest accuracy trade-offs.
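One simple scheme, sketched under stated assumptions: symmetric per-tensor quantization to the signed 8-bit range with a single scale factor. Production toolchains use more elaborate variants, and the weights here are made up:

```python
def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using one shared scale factor."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float values from the quantized integers."""
    return [v * scale for v in q]

weights = [0.12, -0.5, 0.33, 1.27, -1.0]  # hypothetical weight values
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
```

The rounding error per weight is bounded by half the scale factor, which is the "modest accuracy trade-off" in concrete terms.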
An architecture combining retrieval from external knowledge with generation for grounded outputs. The model queries a vector database for relevant documents and uses them as context. Widely used and widely vulnerable to indirect prompt injection via poisoned knowledge bases.
Of all the actual positive cases, the proportion correctly identified by the model. High recall means few false negatives. In security contexts, low recall means threats are missed.
Adversarial testing to identify vulnerabilities and failure modes in AI systems. Includes prompt injection, jailbreak attempts, data extraction tests, and agentic misuse scenarios. Increasingly automated with tools like DiscoveR.
Learning through trial and error, with a model optimizing actions based on reward signals. Used in game-playing AI, robotics, and as the foundation for RLHF in LLM alignment.
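A training method that uses human preference ratings to fine-tune model behavior. The primary technique used to make LLMs helpful, harmless, and honest. Susceptible to reward hacking, where the model exploits flaws in the learned reward signal rather than behaving as intended, and to manipulation of the human feedback process itself.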
Predictable mathematical relationships between model size, training data volume, compute budget, and model performance. Provide a basis for forecasting model capabilities before training completes.
A method where the model generates its own training labels from input data (e.g., predicting the next word, or the masked word). The technique behind pre-training most modern LLMs.
Search based on meaning and intent rather than keyword matching. Powered by embeddings. The foundation of RAG pipelines and vector database retrieval.
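A toy sketch of retrieval by vector similarity. The three-dimensional vectors are hand-written stand-ins for real embeddings, which would come from an embedding model and have hundreds or thousands of dimensions:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical document embeddings, keyed by document text.
DOCS = {
    "reset your password": [0.9, 0.1, 0.0],
    "quarterly revenue report": [0.0, 0.2, 0.9],
    "recover account access": [0.8, 0.3, 0.1],
}

def search(query_vec, docs, top_k=2):
    """Return the top_k documents ranked by similarity to the query vector."""
    ranked = sorted(docs, key=lambda d: cosine_similarity(query_vec, docs[d]), reverse=True)
    return ranked[:top_k]

# Hypothetical embedding of a query such as "I forgot my login".
results = search([1.0, 0.2, 0.0], DOCS)
```

Neither result shares a keyword with the hypothetical query, yet both rank above the unrelated document because their vectors point in a similar direction.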
Training on labeled data where each input is paired with a correct output. The most common machine learning paradigm for classification and regression tasks.
Artificially generated data used for training, augmentation, or evaluation. Useful when real data is scarce or sensitive. Synthetic data quality directly affects model behavior if used in fine-tuning.
A parameter controlling output randomness. Higher values produce more varied, creative outputs; lower values produce more focused, deterministic ones. Relevant to security: high temperature can increase the chance of unexpected or policy-violating outputs.
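The effect can be sketched by dividing the logits by the temperature before applying softmax. The logits below are made up, and the function name is hypothetical:

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Convert logits to probabilities; temperature must be greater than 0."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]  # hypothetical scores for three candidate tokens
sharp = softmax_with_temperature(logits, temperature=0.5)  # more peaked
flat = softmax_with_temperature(logits, temperature=2.0)   # more spread out
```

At low temperature most of the probability mass sits on the top token; at high temperature less likely tokens are sampled more often.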
The number of requests a system can handle per unit of time. An important capacity planning metric for production AI deployments. Unbounded consumption attacks (OWASP LLM10) target throughput limits.
The basic unit of text that a model processes. A token is roughly a word or subword. Token counts determine cost, context window usage, and inference speed. Tokens are also the granularity at which injection attacks operate.
Splitting input text into tokens for model processing. Different tokenizers handle the same text differently, which can be exploited to bypass input filters that operate on raw text rather than tokens.
An AI's ability to invoke external functions, APIs, databases, or services to complete tasks. A major capability extension for agents and a key attack surface: tools with broad permissions are a high-value target for prompt injection.
Google's custom AI accelerator chips, designed specifically for matrix operations in neural network training and inference. Offer performance advantages over GPUs for certain workloads.
The dataset used to teach a model patterns and relationships. The quality, composition, and provenance of training data directly determines model behavior. A primary attack surface through data poisoning.
Applying knowledge learned in one task or domain to improve performance in another. Pre-training on broad data then fine-tuning on specific data is the dominant paradigm for modern AI development.
An architecture using self-attention mechanisms to process input sequences in parallel. Introduced in the 2017 paper "Attention Is All You Need." The foundation of nearly all modern LLMs, vision transformers, and multimodal models.
A model too simple to capture the underlying patterns in training data. Results in poor performance on both training and new data. The opposite problem to overfitting.
Training on unlabeled data to discover hidden patterns, structures, or groupings. Used in clustering, anomaly detection, and dimensionality reduction.
Performing tasks without any task-specific training examples, relying entirely on the model's general knowledge. A key feature of modern foundation models and the basis for most out-of-the-box LLM deployments.