Stable Diffusion Glossary
Every important Stable Diffusion term made easy.
#
4x-Ultrasharp
A popular upscaling model that upscales your image 4x, but can also be used for other scale factors.
A
AI Server
A dedicated system optimized for running AI models like Stable Diffusion, Flux or large language models. It typically features high-performance GPUs, ample RAM, and fast storage to handle intensive image generation tasks efficiently. AI servers can be local machines or cloud-based.
AI Upscaler
Software that uses AI models to increase image resolution. Unlike traditional methods, AI upscalers can add detail.
Ancestral Sampler
An ancestral sampler (such as Euler a) injects fresh random noise at every sampling step instead of following a purely deterministic path. Because of that added noise, its output never fully converges: running more steps keeps changing the image. Non-ancestral samplers are deterministic, so the image stabilizes as the step count grows.
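A minimal sketch of the difference, loosely following the Euler and Euler-ancestral steps from the k-diffusion library (simplified, with the noise scale eta fixed at 1; denoiser stands in for the model's denoising function):

```python
import torch

def euler_step(x, sigma, sigma_next, denoiser):
    # Deterministic: follow the predicted direction, nothing else.
    d = (x - denoiser(x, sigma)) / sigma
    return x + d * (sigma_next - sigma)

def euler_ancestral_step(x, sigma, sigma_next, denoiser):
    # Ancestral: step toward the prediction, then inject fresh noise,
    # so the image keeps changing as you add more steps.
    sigma_up = min(sigma_next,
                   (sigma_next**2 * (sigma**2 - sigma_next**2) / sigma**2) ** 0.5)
    sigma_down = (sigma_next**2 - sigma_up**2) ** 0.5
    d = (x - denoiser(x, sigma)) / sigma
    x = x + d * (sigma_down - sigma)
    return x + torch.randn_like(x) * sigma_up
```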
AnimateDiff
A tool that creates AI animations. It generates a sequence of coherent frames, allowing for motion in generated scenes. AnimateDiff can create anything from small movements to detailed character animations.
Anything v3
A custom Stable Diffusion 1.5 model. It's popular for its versatility, and especially for generating anime-style images.
Automatic1111
One of the first and most popular user interfaces for Stable Diffusion. It's a feature-rich web UI and offers extensive customization options, support for various models and extensions, and a user-friendly interface. Automatic1111 is popular among both beginners and advanced users.
B
Birefnet
BiRefNet (Bilateral Reference Network) is a neural network for high-resolution image segmentation, most commonly used in AI image workflows to remove backgrounds. It processes the image along two complementary reference paths, one for overall context and one for fine detail, to separate subjects from backgrounds cleanly. See it like two lanes of a highway instead of one.
Black Forest Labs
The creators of Flux. Black Forest Labs (BFL) is a company focused on AI and machine learning research for image generation. A major part of the team originally worked at Stability AI.
C
Checkpoint Model
A term to describe an AI model, such as Stable Diffusion or Flux. Historically, models were distributed as .ckpt files and called checkpoints. The name comes from training: a checkpoint is a snapshot of the model's weights saved after a certain number of training steps.
Civitai
A popular platform for sharing visual AI models and resources. It hosts a wide range of custom models, LoRAs and other resources.
Classifier-Free Guidance (CFG) Scale
A parameter that controls the balance between prompt adherence and creative freedom. Higher CFG values make the output more closely match the prompt, while lower values allow for more variation.
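Under the hood, each denoising step runs the model twice, once without the prompt and once with it, and CFG blends the two noise predictions. A minimal sketch of the standard formula:

```python
def apply_cfg(noise_uncond, noise_cond, guidance_scale):
    # guidance_scale = 1 ignores the prompt's extra pull;
    # higher values push the result harder toward the prompt.
    return noise_uncond + guidance_scale * (noise_cond - noise_uncond)
```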
CLIP
Contrastive Language-Image Pretraining (CLIP) is a type of AI that learns by looking at images and their matching text descriptions. This helps models like Stable Diffusion understand text and create images based on it. Some call CLIP a "text encoder," but that is only part of what it does: it also connects text with images.
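A quick sketch of CLIP matching text to an image, using the public openai/clip-vit-base-patch32 checkpoint via Hugging Face transformers (the image path is a placeholder):

```python
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("cat.jpg")  # any local image
inputs = processor(text=["a cat", "a dog"], images=image,
                   return_tensors="pt", padding=True)
outputs = model(**inputs)
# Probability that each caption matches the image; higher = better match.
print(outputs.logits_per_image.softmax(dim=1))
```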
ComfyUI
A node-based interface for AI models, offering granular control over the generation process. It allows users to create complex workflows by connecting various processing nodes. ComfyUI is favored by advanced users for its flexibility and power. Check out this step-by-step tutorial on how to transform characters with ComfyUI.
ControlNet
An extension that gives visual AI models precise control over pose, shape, or composition using reference images. ControlNet enables highly directed image generation and editing through dedicated ControlNet models.
D
DDIM
Denoising Diffusion Implicit Models (DDIM), a sampling technique for faster image generation. Think of it like taking shortcuts when creating an image, so the AI can make high-quality pictures in fewer steps. It doesn’t follow the usual, slower path and is popular because it works quickly and still makes great images.
Deforum
A tool for creating animations with Stable Diffusion. It allows for keyframing of prompts, settings, and camera movements to produce complex sequences. Deforum is popular for creating dream-like animations and visual effects.
Denoising Strength
A parameter controlling how much an input image is altered in img2img operations. Lower values preserve more of the original, while higher values allow for more drastic changes. It's crucial for balancing fidelity and creativity in image editing.
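In common implementations (diffusers, for example), strength also decides how many of the scheduled steps actually run, since the input image is only noised part-way into the schedule. A quick sketch of the arithmetic:

```python
num_inference_steps = 30
strength = 0.6  # 0.0 keeps the input, 1.0 regenerates from scratch

# The image is noised 60% of the way into the schedule,
# so only the last 18 denoising steps are executed.
steps_actually_run = int(num_inference_steps * strength)
print(steps_actually_run)  # 18
```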
Descriptive Prompting
Descriptive prompting is when you give an AI detailed and clear instructions about what you want. Instead of vague or general prompts, you provide specific descriptions of objects, settings, colors, styles, and other details to guide the AI in creating the most accurate result.
Diffusion
The core process in Stable Diffusion where random noise is gradually transformed into an image. It involves iteratively denoising data, guided by the learned model and input prompt. This process allows for the controlled generation of complex, high-quality images.
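In pseudocode, the whole process is a loop that repeatedly subtracts predicted noise. A rough sketch, assuming a diffusers-style model and scheduler interface:

```python
import torch

def generate(unet, scheduler, prompt_embedding):
    # Start from pure Gaussian noise in latent space (SD 1.5 shape).
    latents = torch.randn(1, 4, 64, 64)
    for t in scheduler.timesteps:
        # The model predicts the noise present at this timestep,
        # guided by the prompt embedding.
        noise_pred = unet(latents, t, prompt_embedding)
        # The scheduler removes a little of that noise.
        latents = scheduler.step(noise_pred, t, latents).prev_sample
    return latents  # decoded to pixels by the VAE afterwards
```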
DPM Sampler
A DPM (Diffusion Probabilistic Model) sampler is a family of sampling methods, based on the DPM-Solver algorithm, used by AI to generate images faster and more efficiently.
Dreambooth
A tool for training Stable Diffusion models with a dataset of input images. It allows users to teach the model new people, concepts or styles at high quality. It is a more resource-intensive process than LoRA training.
DreamStudio
Stability AI's web interface for using Stable Diffusion models.
E
ELO Score
A rating system adapted for comparing AI model performance. In visual AI models, it's often used to rank different models or generations based on user preferences. ELO scores help in objectively assessing the relative quality of different models or outputs.
Embedding
A learned representation of concepts or styles in Stable Diffusion. Embeddings can be trained on specific images or text to introduce new elements into the model's vocabulary. They allow for quick fine-tuning and style control without modifying the entire model. Training an embedding is generally the quickest, but also the lowest-quality, form of customization.
Euler Sampler
A basic sampling method used in diffusion models. It's known for its speed but may produce lower quality results compared to more advanced methods. Euler sampling is often used for quick previews or when generation speed is prioritized over maximum quality.
F
Face Restoration
Post-processing techniques to improve the quality of faces in generated images. These methods can enhance details, correct proportions and increase realism.
Fine-Tunes
AI models that have been further trained on specific datasets. This allows the model to specialize in certain styles or subjects. Fine-tuned models can produce more consistent and tailored results for specific use cases.
Flux
Flux is Black Forest Labs' visual AI model. Released in August 2024, it was at the time one of the highest-quality models available, if not the best.
Fooocus
A simplified interface for Stable Diffusion, designed for ease of use. It streamlines the image generation process by automating many settings and offering a clean, intuitive UI. Fooocus is ideal for users who want quick results without deep technical involvement.
G
GGUF
GGUF is a file format for storing AI models in a way that makes them easier and faster for computers to work with. Originally created for the llama.cpp ecosystem, it is now also used for quantized image models. Think of it like a compact version of the AI's brain, optimized to run faster and more efficiently on different devices.
H
Hires.fix
A technique to generate high-resolution images in two stages. It first creates a low-res version, then upscales and refines it with additional diffusion steps. This method can produce detailed, high-quality images while managing memory usage effectively. It was mainly popular for Stable Diffusion 1.5 models that were trained on lower resolution.
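A rough sketch of the two-stage idea, assuming diffusers-style pipelines (txt2img_pipe and img2img_pipe are placeholders for loaded pipelines):

```python
# Stage 1: generate at the model's native resolution.
low_res = txt2img_pipe(prompt, width=512, height=512).images[0]

# Stage 2: upscale, then refine with img2img at low denoising
# strength so the composition survives while details are added.
upscaled = low_res.resize((1024, 1024))
final = img2img_pipe(prompt, image=upscaled, strength=0.4).images[0]
```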
HuggingFace
A platform for sharing and collaborating on machine learning models. It hosts numerous AI models, datasets, and tools. HuggingFace has become a central hub for AI researchers and practitioners to access and contribute to the latest developments.
Hypernetwork
A small neural network used to modify the behavior of a larger network. They offer a lightweight method for customizing model outputs.
I
Img2Img, Image2Image, Image to Image
A technique that uses an existing image as a starting point for generation. It allows for guided modifications, style transfers, or variations of an original image. Img2img is fairly simple in comparison to more advanced tools like ControlNet.
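A minimal img2img sketch with the diffusers library (the well-known SD 1.5 model ID is shown; substitute any compatible checkpoint, and note the repo may have moved on Hugging Face):

```python
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

init_image = Image.open("sketch.png").convert("RGB").resize((512, 512))
result = pipe(
    prompt="a detailed oil painting of a castle",
    image=init_image,
    strength=0.6,  # see Denoising Strength above
).images[0]
result.save("castle.png")
```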
Inference Times
The time taken for a model to generate an image from a given prompt. It's a crucial performance metric, affected by model size, hardware, and chosen parameters. Optimizing inference times is important for creating responsive AI applications.
Inpainting
The process of repainting specific parts of an image while keeping the rest unchanged. It's used for selective editing, object removal, or adding new elements to existing images. Inpainting is a key tool for precise image manipulation with AI.
InstantID
A method for quickly adding someone's face into generated images. InstantID is great for making personalized avatars or changing characters to look like real people.
IP-adapter
A method for integrating image prompts into the generation process. It allows users to guide the output using reference images alongside text prompts.
K
KDiffusion
KDiffusion (k-diffusion) is a library of sampling algorithms for diffusion models, created by Katherine Crowson and widely used by tools such as Automatic1111 and ComfyUI. In a diffusion model, the process starts with a noisy image and the AI works to "denoise" it step by step; k-diffusion provides many of the samplers and noise schedules (Euler, Euler a, DPM variants and more) that control how those steps are taken.
Kohya
A tool for training and fine-tuning AI models like Stable Diffusion and Flux. It offers advanced options for customizing the training process.
Ksampler
KSampler is ComfyUI's main sampling node, named after the k-diffusion library. A sampler's job is to decide how to take the gradual steps when removing noise and generating the image. Given a model, prompt conditioning and a latent image, the KSampler determines the path the model follows during each step of the image generation process, allowing for better and faster results.
L
LAION-5B
A large dataset of image-text pairs used in training many AI models, including some versions of Stable Diffusion. It contains billions of diverse images with associated captions.
Latent Diffusion
The specific type of diffusion model used in Stable Diffusion. It operates in a compressed latent space rather than pixel space, allowing for efficient processing of high-resolution images. Latent diffusion is key to Stable Diffusion's ability to generate detailed images quickly.
Latent Diffusion Model (LDM)
LDM (Latent Diffusion Model) is a type of AI model used to generate images efficiently. Instead of working with the full, detailed image directly, it works in a smaller, simpler version of the image called the latent space. This makes the process faster and less resource-intensive. Stable Diffusion is a Latent Diffusion Model.
Latent Space
A compressed representation of images used by Stable Diffusion for efficient processing. It's a lower-dimensional space where the model performs its operations. Working in latent space allows for faster computation and more effective handling of high-resolution outputs.
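The compression is substantial. For Stable Diffusion 1.5, a 512x512 RGB image maps to a 64x64 latent with 4 channels; a quick check of the numbers:

```python
pixel_values = 512 * 512 * 3   # 786,432 values in pixel space
latent_values = 64 * 64 * 4    # 16,384 values in latent space
print(pixel_values / latent_values)  # 48.0 -> ~48x fewer values to process
```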
LMS
Linear Multi-Step method, a sampling technique that uses information from multiple previous steps to inform the current step. This can lead to improved image quality and stability in the generation process. LMS is one of several advanced sampling options available in many Stable Diffusion implementations.
LoRA
Low-Rank Adaptation, a technique for fine-tuning AI models. A LoRA is generally used to train a model on a new character or style. LoRAs are popular due to the ease and speed of training as well as being a small separate file independent of the main model.
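Applying a LoRA at inference time with diffusers, a minimal sketch (the file path and trigger word are placeholders):

```python
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
# The LoRA loads as a small add-on; the base model file is untouched.
pipe.load_lora_weights("path/to/my_style_lora.safetensors")
image = pipe("a portrait, myStyle").images[0]
```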
LyCORIS
Lora beYond Conventional methods, Other Rank adaptation Implementations for Stable diffusion. An extension of LoRA that allows for more flexible and powerful model customization. It offers additional training options and can capture more complex styles or concepts.
N
Negative Prompt
Text instructions specifying what should not appear in the generated image.
Node
A node is like a building block or tool used in a user interface to create AI workflows. Each node performs a specific function or task. These nodes are connected together to form a workflow, which is a visual way of organizing how an AI model operates.
Noise Schedule
A noise schedule in AI (especially in diffusion models) is like a plan for how much noise or randomness to add to an image at each step during the image generation process.
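For example, the classic DDPM "linear" schedule spreads the noise over 1,000 training timesteps; a sketch:

```python
import torch

betas = torch.linspace(1e-4, 0.02, 1000)  # per-step noise variance
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)
# Fraction of the original signal remaining at each timestep:
print(alphas_cumprod[0].item())   # ~0.9999 (almost clean)
print(alphas_cumprod[-1].item())  # ~0.00004 (almost pure noise)
```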
O
Outpainting
The process of extending an image beyond its original boundaries. Try out this step-by-step outpainting workflow with ComfyUI.
P
Prompt
The text input that tells the AI what you want. An example prompt for visual generative AI could be "A cat in a hat is sitting on a table and reading a book titled 'Paw-sitively funny dad jokes'"
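Feeding that prompt to a text-to-image pipeline, a minimal diffusers sketch (the model ID shown is the well-known SD 1.5 checkpoint; any compatible model works):

```python
from diffusers import AutoPipelineForText2Image

pipe = AutoPipelineForText2Image.from_pretrained("runwayml/stable-diffusion-v1-5")
image = pipe(
    "A cat in a hat is sitting on a table and reading a book "
    "titled 'Paw-sitively funny dad jokes'"
).images[0]
image.save("cat.png")
```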
Prompt Engineering
The practice of crafting effective prompts to achieve desired outcomes in AI image generation.
Prompt Schedule
A technique where the prompt changes during the generation process. This is often used in animations, as prompts can change per frame.
Q
Quantized Models
Quantized models are like smaller, faster versions of AI models. They use fewer numbers (or less detailed numbers) to do their calculations, which makes them quicker and take up less space. However, they might be a little less accurate compared to the original, larger models.
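The savings come straight from the bytes used per weight. A back-of-the-envelope example for a 12-billion-parameter model (roughly Flux's size):

```python
params = 12e9
print(params * 2 / 1e9)    # fp16:  2 bytes per weight   -> ~24 GB
print(params * 1 / 1e9)    # 8-bit: 1 byte per weight    -> ~12 GB
print(params * 0.5 / 1e9)  # 4-bit: 0.5 bytes per weight -> ~6 GB
```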
R
Regional Prompter
A technique for applying different prompts to specific regions of an image.
Render Time
The total time taken to generate an image, including all processing steps.
S
Safetensors
A file format for storing model weights, designed to be more secure and efficient than the older pickle-based .ckpt format, which can execute arbitrary code when loaded.
Sampling Method/Sampler
The algorithm used to generate images from the noise during the diffusion process. Different sampling methods can affect image quality and speed. Examples are Euler, DDIM, DPM.
Sampling Steps
The number of iterations in the denoising process when generating an image. More steps can lead to higher quality images, but increase generation time.
SD.Next
A fork of the Automatic1111 web UI with additional features and optimizations. It was earlier referred to as Vlad Diffusion.
SDXL
Stable Diffusion XL, a larger, more capable version of Stable Diffusion released in 2023. Compared to earlier models trained on 512x512px images, it was trained on 1024x1024px, achieving better overall quality.
Seed
A seed is a number that tells the AI how to start its random guessing process. In image generation, if you use the same seed, model, and settings, the AI will produce the same image every time.
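With diffusers, the seed is supplied through a torch generator; reusing it reproduces the image. A sketch:

```python
import torch
from diffusers import AutoPipelineForText2Image

pipe = AutoPipelineForText2Image.from_pretrained("runwayml/stable-diffusion-v1-5")
generator = torch.Generator("cpu").manual_seed(42)
# Same seed + same prompt + same settings = the same image every run.
image = pipe("a lighthouse at dusk", generator=generator).images[0]
```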
Stable Diffusion
An AI model for generating images based on text descriptions. Since its release in 2022, it has become a cornerstone of AI image generation, spawning numerous variants and a vibrant community.
Stable Diffusion versions
Stable Diffusion 1.x:
Internal versions of SD before the public release.
Stable Diffusion 1.4:
The first public release; it was quickly replaced by 1.5.
Stable Diffusion 1.5:
Trained on 512x512px images. The first widely popular model, still in active use today thanks to huge community support in the form of custom models and fine-tunes.
Stable Diffusion 2.0:
Largely a failure at launch; it was never widely adopted by the community.
Stable Diffusion 2.1:
An incremental improvement over 2.0, but it also failed to gain community adoption.
Stable Diffusion XL:
Trained on 1024x1024px resolution. Great community adoption and still being used to this day.
Stable Diffusion 3.0:
A semi-failure at launch. While the model can produce good images, it has major problems with human anatomy, and the community never adopted it for further fine-tuning.
Stable Zero123
Stable Zero123 is an AI model focused on 3D: it generates novel views of an object from a single input image, a building block for 3D object generation.
StableStudio
Stability AI's open-source release of its DreamStudio interface, serving as a platform for open source generative AI tools.
SwarmUI
A user interface built on top of ComfyUI, adding a better user experience while still maintaining everything ComfyUI has to offer.
T
T5
T5 (Text-to-Text Transfer Transformer) is an AI model that turns every task into a text problem, like answering questions or summarizing stories, by transforming one piece of text into another. In image generation, T5 is used as a text encoder by models such as Flux and Stable Diffusion 3.
Text2Image, Txt2Img, Text-to-Image
The process of generating images directly from text descriptions. Text2Image is the primary use case of Stable Diffusion, enabling users to create visuals based on text prompts.
Textual Inversion
A technique for teaching Stable Diffusion new concepts or styles with just a few example images. Textual Inversion allows users to introduce custom objects, characters or artistic styles into the model's vocabulary.
Tokenizer
A tokenizer is a tool that breaks down text into smaller pieces, called tokens, so that an AI model can understand and work with it. Tokens can be as small as single characters, parts of words or full words.
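For example, the CLIP tokenizer that Stable Diffusion uses for its text encoder:

```python
from transformers import CLIPTokenizer

tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32")
tokens = tokenizer.tokenize("a cat reading a book")
print(tokens)  # the sub-word tokens the model actually sees
print(tokenizer.convert_tokens_to_ids(tokens))  # their numeric IDs
```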
Trigger Keyword
A specific word or phrase in a prompt that activates a particular style or trained concept. Trigger keywords are often associated with custom LoRAs; including the keyword in the prompt activates the trained behavior.
U
U-Net
A U-Net is a type of neural network designed for tasks like image processing. Imagine a U-shaped pipe: data goes in one side, gets processed in the middle, and then comes out the other side. Along the way, the network "shrinks" the image, keeping important details, and then "expands" it back to its original size.
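A toy PyTorch sketch of the U shape, shrink, process, expand, with a skip connection carrying fine details across:

```python
import torch
import torch.nn as nn

class TinyUNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.down = nn.Conv2d(3, 16, 3, stride=2, padding=1)         # shrink
        self.mid = nn.Conv2d(16, 16, 3, padding=1)                   # process
        self.up = nn.ConvTranspose2d(16, 3, 4, stride=2, padding=1)  # expand
        self.skip = nn.Conv2d(3, 3, 1)  # shortcut preserving fine details

    def forward(self, x):
        h = torch.relu(self.down(x))
        h = torch.relu(self.mid(h))
        return self.up(h) + self.skip(x)  # skip connection joins the sides

print(TinyUNet()(torch.randn(1, 3, 64, 64)).shape)  # torch.Size([1, 3, 64, 64])
```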
UniPC
Unified Predictor-Corrector, an advanced sampling method for diffusion models. It combines the strengths of various sampling techniques to achieve high-quality results with fewer steps, offering a balance between speed and image quality.
Upscaling
The process of increasing an image's size while maintaining or improving its quality.
User Interface (UI)
UI stands for User Interface. It’s everything you see and interact with when using a computer program or website. Think of buttons, menus, icons, and text boxes—these are all parts of the UI. In image generation, example UIs are: ComfyUI, Automatic1111, Forge, Swarm, Fooocus.
V
VAE
Variational AutoEncoder, a key component in Stable Diffusion that handles the compression (encoding) of images into a latent space and their decompression (decoding) back into pixels we can see. Different AI models usually have different VAEs.
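A sketch of the round trip with diffusers' AutoencoderKL (the 0.18215 scaling factor is specific to SD 1.x; other models use different values):

```python
import torch
from diffusers import AutoencoderKL

vae = AutoencoderKL.from_pretrained("runwayml/stable-diffusion-v1-5", subfolder="vae")

image = torch.randn(1, 3, 512, 512)  # stand-in for a real image scaled to [-1, 1]
latents = vae.encode(image).latent_dist.sample() * 0.18215  # -> 1x4x64x64
decoded = vae.decode(latents / 0.18215).sample              # -> 1x3x512x512
print(latents.shape, decoded.shape)
```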
W
Workflow
A set of steps and settings used to create specific outputs in generative AI.
This term is most used together with ComfyUI where you can save and share your
workflows. Checkout this ComfyUI workflow to recreate amazing animations like
dancing noodles
by James Gerde.