Components of Stable Diffusion

Latent diffusion models like Stable Diffusion rely on several models and supporting tools during training and inference. The major components are:

U-Net
Text encoder
Autoencoder (VAE)
Noise Scheduler

Let's introduce each of these components.

U-Net

At the core of Stable Diffusion is a deep neural network called U-Net. This is a type of convolutional neural network, and it is the model that gets trained and is then run at inference time.

U-Net takes an image as input and produces an image as output. The first half of the network is made up of convolutional layers that downsample the input image. The second half is made up of upsampling layers that upsample the image data again. We can visualize this architecture as having a U shape, hence the name U-Net.
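As a rough illustration of the U shape, the sketch below traces only the spatial resolution of the data as it moves down and back up a U-Net-like network. The helper name unet_resolution_trace, the three levels, and the factor-of-2 resampling are assumptions chosen for illustration, not the actual Stable Diffusion configuration.

```python
def unet_resolution_trace(size, levels=3, factor=2):
    """Return the spatial size at each stage of a U-shaped network.

    Hypothetical helper: it tracks only resolution, not channels or weights.
    """
    if size % (factor ** levels) != 0:
        raise ValueError("size must be divisible by factor ** levels")
    # Going down the left side of the U: resolution shrinks at each level.
    down = [size // (factor ** i) for i in range(levels + 1)]
    # Coming back up the right side: the same sizes in reverse order.
    up = down[-2::-1]
    return down + up


trace = unet_resolution_trace(64)
print(trace)  # [64, 32, 16, 8, 16, 32, 64]
```

The output has the same resolution as the input, which matches the description of U-Net as an image-in, image-out network.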

We'll expand much further on the details of the model, what exactly is happening during this downsampling and upsampling procedure, as well as the training and inference processes later in the course. For now, it's enough to know that U-Net is the meat of the model and is responsible for generating the images.

Text Encoder

As we know, we pass a text prompt to a diffusion model to specify the type of image we want it to generate. Like most text data passed to neural networks, this text must first be encoded before it is passed as input to the U-Net. To do this encoding, we use a pre-trained text and image encoder called CLIP, which encodes text-image pairs into embeddings.

Word embeddings are vectors that numerically encode some semantic meaning about the underlying words. These embeddings are ultimately what is passed as the textual part of the input to U-Net. See the NLP Intro for Text course for further understanding of word embeddings.
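To make "numerically encode semantic meaning" a little more concrete, here is a minimal sketch that compares toy embedding vectors using cosine similarity. The three-dimensional vectors are invented for illustration. Real embeddings are learned by a model and have far more dimensions.

```python
import math

# Toy 3-dimensional embeddings, made up for illustration only.
embeddings = {
    "cat": [0.9, 0.8, 0.1],
    "kitten": [0.85, 0.75, 0.2],
    "car": [0.1, 0.2, 0.95],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


sim_related = cosine_similarity(embeddings["cat"], embeddings["kitten"])
sim_unrelated = cosine_similarity(embeddings["cat"], embeddings["car"])
print(sim_related > sim_unrelated)  # True
```

Words with related meanings end up pointing in similar directions, so their similarity score is higher.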

Unlike a typical text encoder, CLIP is trained to encode both text and image data. Within CLIP, text is encoded into an embedding via a text encoder, and images are encoded via an image encoder. The text and image encodings are then used to create embeddings for the text-image pairs.
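One way to picture text and image embeddings working together is to score how well each caption matches an image. The sketch below does this with made-up two-dimensional vectors standing in for the outputs of the two encoders. The captions, the vectors, and the use of a dot product followed by softmax are illustrative assumptions, not CLIP's actual code.

```python
import math

# Made-up vectors standing in for text encoder outputs.
text_embeddings = {
    "a photo of a dog": [0.9, 0.1],
    "a photo of a beach": [0.1, 0.9],
}
# Hypothetical image encoder output for a photo of a dog.
image_embedding = [0.8, 0.2]


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]  # shift for stability
    total = sum(exps)
    return [e / total for e in exps]


captions = list(text_embeddings)
# Dot product between each caption embedding and the image embedding.
scores = [
    sum(t * i for t, i in zip(text_embeddings[c], image_embedding))
    for c in captions
]
probs = softmax(scores)
best_caption = captions[probs.index(max(probs))]
print(best_caption)  # a photo of a dog
```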

We'll expand much more on CLIP in a future lesson as well.

Variational Autoencoder

As we mentioned, latent diffusion models work with compressed latent representations of images. Along with the encoded text prompts, U-Net also accepts these compressed noisy images as input.

This compression is achieved using the encoder portion of a trained autoencoder, or more specifically, a variational autoencoder (VAE). During inference, the output from the U-Net is then decoded from a latent to an image using the decoder portion of the VAE.
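To get a feel for how much the encoder compresses an image, the sketch below computes the latent shape for a given image size. The downsampling factor of 8 and the 4 latent channels are assumptions based on the commonly cited Stable Diffusion v1 setup. They are not stated in this lesson, and other models may use different values.

```python
def latent_shape(height, width, downsample=8, latent_channels=4):
    """Shape (channels, height, width) of the latent for an image.

    The defaults are assumptions matching the commonly cited
    Stable Diffusion v1 configuration.
    """
    if height % downsample or width % downsample:
        raise ValueError("height and width must be multiples of downsample")
    return (latent_channels, height // downsample, width // downsample)


image_shape = (3, 512, 512)  # RGB image, channels first
shape = latent_shape(512, 512)

pixel_values = image_shape[0] * image_shape[1] * image_shape[2]
latent_values = shape[0] * shape[1] * shape[2]

print(shape)                          # (4, 64, 64)
print(pixel_values // latent_values)  # 48
```

Under these assumptions, U-Net works with roughly 48 times fewer values than it would in pixel space.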

Noise Scheduler

As we now know, during training, U-Net accepts compressed noisy images as input. The way in which noise is added to image samples is determined by the noise scheduler, which samples noise from some set distribution and adds this noise to the images according to a set schedule.
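A minimal sketch of what "adding noise according to a set schedule" can look like is shown below. It assumes Gaussian noise and a linear schedule from 1e-4 to 0.02 over 1000 steps, which are common defaults in the diffusion literature. The function names are hypothetical, and the schedule Stable Diffusion actually uses may differ.

```python
import math
import random


def linear_beta_schedule(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances growing linearly from beta_start to beta_end."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """Running product of (1 - beta): how much of the original signal remains."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def add_noise(sample, t, alpha_bars, rng):
    """Noise a sample to step t: sqrt(a_bar) * x0 + sqrt(1 - a_bar) * noise."""
    signal_scale = math.sqrt(alpha_bars[t])
    noise_scale = math.sqrt(1.0 - alpha_bars[t])
    return [signal_scale * x + noise_scale * rng.gauss(0.0, 1.0) for x in sample]


betas = linear_beta_schedule()
alpha_bars = cumulative_alphas(betas)

rng = random.Random(0)
x0 = [1.0, -1.0, 0.5, 0.0]  # stand-in for a flattened latent
slightly_noisy = add_noise(x0, 10, alpha_bars, rng)   # early step: mostly signal
mostly_noise = add_noise(x0, 999, alpha_bars, rng)    # last step: almost all noise
```

Early steps keep most of the original sample, while by the final step almost none of it remains.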

We'll also see that during inference, U-Net works to denoise the input. This concept of adding and removing noise during training and inference may seem a bit abstract now, but we will completely solidify these concepts in upcoming lessons.

Now we have a general idea of what a latent diffusion model is, along with the major components that are involved with training and inference. We'll be covering them all in much more detail throughout the course.
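To tie the components together, here is a toy sketch of the order in which they are called during inference. Every function body is a placeholder invented for illustration: the fake text encoder, the fake U-Net that treats a fixed fraction of the latent as noise, and the fake decoder do not reflect how the real models compute anything. Only the sequence of calls mirrors the description above.

```python
import random


def encode_text(prompt):
    """Stand-in for the text encoder: returns a fake embedding."""
    return [float(len(word)) for word in prompt.split()]


def predict_noise(latent, text_embedding, step):
    """Stand-in for U-Net: pretends a fixed fraction of the latent is noise."""
    return [0.1 * value for value in latent]


def decode_latent(latent):
    """Stand-in for the VAE decoder: a real one would return image pixels."""
    return [round(value, 3) for value in latent]


def generate(prompt, num_steps=5, latent_size=4, seed=0):
    """Call the placeholder components in the order used at inference."""
    rng = random.Random(seed)
    text_embedding = encode_text(prompt)                        # text encoder
    latent = [rng.gauss(0.0, 1.0) for _ in range(latent_size)]  # start from noise
    for step in reversed(range(num_steps)):                     # scheduler steps
        noise = predict_noise(latent, text_embedding, step)     # U-Net
        latent = [x - n for x, n in zip(latent, noise)]         # toy noise removal
    return decode_latent(latent)                                # VAE decoder


image = generate("a lizard reading a book")
print(len(image))  # 4
```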


