Introduction to Latent Diffusion Models
Latent diffusion models are generative deep learning models that have skyrocketed in popularity, not only within the AI community but among the general public as well.
Now let's begin our journey by investigating what exactly a latent diffusion model is, starting with its precursor, the diffusion model.
Diffusion Models
At the highest level, a diffusion model is a generative deep learning model. As we know, models in deep learning make use of deep neural networks, and a generative model is one that is used to generate new data. So we can simply think of a diffusion model as a neural network that generates things. Currently, the "things" that most of us are interested in generating with diffusion models are images. We'll later cover the network architecture and its various components.
Now let's take it a step further to understand the name, diffusion.
In general, diffusion models are trained to denoise data in a sequential, step-by-step fashion. As we'll see later, this denoising process is how a diffusion model can generate images. For now, we can simply think of the training data as noisy images, and of a diffusion model as being trained to remove the noise from those images so that we're left with clear image samples.
The way in which a diffusion model removes noise is generally referred to as the diffusion process. Strictly speaking, "diffusion" names the gradual noising of the data, and the learned noise removal runs that process in reverse, but the whole procedure is loosely called the diffusion process. Later, we'll develop an understanding of how exactly this noise removal results in the cool images we've seen diffusion models generate. For now though, we can just think of diffusion models as accepting noisy input images and outputting clear images.
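To make the idea of "noisy images" a bit more concrete, here is a minimal sketch of how noisy training examples could be produced from a clean image. It is an illustration only, not the training code of any particular model: the function name `add_noise`, the toy four-pixel "image", and the specific blending rule (a square-root mix of the clean signal and Gaussian noise, which is one common choice) are all assumptions made for this example.

```python
import math
import random


def add_noise(pixels, noise_level, rng):
    """Blend clean pixel values with Gaussian noise (hypothetical helper).

    noise_level = 0.0 returns the clean pixels unchanged.
    noise_level = 1.0 returns pure noise with no trace of the image.
    """
    keep = math.sqrt(1.0 - noise_level)  # how much of the clean signal survives
    mix = math.sqrt(noise_level)         # how much noise is blended in
    return [keep * p + mix * rng.gauss(0.0, 1.0) for p in pixels]


rng = random.Random(0)           # fixed seed so the sketch is repeatable
clean = [0.0, 0.5, 1.0, 0.5]     # toy 4-pixel "image" (assumed data)

# The same image at increasing noise levels. A denoising model would be
# trained to map each noisy version back toward the clean one.
noise_levels = (0.0, 0.25, 0.5, 1.0)
noisy_versions = [add_noise(clean, level, rng) for level in noise_levels]

for level, noisy in zip(noise_levels, noisy_versions):
    print(level, [round(value, 2) for value in noisy])
```

Each pair of a noisy version and the clean image can be thought of as one training example: the network sees the noisy input and is asked to recover something closer to the clean one.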
Latent Diffusion Models
Elaborating on the nomenclature, popular models like Stable Diffusion are latent diffusion models. A latent diffusion model is a type of diffusion model.
Continuing with our image data example, standard diffusion models apply the diffusion process over the pixel space of the image data. As we know, images can be large, and training on image data can require lots of memory and computational resources.
Latent diffusion models instead work with compressed representations of the original image data. These compressed images are referred to as latents. In other words, latent diffusion models work in a lower-dimensional latent space rather than a higher-dimensional pixel space, which reduces the memory and compute required.
This is one of the reasons models like Stable Diffusion are so impressive. Not only can they generate these amazing creations, but they also need less compute power to do so. For example, these models can run in the free version of Google Colab with a GPU runtime.
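To get a rough feel for the savings, we can compare how many numbers describe an image in pixel space versus latent space. The shapes below are an assumption that is not stated in this lesson: they are the sizes commonly cited for Stable Diffusion v1, where a 512×512 RGB image is compressed by a factor of 8 along each spatial dimension into a 4-channel latent. The helper name `num_values` is made up for this sketch.

```python
def num_values(height, width, channels):
    """Count the numbers needed to store one image-like array."""
    return height * width * channels


# Assumed shapes (commonly cited for Stable Diffusion v1):
# a 512x512 RGB image and a 64x64 latent with 4 channels.
downsample_factor = 8
pixel_values = num_values(512, 512, 3)
latent_values = num_values(512 // downsample_factor, 512 // downsample_factor, 4)

ratio = pixel_values / latent_values

print(pixel_values)   # 786432
print(latent_values)  # 16384
print(ratio)          # 48.0
```

Under these assumed shapes, the diffusion process operates on 48 times fewer values per image than it would in pixel space.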
Now that we've been introduced to latent diffusion models in general, we're ready to start learning about the various components and training mechanisms that make these models work.