AI Art Taking World By Storm - Diffusion Models Overview
Introducing the End of Course Project
In this course, we are going to learn how to use Stable Diffusion to create the Women of the World AI art project.
Women of the World is an AI art project consisting of 2916 AI-generated photos of women in the likeness of Mandy from Deeplizard influenced by various cultures from around the world.
To create this AI art piece, Chris from Deeplizard used Stable Diffusion v1.4 and GFPGAN, along with Wikipedia's list of contemporary ethnic groups. The full piece can be viewed in this video.
Generative AI Overview: Diffusion Models
The future is here, and AI art is taking the world by storm. People everywhere are typing in sentences known as prompts and getting back compelling images. This is made possible by using what is known as generative AI.
The specific technology being used is known as diffusion models. At the time of this writing, these are the most popular diffusion models:
- Stable Diffusion
- DALL-E 2
The most exciting model on this list is Stable Diffusion. This is because it is freely available for all to use. The model's weights have been released into the wild, and its use is not gated or restricted by any central authority.
There is no turning back now! Some fear that the end is near for human-made art. However, this couldn't be further from the truth. Generative AI is an amazing tool that human artists can use to create projects that simply weren't possible before.
Text to Image
The Women of the World project uses the model called Stable Diffusion. With Stable Diffusion, we give the model some text that we call a prompt, and the model uses this text to generate an image that represents the prompt. This allows us to get what we ask for.
In the artificial intelligence field, we call this process text to image. The model can produce an image using text as input. The outputs from the Stable Diffusion model are novel, newly generated images that have never been seen before.
The image that is produced is based on complex patterns the model extracted from the data it was trained on. If the training dataset is very large, the output space is also very large because the model has learned about more things.
In Stable Diffusion's case, the model is very large and knows quite a bit. The model knows how different people from all of the regions of the world look. As a result, the model can generate accurately depicted people from various regions given a text prompt.
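To make the text-to-image idea concrete, here is a minimal sketch of how one prompt per culture could be constructed for a project like Women of the World. The template string and the culture list are hypothetical placeholders for illustration, not the project's actual prompts or full list.

```python
# Hypothetical stand-in for Wikipedia's full list of contemporary ethnic groups
cultures = ["Japanese", "Maori", "Yoruba"]

# Hypothetical prompt template, not the project's actual wording
template = "portrait photo of a woman influenced by {} culture"

prompts = [template.format(culture) for culture in cultures]

for prompt in prompts:
    print(prompt)
    # Each prompt would then be passed to a text-to-image model
    # such as Stable Diffusion to generate one image.
```

One prompt per culture is what lets a fixed model produce a whole themed series from a single template.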
At this point, I would like to emphasize that this type of AI is a tool that we can use to produce something that was previously impossible, especially if we are talking about a solo artist. Stable Diffusion and models like it are an advancement that open up the creative possibility space. This in turn will allow for greater creativity in art and communication more generally.
I'm reminded of the Jevons paradox here. The cost of creating an image is dropping dramatically as a result of this advancement in AI, and as a result, the demand for images will increase. Since images are more accessible without roadblocks, more people will use images for creative purposes.
Image to Image
Let's talk about the next important thing we need to understand about Stable Diffusion. When Stable Diffusion generates an image, the process is iterative. The model starts with noise, or random pixels, and iteratively updates these random pixels, moving the overall image closer to the text prompt at each step.
For this reason, it's possible for us to tell Stable Diffusion to use our input image as a starting point and go from there. This gives us more control and lets us guide the process. In this case, the model will start from our specified image instead of random noise. This process is called image to image.
Technically speaking, we are taking both a text prompt and an image as input and getting an image as output, but we'll still hear this referred to as image to image.
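The difference between the two starting points can be illustrated with a toy iterative-refinement loop. This is not the real Stable Diffusion math; the `target` array simply stands in for "the image the prompt describes," and each step nudges the current pixels toward it. The step size, array shapes, and step counts are all arbitrary choices for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def refine(start, target, steps):
    """Toy denoising loop: move `start` a fraction of the way toward `target` each step."""
    image = start.copy()
    for _ in range(steps):
        image += 0.2 * (target - image)  # one refinement step
    return image

target = np.full((8, 8), 0.7)       # stand-in for the prompt's ideal image
noise = rng.random((8, 8))          # text to image: start from pure noise

txt2img = refine(noise, target, steps=50)

# Image to image: start from a user-supplied image with a little added noise,
# and run fewer steps, so the result stays closer to the input image.
input_image = np.full((8, 8), 0.3)
img2img = refine(input_image + 0.1 * rng.standard_normal((8, 8)), target, steps=5)

print(float(np.abs(txt2img - target).mean()))  # many steps: very close to the target
print(float(np.abs(img2img - target).mean()))  # few steps: still influenced by the input
```

Running fewer refinement steps from a supplied image is the intuition behind the "strength" setting in image-to-image tools: lower strength preserves more of the input, higher strength defers more to the prompt.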