When we generate images from a prompt with artificial intelligence tools such as Stable Diffusion, DALL-E or Midjourney, we are using models that were previously trained on very large collections of images and have learned patterns that let them create images matching the input prompt.
If we want to generate an image with special characteristics that no existing model has been trained on, we can create our own model from a set of training images, which lets us generate specific objects, styles or people with a great deal of freedom.
At first glance this may seem complicated and accessible only to developers, but thanks to the open source community around Stable Diffusion we can train models in a fairly simple way, although it requires learning a few parameters to optimise the results.
Over time, different methods have appeared to train models with Stable Diffusion, improving training times and the size of the resulting files. In this article we will discuss 3 of the most widely used: Dreambooth, LoRA and Embeddings.
Stable Diffusion training methods
Dreambooth
Dreambooth, developed by Google in 2022, was one of the first options to appear for training models with Stable Diffusion. To get good results with Dreambooth we need a set of between 20 and 30 images to train with. The training process consists of 3 steps:
- Choose a pre-trained model (you can choose one of the Stable Diffusion versions or one of the models available at Civitai).
- Prepare images of the person, object or style you want to train (as mentioned above, about 20-30).
- Run the training, during which the AI learns to recognise what is in the images (this last step can take 20-30 minutes).
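Before launching a training run, it can help to check that the image folder really contains a set of the suggested size. The following is a minimal sketch using only the Python standard library; the function name `collect_training_images` and the list of extensions are assumptions made for illustration and are not part of any Dreambooth tool.

```python
from pathlib import Path

# Assumption: these extensions cover the images in the training folder.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp"}


def collect_training_images(folder, minimum=20, maximum=30):
    """Return the image files found in `folder` and a note about the set size."""
    images = sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
    )
    count = len(images)
    if count < minimum:
        note = f"only {count} images, consider adding more (target {minimum}-{maximum})"
    elif count > maximum:
        note = f"{count} images, more than the suggested {maximum}"
    else:
        note = f"{count} images, within the suggested range"
    return images, note
```

Calling `collect_training_images("training_images")` on a hypothetical folder returns the sorted list of image paths together with a short note that can be printed before uploading the images to the training notebook.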
Once the training is finished we will have a file of approximately 2-4 GB in size. To generate a completely new image, we only have to use the reference keyword associated with the images we trained on.
The main disadvantage of this training technique is the size of the files, since just a few models take up several GB of storage.
Here are a couple of videos explaining the process in more detail. There are small differences between them, so I recommend watching both to get a better understanding of the whole process and of the different parameters we may find depending on the Google Colab notebook we use.
LoRA
LoRA (Low-Rank Adaptation) models can generate results comparable to those of models trained with Dreambooth but at a size 10 or more times smaller, typically between 50 and 200 MB, which makes them easier to store and share.
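The size difference comes from the "low-rank" idea: instead of storing a full update for a weight matrix, LoRA stores two thin matrices whose product approximates that update. The sketch below only counts parameters for a single matrix; the dimensions and the rank are illustrative assumptions, not values taken from any particular Stable Diffusion model, and the real file size also depends on how many layers are adapted and on the numeric precision used.

```python
def full_update_params(d_out, d_in):
    """Parameters needed to store a full update for a d_out x d_in weight matrix."""
    return d_out * d_in


def low_rank_update_params(d_out, d_in, rank):
    """Parameters for the two factors: B (d_out x rank) and A (rank x d_in)."""
    return rank * (d_out + d_in)


# Illustrative numbers only (assumptions, not taken from a specific model).
d_out, d_in, rank = 768, 768, 4
full = full_update_params(d_out, d_in)
low = low_rank_update_params(d_out, d_in, rank)
print(full, low, full / low)  # prints: 589824 6144 96.0
```

For this single matrix the low-rank factors need about 96 times fewer parameters than a full update, which gives an intuition for why LoRA files are so much lighter than full checkpoints.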
To use a model trained with LoRA in the AUTOMATIC1111 interface, just add the generated file to the stable-diffusion-webui/models/Lora folder.
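Copying the file can of course be done by hand, but for anyone managing several models, a small script may be convenient. This is a minimal sketch with the Python standard library; the function name `install_lora` and the default `webui_root` location are assumptions and should be adapted to wherever the web UI is installed.

```python
import shutil
from pathlib import Path


def install_lora(lora_file, webui_root="stable-diffusion-webui"):
    """Copy a trained LoRA file into the web UI's models/Lora folder."""
    target_dir = Path(webui_root) / "models" / "Lora"
    target_dir.mkdir(parents=True, exist_ok=True)  # create the folder if it is missing
    # copy2 returns the path of the copied file inside target_dir
    return Path(shutil.copy2(lora_file, target_dir))
```

For example, `install_lora("my_style.pt")` would copy a hypothetical file named my_style.pt into stable-diffusion-webui/models/Lora relative to the current directory and return the new path.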
We must then call the model directly in the prompt using the syntax <lora:filename:multiplier>, where filename is the name of the LoRA model (without the extension .pt, .bin, etc.) and multiplier is the weight applied to the LoRA model.
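As an illustration of the syntax, the following sketch builds the tag from a file name and a weight. The helper `lora_tag`, the file name my_style.pt and the example prompt are all hypothetical; only the `<lora:filename:multiplier>` format comes from the text above.

```python
from pathlib import Path


def lora_tag(lora_file, multiplier=1.0):
    """Build the <lora:filename:multiplier> tag for a LoRA file."""
    name = Path(lora_file).stem  # file name without its extension
    return f"<lora:{name}:{multiplier:g}>"


prompt = "portrait photo of a woman, studio lighting " + lora_tag("my_style.pt", 0.8)
print(prompt)  # prints: portrait photo of a woman, studio lighting <lora:my_style:0.8>
```

Lowering the multiplier reduces the influence of the LoRA model on the generated image, so it is worth trying a few values for the same prompt.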
Embeddings
Embeddings, also known as Textual Inversion, is a method that has attracted attention because it defines a new keyword for a model without modifying the model itself. This is its main feature, and it also means that an embedding is tied to the base model it was trained on. On the less positive side, the results are usually of lower quality than those of models trained with the two techniques described above.
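The idea of adding a keyword without touching the model can be pictured with a toy example. In the sketch below the "model" is just a frozen lookup table of word vectors, and the learned embedding lives in a separate table; every token, vector and function name is made up for illustration and has nothing to do with the real Stable Diffusion text encoder.

```python
# Toy sketch: the "model" is a frozen lookup table of token vectors.
# All tokens and numbers here are made up for illustration.
BASE_MODEL_VECTORS = {"photo": (0.2, 0.7), "dog": (0.9, 0.1)}


def encode(prompt, learned_embeddings):
    """Map each known word to a vector, checking learned embeddings first."""
    vectors = []
    for word in prompt.split():
        if word in learned_embeddings:
            vectors.append(learned_embeddings[word])
        elif word in BASE_MODEL_VECTORS:
            vectors.append(BASE_MODEL_VECTORS[word])
    return vectors


# Training would search for this vector; here it is simply assumed.
learned = {"my-dog": (0.85, 0.15)}
print(encode("photo my-dog", learned))  # prints: [(0.2, 0.7), (0.85, 0.15)]
```

The base table is never changed: the new keyword only makes sense next to the vectors it was learned against, which is why an embedding is tied to its base model.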
Where to find Stable Diffusion models
Several websites offer Stable Diffusion models to download free of charge. One of the best known is Civitai, whose search filter makes it easy to find models generated with Dreambooth (checkpoint), LoRA or Embeddings (Textual Inversion).
This has only just begun. With time, new training methods will surely appear that improve results, training times and file sizes. Any news about model training with Stable Diffusion will be published in this blog post. 😉