One-hot encodings for machine learning
In this post, we’re going to discuss one-hot encoding, and how we make use of it in machine learning.
In previous posts, we've talked about how labels for images in Keras were actually one-hot encoded vectors. Let’s discuss exactly what this means.
We know that when we’re training a neural network via supervised learning, we pass labeled input to our model, and the model gives us a predicted output.
If our model is an image classifier, for example, we may be passing labeled images of animals as input. When we do this, the model is usually not interpreting these labels as words, like dog or cat. Additionally, the output that our model gives us for its predictions typically isn't a word like dog or cat either. Instead, most of the time our labels become encoded, so they can take on the form of an integer or of a vector of integers.
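As a rough illustration of the integer form mentioned above, the sketch below replaces each word label with an integer index. The `labels` list is made up for this example, and assigning indices in alphabetical order is an assumption, not something every library does.

```python
# Hypothetical string labels for a small batch of images.
labels = ["dog", "cat", "cat", "dog"]

# Give each distinct label an integer index.
# Alphabetical order is an assumption made for this sketch.
classes = sorted(set(labels))
label_to_index = {name: i for i, name in enumerate(classes)}

# Replace every word with its integer code.
encoded = [label_to_index[name] for name in labels]

print(label_to_index)  # {'cat': 0, 'dog': 1}
print(encoded)         # [1, 0, 0, 1]
```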
Hot and cold values
One type of encoding that is widely used for encoding categorical data with numerical values is called one-hot encoding.
One-hot encodings transform our categorical labels into vectors of 0s and 1s. The length of these vectors is the number of classes or categories that our model is expected to classify.
Vectors of 0s and 1s
If we were classifying whether images were either of a dog or of a cat, then our one-hot encoded vectors that corresponded to these classes would each be of length 2, reflecting the two categories.
If we added another category, like lizard, so that we could then classify whether images were of dogs, cats, or lizards, then our corresponding one-hot encoded vectors would each be of length 3, since we now have three categories.
Alright, so we know the labels are transformed or encoded into vectors. We know that each of these vectors has a length that is equal to the number of output categories, and we briefly mentioned that the vectors contain 0s and 1s. Let’s go into further detail on this last piece.
One-hot encodings for multiple categories
Let’s stick with the example of classifying images as being either of a cat, dog, or lizard. With each of the corresponding vectors for these categories being of length 3, we can think of each index or each element within the vector corresponding to one of the three categories.
Let’s say for this example that the cat label corresponds to the first element, dog corresponds to the second element, and lizard corresponds to the third element.
With each of these categories having their own place in the corresponding vectors, we can now discuss the intuition behind the name one-hot.
With each one-hot encoded vector, every element will be a zero EXCEPT for the element that corresponds to the actual category of the given input. This element will be a hot one.
Sticking with our same example, recall we said that a cat corresponded to the first element, dog to the second, and lizard to the third, so the corresponding one-hot encoded vectors for each of these categories would look like this.
cat: [1,0,0]
dog: [0,1,0]
lizard: [0,0,1]
For cat, we see that the first element is a one and the next two elements are zeros. This is because each element within the vector is a zero except for the element that corresponds to the actual category, and we said that the cat category corresponded to the first element.
One vector for each category
Similarly, for dog, we see that the second element is a one, while the first and third elements are zeros. Lastly, for lizard, the third element is a one, while the first and second elements are zeros.
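To make the three vectors concrete, here is a minimal sketch in plain Python. The helper name `one_hot` is hypothetical, and the cat, dog, lizard ordering is the one assumed in this post rather than anything a particular library guarantees.

```python
# Position of each category in the vector, as assumed in this post.
classes = ["cat", "dog", "lizard"]

def one_hot(label, classes):
    """Return a list of 0s with a single 1 at the label's position."""
    vector = [0] * len(classes)
    vector[classes.index(label)] = 1
    return vector

for name in classes:
    print(name, one_hot(name, classes))
# cat [1, 0, 0]
# dog [0, 1, 0]
# lizard [0, 0, 1]
```

Each vector has the same length as `classes`, and exactly one element is hot.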
We can see that each time the model receives input that is a cat, it’s not interpreting the label as the word cat, but instead is interpreting the label as the vector [1,0,0]. For images labeled as dog, the model is interpreting the dog label as the vector [0,1,0], and for images labeled as lizard, the model is interpreting the label as the vector [0,0,1].
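Going the other way, a one-hot vector can be turned back into a word by finding the position of its hot element. The sketch below assumes the same cat, dog, lizard ordering, and `decode` is a hypothetical helper written only for this illustration.

```python
# Same ordering as assumed in this post.
classes = ["cat", "dog", "lizard"]

def decode(vector, classes):
    """Return the class name stored at the position of the single 1."""
    hot_index = vector.index(1)
    return classes[hot_index]

print(decode([1, 0, 0], classes))  # cat
print(decode([0, 1, 0], classes))  # dog
print(decode([0, 0, 1], classes))  # lizard
```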
Just for clarity purposes, say we add another category, llama, to the mix. Now, we have four categories total, and so this will cause each one-hot encoded vector corresponding to each of these categories to be of length 4. The vectors will now look like this.
cat: [1,0,0,0]
dog: [0,1,0,0]
lizard: [0,0,1,0]
llama: [0,0,0,1]
We can see that for each of our pre-existing categories of cat, dog, and lizard, we still have the corresponding one for each of these vectors in the same places where they were before. The one is the first element for cat, second for dog, and third for lizard. The new, fourth element for each of our existing categories is just a zero since this fourth element corresponds to the llama category.
Finally, the new one-hot encoded vector for the llama category is all zeros except for the fourth element, which is a one, since the fourth element corresponds to the llama category.
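The effect of adding a fourth category can be checked with a short sketch. The function `one_hot_table` is a hypothetical helper, and appending llama to the end of the list is the assumption made in this post.

```python
classes = ["cat", "dog", "lizard"]

def one_hot_table(classes):
    """Map each class name to its one-hot vector."""
    n = len(classes)
    return {
        name: [1 if j == i else 0 for j in range(n)]
        for i, name in enumerate(classes)
    }

before = one_hot_table(classes)
after = one_hot_table(classes + ["llama"])

for name in classes:
    # Each existing vector keeps its hot position and gains one trailing 0.
    assert after[name] == before[name] + [0]

print(after["llama"])  # [0, 0, 0, 1]
```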
Note that we just arbitrarily said that cat corresponded to the first element, dog to the second, lizard to the third, and llama to the fourth, but this could very well be in a different order. This just depends on how the underlying code or library is doing the one-hot encoding.
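To see how much the ordering matters, the sketch below encodes the same label under two different orderings. Both lists and the `one_hot` helper are hypothetical; the second ordering is invented purely for contrast.

```python
def one_hot(label, classes):
    """Return a list of 0s with a single 1 at the label's position."""
    vector = [0] * len(classes)
    vector[classes.index(label)] = 1
    return vector

post_order = ["cat", "dog", "lizard", "llama"]   # order used in this post
other_order = ["llama", "lizard", "dog", "cat"]  # hypothetical alternative

print(one_hot("cat", post_order))   # [1, 0, 0, 0]
print(one_hot("cat", other_order))  # [0, 0, 0, 1]
```

The label is the same in both cases, so a vector can only be read correctly alongside the mapping that produced it.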
If you’re interested in understanding how to view the mapping between which element or index corresponds to which label in Keras for image data, check out the post in the Keras series showing how that can be done.
We should now understand what one-hot encoding is and how labels are transformed into one-hot encoded vectors for classification purposes when working with artificial neural networks. I’ll see ya in the next one!