Build image classifier using transfer learning - Fine-tuning MobileNet with Keras
Building a fine-tuned MobileNet model with TensorFlow's Keras API
In this episode, we'll discuss how to build a fine-tuned MobileNet model and implement this model in code using TensorFlow's Keras API.
Now that we've seen what MobileNet is all about in the last episode, let's talk about how we can fine-tune the model and use transfer learning to train it on another dataset.
If you're not already familiar with the concept of fine-tuning, that's alright because we have several other episodes on fine-tuning using the VGG16 model with Keras, as well as an episode dedicated to the concept of fine-tuning and transfer learning, so check those out first if you need to.
Alright, let's jump into the code!
Fine-tuning MobileNet with Keras
First, make sure all the imports are in place from last time.
Similar to what we previously implemented with VGG16, we're going to be fine-tuning MobileNet on images of cats and dogs. The implementation will be pretty similar, but you'll notice there will be a few differences.
Many different breeds of cats and dogs were included in the ImageNet data set on which MobileNet was originally trained, so the original model has already learned a lot about cats and dogs in general. Because of this, it won't take much tuning to get the model to perform well on this specific, narrower classification task.
In a later episode, however, we'll be fine-tuning MobileNet on a completely new data set made up of classes that the model hasn't already learned about in its original training, so stay tuned for that.
Preparing the data
Before we start tuning the model, we need to prepare the data. The data I'm using is a random subset of cat and dog image data from the Kaggle cat versus dog competition, and I have my image data stored on disk in a specific directory structure in order to use the Keras flow_from_directory() function that we'll see in just a sec.
If you're following along, then you'll need to structure your data in the same way, and you can do that by following the episode on Image Preparation for CNNs with Keras.
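As a rough sketch of the layout this relies on: flow_from_directory() infers class labels from subdirectory names, so each of the train, valid, and test directories needs one subfolder per class. The class folder names cat and dog below are assumptions, and the make_layout helper is hypothetical. The snippet only builds an empty skeleton under a temporary directory, so it won't touch any real data.

```python
import tempfile
from pathlib import Path

# Hypothetical helper: builds <root>/<split>/<class> folders.
# The class names 'cat' and 'dog' are assumed, not taken from the original data.
def make_layout(root, splits=("train", "valid", "test"), classes=("cat", "dog")):
    created = []
    for split in splits:
        for cls in classes:
            folder = Path(root) / split / cls
            folder.mkdir(parents=True, exist_ok=True)
            created.append(folder)
    return created

# Build the skeleton in a throwaway directory for illustration.
base = Path(tempfile.mkdtemp()) / "data" / "dogs-vs-cats"
folders = make_layout(base)
for folder in folders:
    print(folder.relative_to(base))
```

With real data, the image files for each class would sit inside the matching subfolder.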
We now define the path variables for where the training, validation, and test sets reside on disk.
train_path = 'data/dogs-vs-cats/train'
valid_path = 'data/dogs-vs-cats/valid'
test_path = 'data/dogs-vs-cats/test'
Then, we create directory iterators for each dataset using Keras' ImageDataGenerator.flow_from_directory() function, which yields batches of image data from the directory that we pass in with our first parameter.
train_batches = ImageDataGenerator(preprocessing_function=tf.keras.applications.mobilenet.preprocess_input).flow_from_directory(
    directory=train_path, target_size=(224,224), batch_size=10)
valid_batches = ImageDataGenerator(preprocessing_function=tf.keras.applications.mobilenet.preprocess_input).flow_from_directory(
    directory=valid_path, target_size=(224,224), batch_size=10)
test_batches = ImageDataGenerator(preprocessing_function=tf.keras.applications.mobilenet.preprocess_input).flow_from_directory(
    directory=test_path, target_size=(224,224), batch_size=10, shuffle=False)
Notice the preprocessing_function parameter we're supplying to ImageDataGenerator. We're setting this equal to tf.keras.applications.mobilenet.preprocess_input().
This is going to do the necessary MobileNet preprocessing on the images obtained from flow_from_directory().
Recall that we talked about this exact function in the last episode, along with its role in preprocessing images for MobileNet.
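As a quick reminder of what that preprocessing amounts to, here is a plain-Python sketch of the scaling: MobileNet's preprocess_input() maps pixel values from the range 0 to 255 onto the range -1 to 1. The scale_pixels name is hypothetical and only stands in for the real function, which operates on whole image arrays.

```python
# Plain-Python sketch of the scaling MobileNet preprocessing applies:
# pixel values in [0, 255] are mapped to [-1, 1].
def scale_pixels(pixels):
    return [p / 127.5 - 1.0 for p in pixels]

sample = [0, 127.5, 255]
print(scale_pixels(sample))  # [-1.0, 0.0, 1.0]
```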
To flow_from_directory(), we're passing in the path to the data set, the target_size for the images, and the batch_size we're choosing to use for training.
We do this exact same thing for all three data sets: train, validation, and test.
For the test_batches variable, we're also supplying one additional parameter, shuffle=False, so that we can later access the corresponding non-shuffled test labels to plot a confusion matrix.
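To see why the ordering matters, here is a minimal sketch of a confusion matrix built from two aligned lists. The values are made up: true_labels stands in for the labels exposed by test_batches.classes, and predicted_labels for the argmax of the model's predictions. The counts are only meaningful when the two lists are in the same order, which is what shuffle=False gives us.

```python
# Hypothetical values: true_labels stands in for test_batches.classes and
# predicted_labels for the argmax of the model's predictions.
def confusion_matrix(true_labels, predicted_labels, num_classes=2):
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(true_labels, predicted_labels):
        matrix[t][p] += 1  # row = true class, column = predicted class
    return matrix

true_labels = [0, 0, 0, 1, 1, 1]
predicted_labels = [0, 0, 1, 1, 1, 0]
cm = confusion_matrix(true_labels, predicted_labels)
print(cm)  # [[2, 1], [1, 2]]
```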
The data portion is now done. Next, let's move on to modifying the model.
Model modification
We import MobileNet in the same way we saw in the last episode.
mobile = tf.keras.applications.mobilenet.MobileNet()
Next, we're going to grab the output from the sixth to last layer of the model and store it in the variable x.
x = mobile.layers[-6].output
We'll be using this to build a new model. This new model will consist of the original MobileNet up to the sixth to last layer. We're not including the last five layers of the original MobileNet.
By looking at the summary of the original model, we can see that by not including the last five layers, we'll be including everything up to and including the last global_average_pooling layer. Run mobile.summary() yourself or watch the corresponding video to see this.
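The negative indexing here can be sketched with a plain list. The layer names below are illustrative stand-ins for the tail of mobile.layers. They are an assumption and may not match the names or order in your installed TensorFlow version, so treat mobile.summary() as the source of truth.

```python
# Illustrative stand-in for the tail of mobile.layers; these names are assumed
# and may not match your installed version.
tail_layer_names = [
    "conv_pw_13_relu",
    "global_average_pooling2d",
    "reshape_1",
    "dropout",
    "conv_preds",
    "act_softmax",
    "reshape_2",
]

kept_up_to = tail_layer_names[-6]  # sixth to last layer: its output becomes x
dropped = tail_layer_names[-5:]    # the last five layers, which are left out

print(kept_up_to)  # global_average_pooling2d
print(dropped)
```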
Note that the number of layers that you choose to cut off when you're fine-tuning a model will vary for each scenario, but I've found through experimentation that just removing the last 5 layers here works out well for this particular task. So with this setup, we'll be keeping the vast majority of the original MobileNet architecture, which has 88 layers total.
Now, we append an output layer that we're calling output, which will just be a Dense layer with 2 output nodes, for cat and dog, and we'll use the softmax activation function.
output = Dense(units=2, activation='softmax')(x)
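For intuition about what the softmax activation does with those two nodes, here is a small plain-Python sketch. The raw scores are made up, and the cat-then-dog ordering is an assumption. Softmax turns the two scores into probabilities that sum to 1, and the larger score ends up with the larger probability.

```python
import math

def softmax(logits):
    # Subtract the max for numerical stability before exponentiating.
    shifted = [v - max(logits) for v in logits]
    exps = [math.exp(v) for v in shifted]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 0.5])  # hypothetical raw scores for (cat, dog)
print(probs)
print(sum(probs))
```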
Now, we construct the new fine-tuned model, which we're calling model.
model = Model(inputs=mobile.input, outputs=output)
Note that the Model constructor tells us this model is being created with the Keras Functional API, not the Sequential API that we've worked with in previous episodes. That's why the format we're using to create the model may look a little different from what you're used to.
To build the new model, we create an instance of the Model class and specify the inputs to the model to be equal to the input of the original MobileNet, and then we define the outputs of the model to be equal to the output variable we created directly above.
This creates a new model, which is identical to the original MobileNet up to the original model's sixth to last layer. We don't have the last five original MobileNet layers included, but instead we have a new layer, the output layer we created with two output nodes.
You can compare the summary of the new model with the summary of the original MobileNet to verify these differences by calling summary() on both the old and new models. This is also shown in the corresponding video.
Now, we need to choose how many layers we actually want to be trained when we train on cats and dogs.
Here, we are freezing the weights of all the layers except for the last five layers in our new model, meaning that only the last five layers of the model will be trained.
for layer in model.layers[:-5]:
    layer.trainable = False
By training only the last five layers, all the weights in the remaining earlier layers will not be updated during training and will instead keep the ImageNet weights from the original MobileNet.
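As a quick sanity check on the numbers, here is a sketch of the same slicing on stand-in objects. It assumes the 88-layer count quoted earlier for the original MobileNet, and FakeLayer is a hypothetical placeholder for a Keras layer that only tracks a trainable flag.

```python
original_layer_count = 88  # figure quoted for the original MobileNet
new_layer_count = original_layer_count - 5 + 1  # drop five layers, add one Dense

# Stand-in objects for model.layers; each just tracks a trainable flag.
class FakeLayer:
    def __init__(self):
        self.trainable = True

layers = [FakeLayer() for _ in range(new_layer_count)]

# Same slicing as above: everything except the last five layers is frozen.
for layer in layers[:-5]:
    layer.trainable = False

frozen = sum(not layer.trainable for layer in layers)
trainable = sum(layer.trainable for layer in layers)
print(new_layer_count, frozen, trainable)  # 84 79 5
```

On the real model, you could run the same count over model.layers to confirm how many layers ended up frozen.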
Note that the number of layers that you choose to retrain is, again, one of those things that varies by situation. Since the original MobileNet model has already generally learned about cats and dogs, we don't really need to retrain many layers.
Our new model is now built, tuned, and ready to be trained on cats and dogs. Make sure you've got your model ready for training, and in the next episode we'll do that together, and we'll also see how the model holds up when predicting on new, unseen images from our test set. See ya there!