Learnable parameters ("trainable params") in a Keras Convolutional Neural Network
In this episode, we'll discuss how we can quickly access and calculate the number of learnable parameters in a convolutional neural network in code with Keras. We'll also explore how these parameters may be affected by other optional configurations, so let's get to it!
Keras model with zero-padding
We have this pretty basic Keras convolutional neural network.
from keras.models import Sequential
from keras.layers import Dense, Flatten, Conv2D, MaxPooling2D
model = Sequential([
    Conv2D(
        2,
        kernel_size=(3,3),
        input_shape=(20,20,3),
        activation='relu',
        padding='same'
    ),
    Conv2D(
        3,
        kernel_size=(3,3),
        activation='relu',
        padding='same'
    ),
    Flatten(),
    Dense(
        2,
        activation='softmax'
    )
])
This model has an input layer consisting of images of size 20x20 with 3 color channels, a convolutional layer with 2 filters of size 3x3, a second convolutional layer with 3 filters of size 3x3, a Flatten layer to flatten the convolutional output, and then finally a Dense output layer with just 2 nodes.
We can see in the convolutional layers that we're specifying 'same' as our padding, which we know from another episode is zero-padding.
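To make the effect of this choice concrete, here's a small helper (my own sketch, not code from the episode) that computes a convolutional layer's spatial output size using the standard formulas for 'same' versus 'valid' padding:

import math

def conv_output_size(n, kernel, stride=1, padding='valid'):
    # 'same' (zero) padding preserves the size when stride is 1,
    # while 'valid' (no padding) shrinks the output by kernel - 1
    if padding == 'same':
        return math.ceil(n / stride)
    return (n - kernel) // stride + 1

print(conv_output_size(20, 3, padding='same'))   # 20
print(conv_output_size(20, 3, padding='valid'))  # 18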
We also saw in a previous Keras episode how we can view the number of learnable parameters in each layer of a Keras model, as well as the number of parameters within the full network, by calling the summary() function on our model and inspecting the Param # column.
model.summary()
Layer (type) | Output Shape | Param #
---|---|---
conv2d_20 (Conv2D) | (None, 20, 20, 2) | 56
conv2d_21 (Conv2D) | (None, 20, 20, 3) | 57
flatten_9 (Flatten) | (None, 1200) | 0
dense_14 (Dense) | (None, 2) | 2402
Total params: 2515
Trainable params: 2515
Non-trainable params: 0
We have this summary output for our model, and in fact this model is an exact implementation of the conceptual model we worked with when we learned how to calculate the number of learnable parameters in a CNN over in the deep learning fundamentals series.
If you recall from that episode, in our first convolutional layer, we indeed calculated that there were 56 learnable parameters, just as Keras is showing us in this output. We also calculated that the second convolutional layer contained 57 learnable parameters and that the output layer consisted of 2402 parameters, giving us a total of 2515 learnable parameters in the entire network.
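If you'd rather grab these counts programmatically than read them off the summary() table, Keras exposes count_params() on both individual layers and the whole model. A quick sketch (the layer names will vary from run to run):

for layer in model.layers:
    print(layer.name, layer.count_params())
# conv2d_20 56
# conv2d_21 57
# flatten_9 0
# dense_14 2402

print(model.count_params())  # 2515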
Now, remember, we're using zero padding here to maintain the dimensions of the images as they flow through the network. We previously saw where the dimensions come into play when we were calculating the number of learnable parameters in the output Dense layer.
We needed to calculate how many inputs we had coming into this layer, which we calculated as 1200, as shown in the Output Shape column of the Flatten layer. The number 1200 was reached by multiplying 20x20x3, where 3 is the number of filters in the last convolutional layer.
The 20x20 comes from the dimensions of the image data as it is output from the previous convolutional layer. We can see these dimensions in the Output Shape column for the second convolutional layer.
We then multiplied 1200 by the 2 nodes in the output layer and added the 2 bias terms, which gave us this result of 2402.
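To tie that arithmetic together, here's the full calculation written out in plain Python (just a recap sketch of the formulas from the fundamentals episode):

# conv params = (kernel_h * kernel_w * input_channels + 1 bias) * filters
conv1 = (3 * 3 * 3 + 1) * 2    # 56
conv2 = (3 * 3 * 2 + 1) * 3    # 57

# dense params = inputs * nodes + biases
flat = 20 * 20 * 3             # 1200 inputs from the Flatten layer
dense = flat * 2 + 2           # 2402

print(conv1 + conv2 + dense)   # 2515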
If you're not getting a full grasp of the calculations I just summarized, then refresh your memory with the episode I referenced earlier on calculating the number of learnable parameters in CNNs.
Keras model without zero-padding
Now, if we were to not use zero padding, then what impact would that have on the number of learnable parameters in our model? Let's check it out.
model = Sequential([
    Conv2D(2, kernel_size=(3,3), input_shape=(20,20,3), activation='relu'),
    Conv2D(3, kernel_size=(3,3), activation='relu'),
    Flatten(),
    Dense(2, activation='softmax')
])
This is the exact same model we were just working with, except that now we're not using zero padding, so we're no longer specifying the padding parameter in the two convolutional layers.
model.summary()
Layer (type) | Output Shape | Param #
---|---|---
conv2d_20 (Conv2D) | (None, 18, 18, 2) | 56
conv2d_21 (Conv2D) | (None, 16, 16, 3) | 57
flatten_9 (Flatten) | (None, 768) | 0
dense_14 (Dense) | (None, 2) | 1538
Total params: 1651
Trainable params: 1651
Non-trainable params: 0
The number of learnable parameters in the two convolutional layers stays the same, but we can see that the number of parameters in the last Dense layer has dropped considerably, from 2402 to 1538.
That's because the dimensions of the images have shrunk to 16x16 by the time they leave the last convolutional layer, so now, rather than multiplying 20x20x3, resulting in 1200, we're multiplying 16x16x3, which gives us 768.
So, just by removing zero padding from the convolutional layers, the number of total learnable parameters in the network has dropped from 2515 to 1651, a decrease of about 34%.
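Here's the same back-of-the-envelope arithmetic for the unpadded model (again, my own illustrative sketch): each 3x3 'valid' convolution shaves 2 pixels off each spatial dimension, which is exactly where the 18x18 and 16x16 output shapes come from.

h = w = 20
h, w = h - 2, w - 2            # after the first conv layer: 18x18
h, w = h - 2, w - 2            # after the second conv layer: 16x16

flat = h * w * 3               # 768 inputs to the Dense layer
dense = flat * 2 + 2           # 1538

print(56 + 57 + dense)         # 1651 total learnable parameters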
Keras model with zero-padding and max-pooling
Now, let's put zero padding back into our model, and let's see what impact adding a max pooling layer would have on the number of learnable parameters. After all, it's pretty conventional to use max pooling in a CNN.
model = Sequential([
    Conv2D(2, kernel_size=(3,3), input_shape=(20,20,3), activation='relu', padding='same'),
    Conv2D(3, kernel_size=(3,3), activation='relu', padding='same'),
    MaxPooling2D(pool_size=(2,2), strides=2),
    Flatten(),
    Dense(2, activation='softmax')
])
So this is our original model with the same architecture, using zero padding, but now we've added a max pooling layer after our second convolutional layer. The pool size we've specified is 2x2 with a stride of 2.
We know from what we've learned about max pooling earlier that this is going to reduce the dimensions of our images. In fact, this particular choice of pool_size and stride cuts the dimensions in half. We can see this in the Output Shape column of the max pooling layer.
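As a quick sanity check (a hypothetical one-liner, not from the episode), the standard pooling output-size formula, floor((n - pool) / stride) + 1, confirms the halving:

n, pool, stride = 20, 2, 2
print((n - pool) // stride + 1)  # 10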
model.summary()
Layer (type) | Output Shape | Param #
---|---|---
conv2d_20 (Conv2D) | (None, 20, 20, 2) | 56
conv2d_21 (Conv2D) | (None, 20, 20, 3) | 57
max_pooling2d_1 (MaxPooling2D) | (None, 10, 10, 3) | 0
flatten_9 (Flatten) | (None, 300) | 0
dense_14 (Dense) | (None, 2) | 602
Total params: 715
Trainable params: 715
Non-trainable params: 0
So now, rather than multiplying the original 20x20x3 dimensions when we flatten the convolutional output, we multiply 10x10x3, as a result of max pooling.
This drastically shrinks the number of learnable parameters in our output layer, from the original 2402 down to 602, which reduces the total number of learnable parameters in the network from 2515 to 715.
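And once more, the corresponding arithmetic for the pooled model (my own sketch):

flat = 10 * 10 * 3             # 300 inputs after max pooling and flattening
dense = flat * 2 + 2           # 602

print(56 + 57 + 0 + dense)     # 715 (the max pooling layer itself adds no parameters)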
This is how we can access and confirm the total number of learnable parameters in a CNN in Keras, as well as see what type of impact these common techniques of zero padding and max pooling have on the number of learnable parameters in our model. See ya in the next one!