Learnable parameters ("trainable params") in a Keras Convolutional Neural Network
In this episode, we'll discuss how we can quickly access and calculate the number of learnable parameters in a convolutional neural network in code with Keras. We'll also explore how these parameters may be affected by other optional configurations, so let's get to it!
Keras model with zero-padding
We have this pretty basic Keras convolutional neural network.
from keras.models import Sequential
from keras.layers import Dense, Flatten, Conv2D, MaxPooling2D
model = Sequential([
    Conv2D(
        2,
        kernel_size=(3,3),
        input_shape=(20,20,3),
        activation='relu',
        padding='same'
    ),
    Conv2D(
        3,
        kernel_size=(3,3),
        activation='relu',
        padding='same'
    ),
    Flatten(),
    Dense(
        2,
        activation='softmax'
    )
])
This model has an input layer consisting of images of size 20x20 with 3 color channels, a convolutional layer with 2 filters of size 3x3, a second convolutional layer with 3 filters of size 3x3, a Flatten layer to flatten the convolutional output, and then finally a Dense output layer with just 2 nodes.
We can see in the convolutional layers that we're specifying 'same' as our padding, which we know from another episode is zero-padding.
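To make the effect of this choice concrete, here's a small helper (my own sketch, not code from the episode) that computes a convolutional layer's spatial output size using the standard formulas for 'same' versus 'valid' padding:

import math

def conv_output_size(n, kernel, stride=1, padding='valid'):
    # 'same' (zero) padding preserves the size when stride is 1,
    # while 'valid' (no padding) shrinks the output by kernel - 1
    if padding == 'same':
        return math.ceil(n / stride)
    return (n - kernel) // stride + 1

print(conv_output_size(20, 3, padding='same'))   # 20
print(conv_output_size(20, 3, padding='valid'))  # 18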
We also saw in a previous Keras episode how we can view the number of learnable parameters in each layer of a Keras model, as well as the number of parameters within the full network, by calling the summary() function on our model and inspecting the Param # column.
model.summary()
Layer (type) | Output Shape | Param #
---|---|---
conv2d_20 (Conv2D) | (None, 20, 20, 2) | 56
conv2d_21 (Conv2D) | (None, 20, 20, 3) | 57
flatten_9 (Flatten) | (None, 1200) | 0
dense_14 (Dense) | (None, 2) | 2402
Total params: 2515
Trainable params: 2515
Non-trainable params: 0
We have this summary output for our model, and in fact this model is an exact implementation of the conceptual model we worked with when we learned how to calculate the number of learnable parameters in a CNN over in the deep learning fundamentals series.
If you recall from that episode, in our first convolutional layer, we indeed calculated that there were 56 learnable parameters, just as Keras is showing us in this output. We also calculated that the second convolutional layer contained 57 learnable parameters and that the output layer consisted of 2402 parameters, giving us a total of 2515 learnable parameters in the entire network.
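If you'd rather grab these counts programmatically than read them off the summary() table, Keras exposes count_params() on both individual layers and the whole model. A quick sketch (the layer names will vary from run to run):

for layer in model.layers:
    print(layer.name, layer.count_params())
# conv2d_20 56
# conv2d_21 57
# flatten_9 0
# dense_14 2402

print(model.count_params())  # 2515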
Now, remember, we're using zero padding here to maintain the dimensions of the images as they flow through the network. We previously saw where the dimensions come into play when we were calculating the number of learnable parameters in the output Dense layer.
We needed to calculate how many inputs we had coming into this layer, which we calculated as 1200, as shown in the Output Shape column of the Flatten layer. The number 1200 was reached by multiplying 20x20x3, where 3 is the number of filters in the last convolutional layer.
The 20x20 comes from the dimensions of the image data as it is output from the previous convolutional layer. We can see these dimensions in the Output Shape column for the second convolutional layer.
We then multiplied 1200 by the 2 nodes in the output layer and added the 2 bias terms, which gave us this result of 2402.
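To tie that arithmetic together, here's the full calculation written out in plain Python (just a recap sketch of the formulas from the fundamentals episode):

# conv params = (kernel_h * kernel_w * input_channels + 1 bias) * filters
conv1 = (3 * 3 * 3 + 1) * 2    # 56
conv2 = (3 * 3 * 2 + 1) * 3    # 57

# dense params = inputs * nodes + biases
flat = 20 * 20 * 3             # 1200 inputs from the Flatten layer
dense = flat * 2 + 2           # 2402

print(conv1 + conv2 + dense)   # 2515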
If you're not getting a full grasp of the calculations I just summarized, then refresh your memory with the episode I referenced earlier on calculating the number of learnable parameters in CNNs.
Keras model without zero-padding
Now, if we were to not use zero padding, then what impact would that have on the number of learnable parameters in our model? Let's check it out.
model = Sequential([
    Conv2D(2, kernel_size=(3,3), input_shape=(20,20,3), activation='relu'),
    Conv2D(3, kernel_size=(3,3), activation='relu'),
    Flatten(),
    Dense(2, activation='softmax')
])
This is the exact same model we were just working with, except that now we're not using zero padding, so we're no longer specifying the padding parameter in the two convolutional layers.
model.summary()
Layer (type) | Output Shape | Param #
---|---|---
conv2d_20 (Conv2D) | (None, 18, 18, 2) | 56
conv2d_21 (Conv2D) | (None, 16, 16, 3) | 57
flatten_9 (Flatten) | (None, 768) | 0
dense_14 (Dense) | (None, 2) | 1538
Total params: 1651
Trainable params: 1651
Non-trainable params: 0
The number of learnable parameters in the two convolutional layers stays the same, but we can see that the number of parameters in the last Dense layer has dropped considerably, from 2402 to 1538.
That's because the dimensions of the images have shrunk to 16x16 by the time they leave the last convolutional layer, so now, rather than multiplying 20x20x3, resulting in 1200, we're multiplying 16x16x3, which gives us 768.
So, just by removing zero padding from the convolutional layers, the number of total learnable parameters in the network has dropped from 2515 to 1651, a decrease of about 34%.
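Here's the same back-of-the-envelope arithmetic for the unpadded model (again, my own illustrative sketch): each 3x3 'valid' convolution shaves 2 pixels off each spatial dimension, which is exactly where the 18x18 and 16x16 output shapes come from.

h = w = 20
h, w = h - 2, w - 2            # after the first conv layer: 18x18
h, w = h - 2, w - 2            # after the second conv layer: 16x16

flat = h * w * 3               # 768 inputs to the Dense layer
dense = flat * 2 + 2           # 1538

print(56 + 57 + dense)         # 1651 total learnable parameters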
Keras model with zero-padding and max-pooling
Now, let's put zero padding back into our model, and let's see what impact adding a max pooling layer would have on the number of learnable parameters. After all, it's pretty conventional to use max pooling in a CNN.
model = Sequential([
    Conv2D(2, kernel_size=(3,3), input_shape=(20,20,3), activation='relu', padding='same'),
    Conv2D(3, kernel_size=(3,3), activation='relu', padding='same'),
    MaxPooling2D(pool_size=(2,2), strides=2),
    Flatten(),
    Dense(2, activation='softmax')
])
So this is our original model with the same architecture, using zero padding, but now we've added a max pooling layer after our second convolutional layer. The pool size we've specified is 2x2 with a stride of 2.
We know from what we've learned about max pooling earlier that this is going to reduce the dimensions of our images. In fact, this particular choice of pool_size and stride cuts the dimensions in half. We can see this in the Output Shape column of the max pooling layer.
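As a quick sanity check (a hypothetical one-liner, not from the episode), the standard pooling output-size formula, floor((n - pool) / stride) + 1, confirms the halving:

n, pool, stride = 20, 2, 2
print((n - pool) // stride + 1)  # 10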
model.summary()
Layer (type) | Output Shape | Param #
---|---|---
conv2d_20 (Conv2D) | (None, 20, 20, 2) | 56
conv2d_21 (Conv2D) | (None, 20, 20, 3) | 57
max_pooling2d_1 (MaxPooling2D) | (None, 10, 10, 3) | 0
flatten_9 (Flatten) | (None, 300) | 0
dense_14 (Dense) | (None, 2) | 602
Total params: 715
Trainable params: 715
Non-trainable params: 0
So now, rather than multiplying the original 20x20x3 dimensions when we flatten the convolutional output, we multiply 10x10x3, as a result of max pooling.
This drastically shrinks the number of learnable parameters in our output layer, from the original 2402 down to 602, which reduces the total number of learnable parameters in the network from 2515 to 715.
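And once more, the corresponding arithmetic for the pooled model (my own sketch):

flat = 10 * 10 * 3             # 300 inputs after max pooling and flattening
dense = flat * 2 + 2           # 602

print(56 + 57 + 0 + dense)     # 715 (the max pooling layer itself adds no parameters)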
This is how we can access and confirm the total number of learnable parameters in a CNN in Keras, as well as see what type of impact these common techniques of zero padding and max pooling have on the number of learnable parameters in our model. See ya in the next one!