### Learnable parameters in a CNN

What’s going on everyone? Last time, we learned about learnable parameters in a fully connected network of dense layers. Now, we’re going to talk about these parameters in the scenario when our network is a convolutional neural network, or CNN.

We’ll first start out by discussing what the learnable parameters within a convolutional neural network are, and then see how the total number of learnable parameters within a CNN is calculated. And after we see how this is done, we’ll illustrate the calculation using a simple convolutional neural network.

### What are the learnable parameters in a CNN?

Alright, what are the learnable parameters in a CNN? Well, it turns out, that generally, they’re the same parameters we saw in a standard fully connected network. That is, the weights and biases. But, we have to consider how, architecturally, the two types of networks are different, and how that’s going to affect our calculation. Let’s explore that now.

### How the number of learnable parameters is calculated

So, just as with a standard network, with a CNN, we’ll calculate the number of parameters per layer, and then we’ll sum up the parameters in each layer to get the total number of learnable parameters in the entire network.

```js
// pseudocode: sum the learnable parameters across all layers
let sum = 0;
network.layers.forEach(function (layer) {
  sum += layer.getLearnableParameters().length;
});
```

For a dense layer, we determined that the number of learnable parameters is the number of inputs times the number of outputs, plus the number of biases.
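As a quick sketch, that dense-layer count can be written as a tiny Python helper (the function name here is just for illustration):

```python
# Learnable parameters in a dense layer:
# (number of inputs * number of nodes) + number of biases,
# where there is one bias per node.
def dense_layer_params(n_inputs, n_nodes):
    weights = n_inputs * n_nodes
    biases = n_nodes
    return weights + biases

# e.g. a dense layer with 4 inputs and 3 nodes
print(dense_layer_params(4, 3))  # 15
```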

Now, let’s consider what a convolutional layer has that a dense layer doesn’t.

A convolutional layer has filters, also known as kernels. As the architects of our network, we determine how many filters are in a convolutional layer as well as how large these filters are, and we need to consider these things in our calculation.

With this in mind, we’ll modify our formula for determining the number of learnable parameters in a convolutional layer.

So, what is the input going to be for a given convolutional layer? Well, that’s going to depend on what type of layer the previous layer was.

- If the previous layer was a dense layer, the input to the conv layer is just the number of nodes in the previous dense layer.
- If the previous layer was a convolutional layer, the input will be the number of filters from that previous convolutional layer.

Now, what’s the output of a convolutional layer?

- With a dense layer, it was just the number of nodes.
- With a convolutional layer, the output will be the number of filters times the size of the filters.

We’ll see this illustrated in just a sec. Finally, the number of biases, well that’ll just be equal to the number of filters in the layer.

So overall, we have the same general setup for the number of learnable parameters in the layer, calculated as the number of inputs times the number of outputs, plus the number of biases.

With a convolutional layer, though, the inputs and outputs themselves account for the number of filters and the size of the filters. Let’s check ourselves by seeing this calculation in action with a simple CNN.
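Under the convention described above, where a conv layer’s output count is its number of filters times its filter size, a hypothetical helper might look like this:

```python
# Learnable parameters in a convolutional layer, following the
# convention in this post: inputs * (filters * filter size) + biases,
# with one bias per filter.
def conv_layer_params(n_inputs, n_filters, filter_h, filter_w):
    outputs = n_filters * filter_h * filter_w
    weights = n_inputs * outputs
    biases = n_filters
    return weights + biases

# e.g. 3 inputs into a layer of two 3x3 filters
print(conv_layer_params(3, 2, 3, 3))  # 56
```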

### Calculating the number of learnable parameters in a CNN

Suppose we have a CNN made up of an input layer, two hidden convolutional layers, and a dense output layer.

- input layer
- hidden convolutional layer
- hidden convolutional layer
- dense output layer

Our input layer is made up of input data from images of size `20x20x3`, where `20x20` specifies the width and height of the images, and `3` specifies the number of channels. The three channels indicate that our images are in RGB color scale, and these three channels will represent the input features in this layer.

Our first convolutional layer is made up of `2` filters of size `3x3`. Our second convolutional layer is made up of `3` filters of size `3x3`. And our output layer is a dense layer with `2` nodes.

We’ll assume that the network contains bias terms and that we’re using zero padding throughout the network to maintain the dimensions of the images. Check the zero padding video if you’re unfamiliar with this concept.

- input layer - images of size `20x20x3`
- hidden convolutional layer - `2` filters of size `3x3`
- hidden convolutional layer - `3` filters of size `3x3`
- dense output layer - `2` nodes

#### Input layer

Now, the same rule applies here for the input layer that we talked about last time. The input layer has no learnable parameters since it just contains the input data.

#### Conv layer 1

Moving on to the first hidden convolutional layer, how many inputs do we have coming into this layer? We have `3` from our input layer. How many outputs? Well, let’s see. Remember, the number of outputs is the number of filters times the filter size. We have two filters, each of size `3x3`, so `2*3*3 = 18`. Multiplying our three inputs by our `18` outputs, we have `54` weights. Now, how many biases? Just two, since the number of biases is equal to the number of filters. That gives us `56` total learnable parameters in this layer.

#### Conv layer 2

Now let’s move to our next convolutional layer. How many inputs are coming in to this layer? We have two, from the number of filters in the previous layer. How many outputs? Well, we have three filters, again of size `3x3`, so that’s `3*3*3 = 27` outputs. Multiplying our two inputs by the `27` outputs, we have `54` weights in this layer. Adding three bias terms from the three filters, we have `57` learnable parameters in this layer.

#### Output layer

Onto the output layer. How many inputs? We may think just three, since that’s the number of filters in the last convolutional layer, but that’s not quite right. If you’ve followed the Keras series, you know that before passing output from a convolutional layer to a dense layer, we have to flatten the output by multiplying the dimensions of the data from the conv layer by the number of filters in that layer. In our case, the data is image data.

Since we’re assuming that this network uses zero padding, the dimensions of our images of size `20x20` haven’t changed by the time we get to this layer. So multiplying `20x20` by the three filters gives us a total of `1200` inputs coming into our output layer.
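That flattening step amounts to one multiplication, shown here with the shape values from this example:

```python
# Flattened input to the dense layer: image height * width * number of
# filters from the last conv layer (dimensions preserved by zero padding)
height, width, n_filters = 20, 20, 3
flattened_inputs = height * width * n_filters
print(flattened_inputs)  # 1200
```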

Now, since this output layer is a dense layer, the number of outputs is just equal to the number of nodes in this layer, so we have two outputs. Multiplying `1200*2` gives us `2400` weights. Adding in our two biases from this layer, we have `2402` learnable parameters in this layer.

#### The result

Summing up the parameters from all the layers gives us a total of `2515` learnable parameters within the entire network.
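Putting the whole walkthrough together, here’s a minimal Python sketch (the helper names are just for illustration) that reproduces the per-layer counts and the total:

```python
# Per-layer learnable parameter counts for the example network,
# using the conventions from this post.

def conv_params(n_inputs, n_filters, filter_size):
    # inputs * (filters * filter height * filter width) + one bias per filter
    return n_inputs * (n_filters * filter_size * filter_size) + n_filters

def dense_params(n_inputs, n_nodes):
    # inputs * nodes + one bias per node
    return n_inputs * n_nodes + n_nodes

conv1 = conv_params(n_inputs=3, n_filters=2, filter_size=3)  # 56
conv2 = conv_params(n_inputs=2, n_filters=3, filter_size=3)  # 57
dense = dense_params(n_inputs=20 * 20 * 3, n_nodes=2)        # 2402

total = conv1 + conv2 + dense
print(total)  # 2515
```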

So we can see that the process for determining the number of learnable parameters in a convolutional network is generally the same as for a standard fully connected network, but we have to do a little extra work by considering things like the number of channels in our image data, the number of filters, the filter sizes, and the flattening of convolutional output.

### Next up in the Keras series

We’ll be implementing this in code using Keras in the Keras series, so be sure to check that out as well, and in the meantime, let me know your thoughts. See ya soon.