Learnable Parameters in an Artificial Neural Network explained
Learnable Parameters in a Neural Network
What's going on everyone? In this episode, we're going to talk about learnable parameters within an artificial neural network.
Without further ado, let's get to it.
Our Goals
As it turns out, we've already talked a lot about learnable parameters in a neural network, but we haven't necessarily given the general topic a formal introduction.
In this episode, we'll start out by defining what a learnable parameter within a neural network is.
Then, we will see how the total number of learnable parameters within a network is calculated.
After we see how this is done, we'll illustrate the calculation using a simple neural network.
Learnable Parameters
Alright, what is a learnable parameter in an artificial neural network?
What are Learnable Parameters?
Let's go into deep thought and see if we can infer what a learnable parameter is based solely on its name.
Whew, that was tough. If you happened to just reach enlightenment, then you now know that a learnable parameter is, well, a parameter that is learned by the network during training.
Ok, all jokes aside, really, a learnable parameter is just that.
During the training process, we've discussed how stochastic gradient descent, or SGD, works to learn and optimize the weights and biases in a neural network. These weights and biases are indeed learnable parameters.
In fact, any parameters within our model which are learned during training via SGD are considered learnable parameters.
It's useful to note that these parameters are also referred to as trainable parameters, since they're optimized during the training process.
Calculating the Number of Learnable Parameters
Alright, so we know what learnable parameters are. How can we calculate the number of these parameters within each layer, or even within the entire network?
Essentially, we just count the number of parameters within each layer and then sum those counts to get the total number of parameters within the full network.
What we need in order to calculate the number of parameters within an individual layer is:
- The number of inputs to that layer.
- The number of outputs from that layer.
- Whether or not the layer contains biases.
Note that we're talking about a fully connected network made up of standard dense layers. In another episode, we'll focus on how this is done for other networks, like CNNs.
Now, once we have the needed information, we multiply the number of inputs to the layer by the number of outputs from the layer. Another way to think about the outputs is in terms of the number of nodes within the layer: the number of nodes is equal to the number of outputs.
Multiplying the inputs by the outputs gives us the number of weights coming into that layer.
Then, we just need to know whether or not the layer contains a bias for each node. If it does, then we simply add the number of biases to the number of weights we just calculated. The number of biases is equal to the number of nodes in the layer.
This will give us the number of learnable parameters within a given dense layer. We then do this same calculation for the remaining layers in the network and then sum all the results together to get the total number of learnable parameters within the entire network.
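To make this arithmetic concrete, here's a minimal Python sketch of the per-layer count. The function name `count_dense_params` and its parameters are just illustrative, not from any particular library.

```python
def count_dense_params(n_inputs, n_nodes, use_bias=True):
    """Count the learnable parameters in a single dense layer.

    n_inputs: number of inputs to the layer (outputs of the previous layer)
    n_nodes:  number of nodes in the layer (also its number of outputs)
    """
    weights = n_inputs * n_nodes         # one weight per input-node connection
    biases = n_nodes if use_bias else 0  # one bias per node, if the layer has biases
    return weights + biases

# Example: a layer with 4 inputs and 5 nodes, with biases
print(count_dense_params(4, 5))  # 4*5 + 5 = 25
```

The total for the whole network is then just the sum of this count over all of its layers.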
Learnable Parameters Example Calculation
Let's now look at an example to show this calculation in action.
Suppose we have a fully connected network with three layers:
- Input layer
- Hidden layer
- Output layer
We'll assume the following network architecture:
| Layer  | Number of Nodes |
| ------ | --------------- |
| Input  | 2               |
| Hidden | 3               |
| Output | 2               |
Additionally, we're assuming our network contains biases. This means that there are bias terms within our hidden layer and our output layer.
Now, let's calculate the number of learnable parameters within each layer.
First things first, the input layer has no learnable parameters since the input layer is just made up of the input data, and the output from the layer is actually just going to be considered as input to the next layer.
Moving on, let's calculate the number of learnable parameters within the hidden layer.
Alright, as we discussed earlier, we first need the number of inputs to the layer. We have two inputs, which are the outputs from the two nodes in the input layer. Next, we need the number of outputs from this layer.
The number of outputs is the number of nodes. This means that we have three outputs. We multiply these two numbers together, which gives us six total weights.
Next, we add in our biases. The hidden layer has three nodes, which means it has three bias terms. Adding three to six, we see that this layer has nine total learnable parameters.
Moving on to the output layer, we do the same. How many inputs are coming into the output layer? We have three, coming from the three nodes in the hidden layer.
How many outputs are coming from the output layer? We have two, since that's the number of nodes this layer has. How many biases? We have two, again since that's how many nodes we have in the layer.
Multiplying our input by our output, we have three times two, so that's six weights, plus two bias terms. That's eight learnable parameters for our output layer.
Adding eight to the nine parameters from our hidden layer, we see that the entire network contains seventeen total learnable parameters. During training, SGD will be learning and optimizing all seventeen of these parameters.
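As a quick sanity check, here's a sketch of this same 2-3-2 network built with Keras. This assumes you have TensorFlow installed, and the `relu`/`softmax` activations are arbitrary choices here, since activations don't affect the parameter count.

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# 2 inputs -> hidden layer with 3 nodes -> output layer with 2 nodes
model = Sequential([
    Dense(3, input_shape=(2,), activation='relu'),  # hidden layer: 2*3 + 3 = 9 params
    Dense(2, activation='softmax'),                 # output layer: 3*2 + 2 = 8 params
])

model.summary()              # shows the per-layer parameter counts
print(model.count_params())  # 17
```

The per-layer "Param #" column in the summary should match the nine and eight we calculated by hand.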
Conclusion
Calculating the number of learnable parameters in a fully connected network is just some arithmetic! As mentioned earlier, we're going to explore how this number is derived for a convolutional neural network as well in a future episode.
The process is really similar, but we have to consider the items that a CNN has that our standard fully connected network doesn't, like the filters within a convolutional layer. See you in the next one!