PyTorch - Python Deep Learning Neural Network API

Deep Learning Course - Level: Intermediate

CNN Layers - PyTorch Deep Neural Network Architecture

PyTorch CNN Layer Parameters

Welcome back to this series on neural network programming with PyTorch. In this post, we are going to learn about the layers of our CNN by building an understanding of the parameters we used when constructing them.

Without further ado, let's get to it!

Our CNN Layers

In the last post, we started building our CNN by extending the PyTorch neural network Module class and defining some layers as class attributes. We defined two convolutional layers and three linear layers by specifying them inside our constructor.

import torch.nn as nn

class Network(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5)
        self.conv2 = nn.Conv2d(in_channels=6, out_channels=12, kernel_size=5)

        self.fc1 = nn.Linear(in_features=12*4*4, out_features=120)
        self.fc2 = nn.Linear(in_features=120, out_features=60)
        self.out = nn.Linear(in_features=60, out_features=10)

    def forward(self, t):
        # implement the forward pass
        return t

Each of our layers extends PyTorch's neural network Module class. For each layer, there are two primary items encapsulated inside: a forward function definition and a weight tensor.

The weight tensor inside each layer contains the weight values that are updated as the network learns during the training process, and this is the reason we are specifying our layers as attributes inside our Network class.

PyTorch's neural network Module class keeps track of the weight tensors inside each layer. The code that does this tracking lives inside the nn.Module class, and since we are extending the neural network module class, we inherit this functionality automatically.

Remember, inheritance is one of those object oriented concepts that we talked about last time. All we have to do to take advantage of this functionality is assign our layers as attributes inside our network module, and the Module base class will see this and register the weights as learnable parameters of our network.
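
For example, here's a quick check (just a minimal sketch) that asks the Module base class for the parameters it has registered once our layers are assigned as attributes:

network = Network()

for name, param in network.named_parameters():
    # each assigned layer contributes its learnable tensors,
    # e.g. conv1.weight, conv1.bias, fc1.weight, and so on
    print(name, param.shape)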

CNN Layer Parameters

Our goal in this post is to better understand the layers we have defined. To do this, we're going to learn about the parameters and the values that we passed for these parameters in the layer constructors.

Parameter vs Argument

First, let's clear up some lingo that pertains to parameters in general. We often hear the words parameter and argument, but what's the difference between these two?

Parameters are used in function definitions as place-holders while arguments are the actual values that are passed to the function. The parameters can be thought of as local variables that live inside a function.

In our network's case, parameter names like in_channels and kernel_size are the parameters, and the values that we have specified for them are the arguments.
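
A tiny, purely illustrative example (the function and its names are made up here) shows the distinction:

def scale(value, factor):   # value and factor are parameters (place-holders)
    return value * factor

scale(3, factor=2)          # 3 and 2 are arguments (the actual values passed in)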

Two types of parameters

To better understand the argument values for these parameters, let's consider two categories or types of parameters that we used when constructing our layers.

  1. Hyperparameters
  2. Data dependent hyperparameters

A lot of terms in deep learning are used loosely, and the word parameter is one of them. Try not to let it throw you off. The main thing to remember about any type of parameter is that the parameter is a place-holder that will eventually hold or have a value.

The goal of these particular categories is to help us remember how each parameter's value is decided.

When we construct a layer, we pass values for each parameter to the layer's constructor. Our convolutional layers have three parameters, and the linear layers have two parameters.

  • Convolutional layers
    • in_channels
    • out_channels
    • kernel_size
  • Linear layers
    • in_features
    • out_features

Let's see how the values for the parameters are decided. We'll start by looking at hyperparameters, and then, we'll see how the dependent hyperparameters fall into place.

Hyperparameters

In general, hyperparameters are parameters whose values are chosen manually and arbitrarily.

As neural network programmers, we choose hyperparameter values mainly based on trial and error and increasingly by utilizing values that have proven to work well in the past. For building our CNN layers, these are the parameters we choose manually.

  • kernel_size
  • out_channels
  • out_features

This means we simply choose the values for these parameters. In neural network programming, this is pretty common, and we usually test and tune these parameters to find values that work best.

Parameter      Description
kernel_size    Sets the height and width of the filter.
out_channels   Sets the depth of the filter. This is the number of kernels inside the filter. One kernel produces one output channel.
out_features   Sets the size of the output tensor.
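
To make these descriptions concrete, here's a small sketch (with torch.nn imported as nn, as in the constructor code above) showing how these hyperparameter choices show up directly in a layer's weight tensor:

conv = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5)
fc = nn.Linear(in_features=192, out_features=120)

print(conv.weight.shape)   # torch.Size([6, 1, 5, 5]) -> 6 filters of height and width 5
print(fc.weight.shape)     # torch.Size([120, 192])   -> 120 output features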

One pattern that shows up quite often is that we increase our out_channels as we add additional conv layers, and after we switch to linear layers we shrink our out_features as we filter down to our number of output classes.

All of these parameters impact our network's architecture. Specifically, these parameters directly impact the weight tensors inside the layers. We'll dive deeper into this in the next post when we talk about learnable parameters and inspect the weight tensors, but for now, let's cover dependent hyperparameters.

Data dependent hyperparameters

Data dependent hyperparameters are parameters whose values are dependent on data. The first two data dependent hyperparameters that stick out are the in_channels of the first convolutional layer, and the out_features of the output layer.

You see, the in_channels of the first convolutional layer depend on the number of color channels present inside the images that make up the training set. Since we are dealing with grayscale images, we know that this value should be 1.

The out_features for the output layer depend on the number of classes that are present inside our training set. Since we have 10 classes of clothing inside the Fashion-MNIST dataset, we know that we need 10 output features.
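
If you'd like to confirm both of these values directly from the data, here's a quick sketch (assuming the training set is loaded with torchvision's FashionMNIST dataset, as elsewhere in this series):

import torchvision
import torchvision.transforms as transforms

train_set = torchvision.datasets.FashionMNIST(
    root='./data',
    train=True,
    download=True,
    transform=transforms.ToTensor()
)

image, label = train_set[0]
print(image.shape)             # torch.Size([1, 28, 28]) -> one color channel -> in_channels=1
print(len(train_set.classes))  # 10 -> ten classes of clothing -> out_features=10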

In general, the input to one layer is the output from the previous layer, and so all of the in_channels in the conv layers and in_features in the linear layers depend on the data coming from the previous layer.

When we switch from a conv layer to a linear layer, we have to flatten our tensor. This is why we have 12*4*4. The twelve comes from the number of output channels in the previous layer, but why do we have the two 4s? We cover how we get these values in a future post.
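
If you're curious before then, here's a quick sanity check (just a sketch; it assumes the 28 x 28 Fashion-MNIST images and the 2 x 2 max pooling operations that our forward pass will use):

import torch
import torch.nn.functional as F

network = Network()
t = torch.randn(1, 1, 28, 28)   # a batch containing one 28 x 28 grayscale image

t = F.max_pool2d(network.conv1(t), kernel_size=2, stride=2)
print(t.shape)                  # torch.Size([1, 6, 12, 12])

t = F.max_pool2d(network.conv2(t), kernel_size=2, stride=2)
print(t.shape)                  # torch.Size([1, 12, 4, 4]) -> flattens to 12*4*4 = 192 values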

Summary of layer parameters

We'll learn more about the inner workings of our network and how our tensors flow through our network when we implement our forward() function. For now, be sure to check out the table below, which describes each of the parameters, so you can see how each parameter value is determined.

self.conv1 = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5)
self.conv2 = nn.Conv2d(in_channels=6, out_channels=12, kernel_size=5)

self.fc1 = nn.Linear(in_features=12 * 4 * 4, out_features=120)
self.fc2 = nn.Linear(in_features=120, out_features=60)
self.out = nn.Linear(in_features=60, out_features=10)

Layer   Param name     Param value   The param value is
conv1   in_channels    1             the number of color channels in the input image.
conv1   kernel_size    5             a hyperparameter.
conv1   out_channels   6             a hyperparameter.
conv2   in_channels    6             the number of out_channels in the previous layer.
conv2   kernel_size    5             a hyperparameter.
conv2   out_channels   12            a hyperparameter (higher than the previous conv layer).
fc1     in_features    12*4*4        the length of the flattened output from the previous layer.
fc1     out_features   120           a hyperparameter.
fc2     in_features    120           the number of out_features of the previous layer.
fc2     out_features   60            a hyperparameter (lower than the previous linear layer).
out     in_features    60            the number of out_features of the previous layer.
out     out_features   10            the number of prediction classes.

Kernel vs Filter

Note that, in deep learning, we often use the words filter and kernel interchangeably. However, there is a technical distinction between these two concepts.

A kernel is a 2D tensor, and a filter is a 3D tensor that contains a collection of kernels. We apply a kernel to a single channel, and we apply a filter to multiple channels. To learn more about this distinction, check out this stackexchange post.
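
A quick way to see this distinction in our own network (again, just an illustrative sketch) is to look at the shape of a conv layer's weight tensor:

network = Network()
print(network.conv2.weight.shape)   # torch.Size([12, 6, 5, 5])
# 12 filters, one per output channel; each filter contains 6 kernels of size 5 x 5,
# one kernel per input channel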

Thank you to Thorwald from the community for pointing this out!

Wrapping up

In the next post, we'll learn about learnable parameters, which are parameters whose values are learned during the training process. See you there!
