Neural Network Programming - Deep Learning with PyTorch

with deeplizard.

CNN Layers - PyTorch Deep Neural Network Architecture

December 5, 2018 by

Blog

PyTorch CNN Layer Parameters

Welcome back to this series on neural network programming with PyTorch. In this post, we are going to learn about the layers of our CNN by building an understanding of the parameters we used when constructing them.

ai cyborg

Without further ado, let's get to it!

Our CNN Layers

In the last post, we started building our CNN by extending the PyTorch neural network Module class and defining some layers as class attributes. We defined two convolutional layers and three linear layers by specifying them inside our constructor.

class Network(nn.Module):
    def __init__(self):
        super(Network, self).__init__()
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5)
        self.conv2 = nn.Conv2d(in_channels=6, out_channels=12, kernel_size=5)
        
        self.fc1 = nn.Linear(in_features=12*4*4, out_features=120)
        self.fc2 = nn.Linear(in_features=120, out_features=60)
        self.out = nn.Linear(in_features=60, out_features=10)
        
    def forward(self, t):
        # implement the forward pass
        return t

Each of our layers extends PyTorch's neural network Module class. For each layer, there are two primary items encapsulated inside, a forward function definition and a weight tensor.

The weight tensor inside each layer contains the weight values that are updated as the network learns during the training process, and this is the reason we are specifying our layers as attributes inside our Network class.

PyTorch's neural network Module class keeps track of the weight tensors inside each layer. The code that does this tracking lives inside the nn.Module class, and since we are extending the neural network module class, we inherit this functionality automatically.

Remember, inheritance is one of those object oriented concepts that we talked about last time. All we have to do to take advantage of this functionality is assign our layers as attributes inside our network module, and the Module base class will see this and register the weights as learnable parameters of our network.

CNN Layer Parameters

Our goal in this post is to better understand the layers we have defined. To do this, we're going to learn about the parameters and the values that we passed for these parameters in the layer constructors.

Parameter vs Argument

First, let's clear up some lingo that pertains to parameters in general. We often hear the words parameter and argument, but what's the difference between these two?

We'll parameters are used in function definitions as place-holders while arguments are the actual values that are passed to the function. The parameters can be thought of as local variables that live inside a function.

In our network's case, the names are the parameters and the values that we have specified are the arguments.

Two types of parameters

To better understand the argument values for these parameters, let's consider two categories or types of parameters that we used when constructing our layers.

  1. Hyperparameters
  2. Data dependent hyperparameters

A lot of terms in deep learning are used loosely, and the word parameter is one of them. Try not to let it through you off. The main thing to remember about any type of parameter is that the parameter is a place-holder that will eventually hold or have a value.

The goal of these particular categories is to help us remember how each parameter's value is decided.

When we construct a layer, we pass values for each parameter to the layer’s constructor. With our convolutional layers have three parameters and the linear layers have two parameters.

  • Convolutional layers
    • in_channels
    • out_channels
    • kernel_size
  • Linear layers
    • in_features
    • out_features

Let's see how the values for the parameters are decided. We'll start by looking at hyperparameters, and then, we'll see how the dependent hyperparameters fall into place.

Hyperparameters

In general, hyperparameters are parameters whose values are chosen manually and arbitrarily.

As neural network programmers, we choose hyperparameter values mainly based on trial and error and increasingly by utilizing values that have proven to work well in the past. For building our CNN layers, these are the parameters we choose manually.

  • kernel_size
  • out_channels
  • out_features

This means we simply choose the values for these parameters. In neural network programming, this is pretty common, and we usually test and tune these parameters to find values that work best.

Parameter Description
kernel_size Sets the filter size. The words kernel and filter are interchangeable.
out_channels Sets the number of filters. One filter produces one output channel.
out_features Sets the size of the output tensor.

On pattern that shows up quite often is that we increase our out_channels as we add additional conv layers, and after we switch to linear layers we shrink our out_features as we filter down to our number of output classes.

All of these parameters impact our network's architecture. Specifically, these parameters directly impact the weight tensors inside the layers. We'll dive deeper into this in the next post when we talk about learnable parameters and inspect the weight tensors, but for now, let's cover dependent hyperparameters.

Data dependent hyperparameters

Data dependent hyperparameters are parameters whose value depends on the data. The first two data dependent hyperparameters that stick out are the in_channels of the first convolutional layer, and the out_channels of the output layer.

You see, the in_channels of the first convolutional layer depend on the number of color channels present inside the images that make up the training set. Since we are dealing with grayscale images, we know that this value should be a 1.

gears

The out_features for the output layer depend on the number of classes that are present inside our training set. Since we have 10 classes of clothing inside the Fashion-MNIST dataset, we know that we need 10 output features.

In general, the input to one layer is the output from the previous layer, and so all of the in_channels in the conv layers and in_features in the linear layers depend on the data coming from the previous layer.

When we switch from a conv layer to a linear layer, we have to flatten our tensor. This is why we have 12*4*4. The twelve comes from the number of output channels in the previous layer, but why do we have the two 4s? We cover how we get these values in a future post.

Summary of layer parameters

We’ll learn more about the inner workings of our network and how our tensors flow through our network when we implement our forward() function. For now, be sure to check out this table that describes each of the parameters, to make sure you can understand how each parameter value is determined.

self.conv1 = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5)
self.conv2 = nn.Conv2d(in_channels=6, out_channels=12, kernel_size=5)

self.fc1 = nn.Linear(in_features=12 * 4 * 4, out_features=120)
self.fc2 = nn.Linear(in_features=120, out_features=60)
self.out = nn.Linear(in_features=60, out_features=10)
Layer Param name Param value The param value is
conv1 in_channels 1 the number of color channels in the input image.
conv1 kernel_size 5 a hyperparameter.
conv1 out_channels 6 a hyperparameter.
conv2 in_channels 6 the number of out_channels in previous layer.
conv2 kernel_size 5 a hyperparameter.
conv2 out_channels 12 a hyperparameter (higher than previous conv layer).
fc1 in_features 12*4*4 the length of the flattened output from previous layer.
fc1 out_features 120 a hyperparameter.
fc2 in_features 120 the number of out_features of previous layer.
fc2 out_features 60 a hyperparameter (lower than previous linear layer).
out in_features 60 the number of out_channels in previous layer.
out out_features 10 the number of prediction classes.

Wrapping up

In the next post, we'll learn about learnable parameters, which are parameters whose values are learned during the training process. See you there!

Description

Understanding the layer parameters for convolutional and linear layers: nn.Conv2d(in_channels, out_channels, kernel_size) and nn.Linear(in_features, out_features) Check out the corresponding blog and other resources for this video at: http://deeplizard.com/learn/video/IKOHHItzukk ❤️🦎 Special thanks to the following polymaths of the deeplizard hivemind: George Tohme Support collective intelligence, and join the deeplizard hivemind: http://deeplizard.com/hivemind Code: https://www.patreon.com/posts/code-for-pytorch-21607032 Follow deeplizard: YouTube: https://www.youtube.com/deeplizard Twitter: https://twitter.com/deeplizard Facebook: https://www.facebook.com/Deeplizard-145413762948316 Steemit: https://steemit.com/@deeplizard Instagram: https://www.instagram.com/deeplizard/ Pinterest: https://www.pinterest.com/deeplizard/ Check out products deeplizard suggests on Amazon: https://www.amazon.com/shop/deeplizard Get a free Audible 30-day trial and 2 free audio books with deeplizard’s link: https://amzn.to/2yoqWRn Recommended books on AI: The Most Human Human: What Artificial Intelligence Teaches Us About Being Alive: http://amzn.to/2GtjKqu Life 3.0: Being Human in the Age of Artificial Intelligence https://amzn.to/2H5Iau4 Playlists: Data Science - https://www.youtube.com/playlist?list=PLZbbT5o_s2xrth-Cqs_R9-us6IWk9x27z Machine Learning - https://www.youtube.com/playlist?list=PLZbbT5o_s2xq7LwI2y8_QtvuXZedL6tQU Keras - https://www.youtube.com/playlist?list=PLZbbT5o_s2xrwRnXk_yCPtnqqo4_u2YGL TensorFlow.js - https://www.youtube.com/playlist?list=PLZbbT5o_s2xr83l8w44N_g3pygvajLrJ- PyTorch - https://www.youtube.com/watch?v=v5cngxo4mIg&list=PLZbbT5o_s2xrfNyHZsM6ufI0iZENK9xgG Reinforcement Learning - https://www.youtube.com/playlist?list=PLZbbT5o_s2xoWNVdDudn51XM8lOuZ_Njv Music: Thinking Music by Kevin MacLeod Jarvic 8 by Kevin MacLeod YouTube: https://www.youtube.com/channel/UCSZXFhRIx6b0dFX3xS8L1yQ Website: http://incompetech.com/ Licensed under Creative Commons: By Attribution 3.0 License http://creativecommons.org/licenses/by/3.0/ #ai #pytorch #deeplearning