CNN Forward Pass Implementation
Welcome to this series on neural network programming with PyTorch. In this one, we'll show how to implement the forward method for a convolutional neural network in PyTorch.
Without further ado, let's get started.
Neural Network Programming Series (Recap)
So far in this series, we've prepared our data, and we're now in the process of building our model.
We created our network by extending the nn.Module PyTorch base class, and then, in the class constructor, we defined the network's layers as class attributes. Now, we need to implement our network's forward() method, and then, finally, we'll be ready to train our model.
1. Prepare the data
2. Build the model
   a. Create a neural network class that extends the nn.Module base class.
   b. In the class constructor, define the network's layers as class attributes.
   c. Use the network's layer attributes as well as nn.functional API operations to define the network's forward pass.
3. Train the model
4. Analyze the model's results
Reviewing the Network
At the moment, we know that our forward() method accepts a tensor as input and returns a tensor as output. Right now, the tensor that is returned is the same tensor that is passed.
However, after we build out the implementation, the returned tensor will be the output of the network.
This means that the forward method implementation will use all of the layers we defined inside the constructor. In this way, the forward method explicitly defines the network's transformation.
The forward() method is the actual network transformation: it maps an input tensor to a prediction output tensor. Let's see how this is done.
Recall that our network's constructor defines five layers.
import torch.nn as nn
import torch.nn.functional as F  # used later inside forward()

class Network(nn.Module):
    def __init__(self):
        super(Network, self).__init__()
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5)
        self.conv2 = nn.Conv2d(in_channels=6, out_channels=12, kernel_size=5)

        self.fc1 = nn.Linear(in_features=12 * 4 * 4, out_features=120)
        self.fc2 = nn.Linear(in_features=120, out_features=60)
        self.out = nn.Linear(in_features=60, out_features=10)

    def forward(self, t):
        # implement the forward pass
        return t
We have two convolutional layers and three Linear layers. If we count the input layer, this gives us a network with a total of six layers.
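To make the starting point concrete, here is a minimal sketch of the network as it stands before we implement anything. The sample tensor is a hypothetical stand-in for a single image, not data from this series. It checks two things stated above: the constructor registers five layers, and the placeholder forward() hands back the very tensor it was given.

```python
import torch
import torch.nn as nn


class Network(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5)
        self.conv2 = nn.Conv2d(in_channels=6, out_channels=12, kernel_size=5)
        self.fc1 = nn.Linear(in_features=12 * 4 * 4, out_features=120)
        self.fc2 = nn.Linear(in_features=120, out_features=60)
        self.out = nn.Linear(in_features=60, out_features=10)

    def forward(self, t):
        # placeholder: no layer is applied yet
        return t


network = Network()

# The five layer attributes are registered as child modules.
layer_names = [name for name, _ in network.named_children()]

# Hypothetical input: a batch of one single-channel 28 x 28 image.
sample = torch.rand(1, 1, 28, 28)
output = network(sample)

# The placeholder forward() returns the same tensor object it received.
returns_same_tensor = output is sample
```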
Implementing the forward() method
Let's code this up. We'll kick things off with the input layer.
Input layer #1
The input layer of any neural network is determined by the input data. For example, if our input tensor contains three elements, our network would have three nodes contained in its input layer.
For this reason, we can think of the input layer as the identity transformation. Mathematically, this is the function, \[f(x)=x.\]
We give any \(x\) as the input, and we get back the same \(x\) as the output. This logic is the same regardless of whether we're working with a tensor that has three elements, or a tensor that represents an image with three channels. The data in is the data out!
This is pretty trivial, and this is the reason we usually don't see the input layer when we are working with neural network APIs. The input layer exists implicitly.
It's definitely not required, but for the sake of completeness, we'll show the identity operation in our forward method.
# (1) input layer
t = t
Hidden convolutional layers: Layers #2 and #3
Both of the hidden convolutional layers are going to be very similar in terms of performing the transformation. In the deep learning fundamentals series, we explained in the post on layers that all layers that are not the input or output layers are called hidden layers, and this is why we are referring to these convolutional layers as hidden layers.
To perform the convolution operation, we pass the tensor to the forward method of the first convolutional layer, self.conv1. We've learned that all PyTorch neural network modules have forward() methods, and when we call the forward() method of an nn.Module, there is a special way that we make the call.
When we want to call the forward() method of an nn.Module instance, we call the actual instance instead of calling the forward() method directly.
Instead of writing self.conv1.forward(tensor), we write self.conv1(tensor). See the previous post in this series for all the details on this.
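As a small illustration of why the instance call is preferred, the sketch below registers a forward hook on a standalone layer and then calls it both ways. The hook and the random input are assumptions added for demonstration only. Both calls produce the same values, but only the instance call goes through the extra machinery in nn.Module that runs hooks.

```python
import torch
import torch.nn as nn

conv1 = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5)
t = torch.rand(1, 1, 28, 28)  # hypothetical single-image batch

calls = []

# Forward hooks run only when the module instance itself is called.
handle = conv1.register_forward_hook(
    lambda module, inputs, output: calls.append("hook")
)

out_instance = conv1(t)        # goes through __call__, so the hook fires
out_direct = conv1.forward(t)  # bypasses __call__, so the hook is skipped

handle.remove()

same_values = torch.equal(out_instance, out_direct)
hook_calls = len(calls)
```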
Let's go ahead and add all the calls needed to implement both of our convolutional layers.
# (2) hidden conv layer
t = self.conv1(t)
t = F.relu(t)
t = F.max_pool2d(t, kernel_size=2, stride=2)

# (3) hidden conv layer
t = self.conv2(t)
t = F.relu(t)
t = F.max_pool2d(t, kernel_size=2, stride=2)
As we can see here, our input tensor is transformed as we move through the convolutional layers. The first convolutional layer has a convolution operation, followed by a relu activation operation, whose output is then passed to a max pooling operation with kernel_size=2 and stride=2.
The output tensor t of the first convolutional layer is then passed to the next convolutional layer, which is identical except that we call self.conv2() instead of self.conv1().
Each of these layers consists of a collection of weights (data) and a collection of operations (code). The weights are encapsulated inside the nn.Conv2d class instance. The relu() and max_pool2d() calls are just pure operations. Neither of these has weights, and this is why we call them directly from the nn.functional API.
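The split between weights and pure operations can be checked directly. The following is a minimal sketch using a standalone nn.Conv2d instance and a tiny hand-written tensor (both chosen here for illustration): the layer object carries learnable parameters, while relu() and max_pool2d() are plain functions of their input.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

conv1 = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5)

# The layer instance owns its weights: six 1 x 5 x 5 filters plus six biases.
param_shapes = {name: tuple(p.shape) for name, p in conv1.named_parameters()}

# relu() and max_pool2d() hold no state; the output depends only on the input.
t = torch.tensor([[-1.0, 2.0],
                  [3.0, -4.0]])

relu_out = F.relu(t)  # negative values are clamped to zero

# max_pool2d expects (batch, channels, height, width)
pooled = F.max_pool2d(t.reshape(1, 1, 2, 2), kernel_size=2, stride=2)
```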
Sometimes we may see pooling operations referred to as pooling layers. Sometimes we may even hear activation operations called activation layers.
However, what makes a layer distinct from an operation is that layers have weights. Since pooling operations and activation functions do not have weights, we will refer to them as operations and view them as being added to the collection of layer operations.
For example, we'll say that the second layer in our network is a convolutional layer that contains a collection of weights and performs three operations: a convolution operation, the relu activation operation, and the max pooling operation.
Note that the rules and terminology here are not strict. This is just one way to describe a network. There are other ways to express these ideas. The main thing we need to be aware of is which operations are defined using weights and which ones don't use any weights.
Historically, the operations that are defined using weights are what we call layers. Later, other operations were added to the mix like activation functions and pooling operations, and this caused some confusion in terminology.
Mathematically, the entire network is just a composition of functions, and a composition of functions is a function itself. So a network is just a function. All the terms like layers, activation functions, and weights, are just used to help describe the different parts.
Don't let these terms confuse the fact that the whole network is simply a composition of functions, and what we are doing now is defining this composition inside our forward() method.
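The composition idea can be sketched in a few lines. The compose() helper below is hypothetical, written only for this illustration and not part of PyTorch. It chains the three operations of our first convolutional layer into a single function and confirms that the result matches applying them one step at a time.

```python
from functools import reduce

import torch
import torch.nn as nn
import torch.nn.functional as F


def compose(*functions):
    """Hypothetical helper: apply the given functions left to right."""
    return lambda x: reduce(lambda acc, f: f(acc), functions, x)


conv1 = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5)


def pool(t):
    return F.max_pool2d(t, kernel_size=2, stride=2)


# One "layer" expressed as a single composed function.
layer2 = compose(conv1, F.relu, pool)

t = torch.rand(1, 1, 28, 28)  # hypothetical single-image batch

# The same computation written as sequential reassignment.
step = conv1(t)
step = F.relu(step)
step = pool(step)

matches = torch.equal(layer2(t), step)
```

Writing the steps as repeated assignments to t inside forward() is the same composition, just spelled out line by line.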
Hidden linear layers: Layers #4 and #5
Before we pass our input to the first hidden linear layer, we must reshape() or flatten our tensor. This will be the case any time we are passing output from a convolutional layer as input to a linear layer.
Since the fourth layer is the first linear layer, we will include our reshaping operation as a part of the fourth layer.
# (4) hidden linear layer
t = t.reshape(-1, 12 * 4 * 4)  # -1 lets PyTorch infer the batch dimension
t = self.fc1(t)
t = F.relu(t)

# (5) hidden linear layer
t = self.fc2(t)
t = F.relu(t)
We saw in the post on CNN weights that the number 12 in the reshaping operation is determined by the number of output channels coming from the previous convolutional layer.
However, the 4 * 4 was left as an open question. Let's reveal the answer now. The 4 * 4 is actually the height and width of each of the 12 output channels.
We started with a 1 x 28 x 28 input tensor, which represents a 28 x 28 image with a single color channel. By the time our tensor arrives at the first linear layer, the dimensions have changed.
The height and width dimensions have been reduced from 28 x 28 to 4 x 4 by the convolution and pooling operations.
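We can watch this reduction happen by recording the tensor's shape after each shape-changing step. This is a minimal sketch using standalone layers configured like the ones in our constructor and a random tensor standing in for an image. The relu() calls are left out because they do not change the shape.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

conv1 = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5)
conv2 = nn.Conv2d(in_channels=6, out_channels=12, kernel_size=5)

t = torch.rand(1, 1, 28, 28)  # hypothetical batch of one 28 x 28 image
shapes = [tuple(t.shape)]

t = conv1(t)                                  # 5 x 5 kernel: 28 -> 24
shapes.append(tuple(t.shape))

t = F.max_pool2d(t, kernel_size=2, stride=2)  # halves: 24 -> 12
shapes.append(tuple(t.shape))

t = conv2(t)                                  # 5 x 5 kernel: 12 -> 8
shapes.append(tuple(t.shape))

t = F.max_pool2d(t, kernel_size=2, stride=2)  # halves: 8 -> 4
shapes.append(tuple(t.shape))

for shape in shapes:
    print(shape)
```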
As configured here, with no padding, convolution and pooling operations are reduction operations on the height and width dimensions. We'll see how this works, along with a formula for calculating these reductions, in the next post. For now, let's finish implementing our forward() method.
After the tensor is reshaped, we pass the flattened tensor to the linear layer and pass this result to the relu() activation function.
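Here is a short sketch of the flattening step on its own, assuming a hypothetical batch of three images that has already passed through both convolutional layers. Passing -1 as the first argument to reshape() lets PyTorch infer the batch size, and flatten(start_dim=1) is an equivalent way to express the same operation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical output of the second conv layer for a batch of 3 images.
t = torch.rand(3, 12, 4, 4)

flat = t.reshape(-1, 12 * 4 * 4)  # -1 is inferred as the batch size (3)
alt = t.flatten(start_dim=1)      # same result: keep dim 0, flatten the rest

fc1 = nn.Linear(in_features=12 * 4 * 4, out_features=120)
hidden = F.relu(fc1(flat))        # linear transformation, then activation

print(flat.shape, hidden.shape)
```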
Output layer #6
The sixth and last layer of our network is a linear layer we call the output layer. When we pass our tensor to the output layer, the result will be the prediction tensor. Since our data has ten prediction classes, we know our output tensor will have ten elements.
# (6) output layer
t = self.out(t)
# t = F.softmax(t, dim=1)
The values inside each of the ten components will correspond to the prediction value for each of our prediction classes.
Inside the network, we usually use relu() as our nonlinear activation function, but for the output layer, whenever we are trying to predict a single category, we use softmax(). The softmax function returns a positive probability for each of the prediction classes, and the probabilities sum to 1.
However, in our case, we won't use softmax() because the loss function that we'll use, F.cross_entropy(), implicitly performs the softmax() operation on its input, so we'll just return the result of the last linear transformation.
The implication of this is that our network will be trained using the softmax operation but will not need to compute the additional operation when the network is used for inference after the training process is complete.
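Both claims can be checked with a small sketch. The logits and targets below are random, made-up values rather than real network output. F.cross_entropy() applied to raw outputs gives the same loss as applying log-softmax first and then the negative log likelihood loss, and the predicted class is the same whether or not softmax is applied, because softmax preserves the ordering of the values.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical raw network outputs for 4 samples and 10 classes.
logits = torch.randn(4, 10)
targets = torch.tensor([3, 0, 9, 1])

# cross_entropy takes raw outputs and applies (log-)softmax internally.
loss_direct = F.cross_entropy(logits, targets)
loss_manual = F.nll_loss(F.log_softmax(logits, dim=1), targets)

# At inference time, softmax does not change which class has the largest value.
probs = F.softmax(logits, dim=1)
same_prediction = torch.equal(logits.argmax(dim=1), probs.argmax(dim=1))

losses_match = torch.allclose(loss_direct, loss_manual)
```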
Conclusion
Great! We did it. This is how we implement a neural network forward method in PyTorch.
def forward(self, t):
    # (1) input layer
    t = t

    # (2) hidden conv layer
    t = self.conv1(t)
    t = F.relu(t)
    t = F.max_pool2d(t, kernel_size=2, stride=2)

    # (3) hidden conv layer
    t = self.conv2(t)
    t = F.relu(t)
    t = F.max_pool2d(t, kernel_size=2, stride=2)

    # (4) hidden linear layer
    t = t.reshape(-1, 12 * 4 * 4)  # -1 lets PyTorch infer the batch dimension
    t = self.fc1(t)
    t = F.relu(t)

    # (5) hidden linear layer
    t = self.fc2(t)
    t = F.relu(t)

    # (6) output layer
    t = self.out(t)
    # t = F.softmax(t, dim=1)

    return t
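As a quick sanity check, the sketch below assembles the constructor and the forward method into one runnable class and passes a batch through it. The batch of eight random tensors is a hypothetical stand-in for real images, and the network is untrained, so the predicted labels carry no meaning. The point is only that each image maps to ten output values.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Network(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5)
        self.conv2 = nn.Conv2d(in_channels=6, out_channels=12, kernel_size=5)
        self.fc1 = nn.Linear(in_features=12 * 4 * 4, out_features=120)
        self.fc2 = nn.Linear(in_features=120, out_features=60)
        self.out = nn.Linear(in_features=60, out_features=10)

    def forward(self, t):
        # hidden conv layers
        t = F.max_pool2d(F.relu(self.conv1(t)), kernel_size=2, stride=2)
        t = F.max_pool2d(F.relu(self.conv2(t)), kernel_size=2, stride=2)
        # hidden linear layers
        t = t.reshape(-1, 12 * 4 * 4)
        t = F.relu(self.fc1(t))
        t = F.relu(self.fc2(t))
        # output layer (no softmax, as discussed above)
        return self.out(t)


network = Network()
images = torch.rand(8, 1, 28, 28)  # hypothetical batch of 8 images

with torch.no_grad():              # no gradients needed for a forward check
    preds = network(images)

pred_shape = tuple(preds.shape)    # one row of 10 values per image
labels = preds.argmax(dim=1)       # index of the largest value per image
```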
I'll see you in the next one!