# Neural Network Programming - Deep Learning with PyTorch

with deeplizard.

## CNN Output Size Formula - Bonus Neural Network Debugging Session

May 29, 2019 by Blog

### CNN Output Size Formula - Tensor Transformations

Welcome to this neural network programming series with PyTorch. In this episode, we are going to see how an input tensor is transformed as it flows through a CNN. Without further ado, let's get started.

#### High-level Overview of Our Process

• Prepare the data
• Build the model
• Understanding forward pass transformations
• Train the model
• Analyze the model’s results

#### Overview of Our Network

The CNN we will use is the one that we have been working with over the last few posts that has six layers.

1. Input layer
2. Hidden conv layer
3. Hidden conv layer
4. Hidden linear layer
5. Hidden linear layer
6. Output layer

We built this network using PyTorch’s nn.Module class, and the Network class definition is as follows:

    import torch.nn as nn
    import torch.nn.functional as F

    class Network(nn.Module):
        def __init__(self):
            super().__init__()
            self.conv1 = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5)
            self.conv2 = nn.Conv2d(in_channels=6, out_channels=12, kernel_size=5)

            self.fc1 = nn.Linear(in_features=12*4*4, out_features=120)
            self.fc2 = nn.Linear(in_features=120, out_features=60)
            self.out = nn.Linear(in_features=60, out_features=10)

        def forward(self, t):
            # (1) input layer
            t = t

            # (2) hidden conv layer
            t = self.conv1(t)
            t = F.relu(t)
            t = F.max_pool2d(t, kernel_size=2, stride=2)

            # (3) hidden conv layer
            t = self.conv2(t)
            t = F.relu(t)
            t = F.max_pool2d(t, kernel_size=2, stride=2)

            # (4) hidden linear layer
            t = t.reshape(-1, 12 * 4 * 4)
            t = self.fc1(t)
            t = F.relu(t)

            # (5) hidden linear layer
            t = self.fc2(t)
            t = F.relu(t)

            # (6) output layer
            t = self.out(t)
            #t = F.softmax(t, dim=1)

            return t


#### Passing a batch of size one (a single image)

In a previous episode, we saw how we can pass a single image by adding a batch dimension using PyTorch’s unsqueeze() method. We’ll pass this tensor to the network again, but this time we will step through the forward() method using the debugger. This will allow us to inspect our tensor as transformations are performed.

Let’s begin:

> network = Network()
> network(image.unsqueeze(0))
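
As a minimal sketch of the batching step, the snippet below uses a random tensor as a stand-in for `image` (an assumption made here so the example runs on its own; the real image comes from the data prepared earlier in the series). It shows that `unsqueeze(0)` only adds a leading axis of length one.

```python
import torch

# Hypothetical stand-in for `image`: one grayscale 28x28 image, no batch axis.
image = torch.rand(1, 28, 28)

# unsqueeze(0) inserts a new axis of length 1 at index 0 (the batch axis).
batch = image.unsqueeze(0)

print(image.shape)  # torch.Size([1, 28, 28])
print(batch.shape)  # torch.Size([1, 1, 28, 28])
```

The pixel values are untouched; only the shape changes, which is what the network expects for a batch of size one.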


### #1 Input layer

When the tensor comes into the input layer, we have:

> t.shape
torch.Size([1, 1, 28, 28])


The values in these dimensions represent the following:

(batch size, color channels, height, width)

Since the input layer is just the identity function, the output shape doesn’t change.

The input layer can be regarded as the trivial identity function: the output of the layer is equal to the input.

### #2 Convolutional layer (1)

When the tensor comes into this layer, we have:

> t.shape
torch.Size([1, 1, 28, 28])


After the first convolution operation self.conv1, we have:

> t.shape
torch.Size([1, 6, 24, 24])


The batch size is still 1. This makes sense because we wouldn’t expect our batch size to change, and this is going to be the case through the entire forward pass.

The batch_size is fixed as we move through the forward pass.

The number of color channels has increased from 1 to 6. After we move forward beyond the first convolutional layer, we don’t think of the channels as color channels any longer. We just think of them as output channels. The reason we have 6 output channels is due to the number of out_channels that we specified when self.conv1 was created.

#### Convolution Operations use Filters

As we have seen, this number 6 is arbitrary. The out_channels parameter instructs the nn.Conv2d layer class to generate six filters, also known as kernels, each of shape 5 by 5 with randomly initialized values. These filters are used to generate the six output channels.

The out_channels parameter determines how many filters will be created.

The filters are tensors, and they are used to convolve the input tensor when the tensor is passed to the layer instance, self.conv1. The random values inside the filter tensors are the weights of the convolutional layer. Remember though, we don't actually have six distinct tensors. All six of the filters are packaged into a single weight tensor that has a height and width of five. The filters are the weight tensors.

After the weight tensors (filters) are used to convolve the input tensor, the result is the output channels.

Another way to refer to the output channels is to call them feature maps. This is because the pattern detection that emerges as the weights are updated represents features like edges and other more sophisticated patterns.

The algorithm:

1. Color channels are passed in.
2. Convolutions are performed using the weight tensor (filters).
3. Feature maps are produced and passed forward.

Conceptually, we can think of the weight tensors as being distinct. However, what we really have in code is a single weight tensor that has an out_channels (filters) dimension. We can see this by checking the shape of the weight tensor:

> self.conv1.weight.shape
torch.Size([6, 1, 5, 5])


This tensor’s shape is given by:

(number of filters, number of input channels, filter height, filter width)
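
The relationship between the single weight tensor and the individual filters can be sketched outside the debugger. This example creates a standalone layer with the same arguments as self.conv1 (the variable names `conv1` and `first_filter` are chosen here for illustration) and indexes the first axis to pick out one filter.

```python
import torch.nn as nn

# Same configuration as self.conv1 in the Network class.
conv1 = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5)

# One weight tensor holds all six filters along its first axis.
print(conv1.weight.shape)   # torch.Size([6, 1, 5, 5])

# Indexing the first axis selects a single filter:
# (input channels, filter height, filter width).
first_filter = conv1.weight[0]
print(first_filter.shape)   # torch.Size([1, 5, 5])
```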

#### The relu() activation function

The call to the relu() function removes any negative values and replaces them with zeros. We can verify this by checking the min() of the tensor before and after the call.

> t.min().item()
-1.1849982738494873

> t = F.relu(t)
> t.min().item()
0.0


The relu() function can be expressed mathematically as

$f\left(x\right) = \left\{ \begin{array}{lll} 0 & \text{if} & x \lt 0 \\ x & \text{if} & x \geq 0 \end{array} \right.$
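
To connect the piecewise definition to code, here is a small sketch on a hand-picked tensor (the values are arbitrary and chosen only for illustration). It compares `F.relu` against the definition written out with `torch.where`.

```python
import torch
import torch.nn.functional as F

# Arbitrary example values: two negatives, zero, and two positives.
t = torch.tensor([-1.5, -0.25, 0.0, 0.5, 2.0])

out = F.relu(t)

# The piecewise definition written out: 0 where x < 0, x otherwise.
expected = torch.where(t < 0, torch.zeros_like(t), t)

print(out.tolist())                # [0.0, 0.0, 0.0, 0.5, 2.0]
print(torch.equal(out, expected))  # True
```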

#### The max pooling operation

The pooling operation reduces the shape of our tensor further by extracting the maximum value from each 2x2 location within our tensor.

> t.shape
torch.Size([1, 6, 24, 24])

> t = F.max_pool2d(t, kernel_size=2, stride=2)
> t.shape
torch.Size([1, 6, 12, 12])
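
A tiny input makes the pooling operation easier to see. This sketch uses a single 4x4 channel filled with the values 0 through 15 (an illustrative input, not one from the network) so that each 2x2 window and its maximum can be read off by hand.

```python
import torch
import torch.nn.functional as F

# One image, one channel, 4x4 grid:
# [[ 0,  1,  2,  3],
#  [ 4,  5,  6,  7],
#  [ 8,  9, 10, 11],
#  [12, 13, 14, 15]]
t = torch.arange(16, dtype=torch.float32).reshape(1, 1, 4, 4)

# Non-overlapping 2x2 windows (stride 2); each window keeps its maximum.
pooled = F.max_pool2d(t, kernel_size=2, stride=2)

print(pooled.shape)           # torch.Size([1, 1, 2, 2])
print(pooled[0, 0].tolist())  # [[5.0, 7.0], [13.0, 15.0]]
```

Height and width are halved, while the batch and channel dimensions are left alone.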


#### Convolution layer summary

The shapes of the tensor input to and output from the convolutional layer are given by:

• Input shape: [1, 1, 28, 28]
• Output shape: [1, 6, 12, 12]

Summary of each operation that occurs:

1. The convolution layer convolves the input tensor using six randomly initialized 5x5 filters.
• This reduces the height and width dimensions by four, from 28 to 24.
2. The relu activation function maps all negative values to 0.
• This means that all the values in the tensor are now non-negative.
3. The max pooling operation extracts the max value from each 2x2 section of the six feature maps that were created by the convolutions.
• This halves the height and width dimensions, from 24 to 12.
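
The three steps above can be checked in a few lines. This is a sketch that assumes a random 28x28 input in place of a real image and a standalone layer configured like self.conv1; the weights are random, so only the shapes and the sign of the minimum are meaningful.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

conv1 = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5)
t = torch.rand(1, 1, 28, 28)  # hypothetical stand-in for a real image batch

with torch.no_grad():
    t = conv1(t)
    conv_shape = tuple(t.shape)      # height and width shrink by four

    t = F.relu(t)
    min_after_relu = t.min().item()  # never below zero after relu

    t = F.max_pool2d(t, kernel_size=2, stride=2)
    pool_shape = tuple(t.shape)      # height and width are halved

print(conv_shape)           # (1, 6, 24, 24)
print(min_after_relu >= 0)  # True
print(pool_shape)           # (1, 6, 12, 12)
```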

### CNN Output Size Formula

Let's have a look at the formula for computing the output size of the tensor after performing convolutional and pooling operations.

#### CNN Output Size Formula (Square)

• Suppose we have an $$n \times n$$ input.
• Suppose we have an $$f \times f$$ filter.
• Suppose we have a padding of $$p$$ and a stride of $$s$$.

The output size $$O$$ is given by this formula:

$O = \frac{n - f + 2p}{s} + 1$

This value will be the height and width of the output. However, if the input or the filter isn't a square, this formula needs to be applied twice, once for the width and once for the height.
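
As a worked instance of the formula, consider the first convolution in our network. This assumes the nn.Conv2d defaults of padding $$p = 0$$ and stride $$s = 1$$, since the class definition sets neither. With $$n = 28$$ and $$f = 5$$:

$O = \frac{28 - 5 + 2 \cdot 0}{1} + 1 = 24$

The max pooling operation that follows uses a 2x2 window with a stride of 2, so with $$n = 24$$, $$f = 2$$, $$p = 0$$ and $$s = 2$$:

$O = \frac{24 - 2 + 2 \cdot 0}{2} + 1 = 12$

Both results agree with the shapes observed in the debugger.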

#### CNN Output Size Formula (Non-Square)

• Suppose we have an $$n_{h} \times n_{w}$$ input.
• Suppose we have an $$f_{h} \times f_{w}$$ filter.
• Suppose we have a padding of $$p$$ and a stride of $$s$$.

The height of the output size $$O_{h}$$ is given by this formula:

$O_{h} = \frac{n_{h} - f_{h} + 2p}{s} + 1$

The width of the output size $$O_{w}$$ is given by this formula:

$O_{w} = \frac{n_{w} - f_{w} + 2p}{s} + 1$
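
The formulas translate directly into a small helper. This is a sketch, not part of PyTorch: the function names `conv_output_size` and `conv_output_hw` are made up for this example, and integer (floor) division is an assumption used to cover cases where the stride does not divide evenly.

```python
def conv_output_size(n, f, p=0, s=1):
    """Output length along one axis: floor((n - f + 2p) / s) + 1."""
    return (n - f + 2 * p) // s + 1


def conv_output_hw(n_hw, f_hw, p=0, s=1):
    """Apply the formula once for the height and once for the width."""
    n_h, n_w = n_hw
    f_h, f_w = f_hw
    return conv_output_size(n_h, f_h, p, s), conv_output_size(n_w, f_w, p, s)


# The four shape-changing steps in our network.
print(conv_output_hw((28, 28), (5, 5)))       # (24, 24)  conv1
print(conv_output_hw((24, 24), (2, 2), s=2))  # (12, 12)  max pool
print(conv_output_hw((12, 12), (5, 5)))       # (8, 8)    conv2
print(conv_output_hw((8, 8), (2, 2), s=2))    # (4, 4)    max pool

# A non-square input with a non-square filter.
print(conv_output_hw((28, 20), (5, 3)))       # (24, 18)
```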

### #3 Convolutional layer (2)

The second hidden convolutional layer, self.conv2, transforms the tensor in the same way as self.conv1 and reduces the height and width dimensions further. Before we run through these transformations, let’s check the shape of the weight tensor for self.conv2:

> self.conv2.weight.shape
torch.Size([12, 6, 5, 5])


This time our weight tensor has twelve filters, each with a height of five and a width of five. Instead of a single input channel, there are six input channels, which gives each filter a depth of six. This accounts for the six output channels from the first convolutional layer. The resulting output will have twelve channels.
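
The depth of each filter can be seen by indexing into the weight tensor. This sketch builds a standalone layer with the same arguments as self.conv2 (the name `conv2` is used here only for illustration) and counts its weights.

```python
import torch.nn as nn

# Same configuration as self.conv2 in the Network class.
conv2 = nn.Conv2d(in_channels=6, out_channels=12, kernel_size=5)

print(conv2.weight.shape)     # torch.Size([12, 6, 5, 5])

# A single filter spans all six input channels: (depth, height, width).
print(conv2.weight[0].shape)  # torch.Size([6, 5, 5])

# Total number of weights: 12 filters * 6 channels * 5 * 5.
print(conv2.weight.numel())   # 1800
```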

Let’s run these operations now.

> t.shape
torch.Size([1, 6, 12, 12])

> t = self.conv2(t)
> t.shape
torch.Size([1, 12, 8, 8])

> t.min().item()
-0.39324113726615906

> t = F.relu(t)
> t.min().item()
0.0

> t = F.max_pool2d(t, kernel_size=2, stride=2)
> t.shape
torch.Size([1, 12, 4, 4])


The shape of the output after the second max pooling operation, [1, 12, 4, 4], shows why we reshape the tensor using 12*4*4 before passing it to the first linear layer, self.fc1.

As we have seen in the past, this particular reshaping is called flattening the tensor. The flatten operation puts all of the tensor’s elements for each image into a single dimension.

> t = t.reshape(-1, 12*4*4)
> t.shape
torch.Size([1, 192])


The resulting shape is 1x192. The 1 in this case represents the batch size, and the 192 is the number of elements in the tensor that are now in the same dimension.
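
For comparison, the same result can be obtained without hard-coding the element count. This sketch uses a random tensor of the shape seen above and shows that `flatten(start_dim=1)`, which keeps the batch axis and merges the rest, matches the `reshape` call used in the network.

```python
import torch

# Hypothetical stand-in for the output of the second max pooling operation.
t = torch.rand(1, 12, 4, 4)

# The approach used in forward(): -1 lets PyTorch infer the batch size.
flat_a = t.reshape(-1, 12 * 4 * 4)

# An equivalent call that keeps axis 0 and flattens everything after it.
flat_b = t.flatten(start_dim=1)

print(flat_a.shape)                 # torch.Size([1, 192])
print(torch.equal(flat_a, flat_b))  # True
```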

### #4 #5 #6 Linear Layers

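Now, we just have a series of linear layers, each followed by a non-linear activation function, until we reach the output layer. The relu() calls are left out of the session below because they do not change the shape of the tensor.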

> t = self.fc1(t)

> t.shape
torch.Size([1, 120])

> t = self.fc2(t)
> t.shape
torch.Size([1, 60])

> t = self.out(t)
> t.shape
torch.Size([1, 10])

> t
tensor([[ 0.1009, -0.0842,  0.0349, -0.0640,  0.0754, -0.0057,  0.0878,  0.0296,  0.0345,  0.0236]])
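
The shape change in each linear layer comes from its weight matrix. As a sketch, the snippet below builds a standalone layer with the same arguments as self.fc1 and a random input of the flattened shape (both are stand-ins created for this example), then reproduces the layer's output with an explicit matrix multiplication.

```python
import torch
import torch.nn as nn

# Same configuration as self.fc1 in the Network class.
fc1 = nn.Linear(in_features=12 * 4 * 4, out_features=120)

# nn.Linear stores its weights as (out_features, in_features).
print(fc1.weight.shape)  # torch.Size([120, 192])

t = torch.rand(1, 192)   # hypothetical flattened input

with torch.no_grad():
    out = fc1(t)
    # The layer computes t @ W^T + b.
    manual = t @ fc1.weight.T + fc1.bias

print(out.shape)                               # torch.Size([1, 120])
print(torch.allclose(out, manual, atol=1e-5))  # True
```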


This table summarizes the shape changing operations and the resulting shape of each:

| Operation | Output Shape |
| --- | --- |
| Identity function | torch.Size([1, 1, 28, 28]) |
| Convolution (5 x 5) | torch.Size([1, 6, 24, 24]) |
| Max pooling (2 x 2) | torch.Size([1, 6, 12, 12]) |
| Convolution (5 x 5) | torch.Size([1, 12, 8, 8]) |
| Max pooling (2 x 2) | torch.Size([1, 12, 4, 4]) |
| Flatten (reshape) | torch.Size([1, 192]) |
| Linear transformation | torch.Size([1, 120]) |
| Linear transformation | torch.Size([1, 60]) |
| Linear transformation | torch.Size([1, 10]) |
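
The summary can also be reproduced without a debugger. The following is a sketch rather than the approach used in the episode: it builds standalone layers with the same arguments as the Network class, feeds in a random tensor as a stand-in for an image, and records the shape after each shape-changing step. The `steps` list and `shapes` variable are names made up for this example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Layers configured as in the Network class.
conv1 = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5)
conv2 = nn.Conv2d(in_channels=6, out_channels=12, kernel_size=5)
fc1 = nn.Linear(in_features=12 * 4 * 4, out_features=120)
fc2 = nn.Linear(in_features=120, out_features=60)
out = nn.Linear(in_features=60, out_features=10)

# Each entry is a (label, callable) pair, applied in order.
# relu is folded into the conv and linear steps since it keeps the shape.
steps = [
    ("Identity function", lambda t: t),
    ("Convolution (5 x 5)", lambda t: F.relu(conv1(t))),
    ("Max pooling (2 x 2)", lambda t: F.max_pool2d(t, kernel_size=2, stride=2)),
    ("Convolution (5 x 5)", lambda t: F.relu(conv2(t))),
    ("Max pooling (2 x 2)", lambda t: F.max_pool2d(t, kernel_size=2, stride=2)),
    ("Flatten (reshape)", lambda t: t.reshape(-1, 12 * 4 * 4)),
    ("Linear transformation", lambda t: F.relu(fc1(t))),
    ("Linear transformation", lambda t: F.relu(fc2(t))),
    ("Linear transformation", out),
]

t = torch.rand(1, 1, 28, 28)  # hypothetical stand-in for a real image batch
shapes = []

with torch.no_grad():
    for label, step in steps:
        t = step(t)
        shapes.append(tuple(t.shape))
        print(f"{label:<24}{tuple(t.shape)}")
```

Because the weights are random, the output values differ on every run, but the shapes are fixed by the layer arguments.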

### Training the CNN is next

We should now have a good understanding of how input tensors are transformed by convolutional neural networks, how to debug neural networks in PyTorch, and how to inspect the weight tensors of all of the layers. In the next episode, we will begin training our network, which will update the values in our weight tensors so that the forward method of our network maps the inputs to the correct output classes. I’ll see you in the next one!
