Neural Network Programming - Deep Learning with PyTorch

Deep Learning Course 3 of 4 - Level: Intermediate

CNN Training with Code Example - Neural Network Programming Course


CNN Training Process

Welcome to this neural network programming series with PyTorch. In this episode, we will learn the steps needed to train a convolutional neural network.

So far in this series, we learned about Tensors, and we've learned all about PyTorch neural networks. We are now ready to begin the training process.

  • Prepare the data
  • Build the model
  • Train the model
    • Calculate the loss, the gradient, and update the weights
  • Analyze the model's results

Training: What we do after the forward pass

During training, we do a forward pass, but then what? We'll suppose we get a batch and pass it forward through the network. Once the output is obtained, we compare the predicted output to the actual labels. Once we know how close the predicted values are to the actual labels, we tweak the weights inside the network so that the values the network predicts move closer to the true values (labels).

All of this is for a single batch, and we repeat this process for every batch until we have covered every sample in our training set. After we've completed this process for all of the batches and passed over every sample in our training set, we say that an epoch is complete. We use the word epoch to represent a time period in which our entire training set has been covered.

During the entire training process, we do as many epochs as necessary to reach our desired level of accuracy. With this, we have the following steps:

  1. Get batch from the training set.
  2. Pass batch to network.
  3. Calculate the loss (difference between the predicted values and the true values).
  4. Calculate the gradient of the loss function w.r.t the network's weights.
  5. Update the weights using the gradients to reduce the loss.
  6. Repeat steps 1-5 until one epoch is completed.
  7. Repeat steps 1-6 for as many epochs as required to reach the minimum loss.

We already know exactly how to do steps 1 and 2. If you've already covered the deep learning fundamentals series, then you know that we use a loss function to perform step 3, and that we use backpropagation and an optimization algorithm to perform steps 4 and 5. Steps 6 and 7 are just standard Python loops (the training loop). Let's see how this is done in code.
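The seven steps above can be sketched as a short piece of code. This is only a structural sketch: the tiny linear model and random data below are stand-ins (not the Network class or the Fashion-MNIST training set from this series), used just to show how the steps map onto a loop.

```python
import torch
import torch.nn.functional as F
from torch import optim
from torch.utils.data import DataLoader, TensorDataset

# Stand-in data and model, just to illustrate the loop structure
train_set = TensorDataset(torch.randn(200, 4), torch.randint(0, 3, (200,)))
network = torch.nn.Linear(4, 3)
optimizer = optim.SGD(network.parameters(), lr=0.1)

for epoch in range(2):                                           # Step 7: repeat for several epochs
    for images, labels in DataLoader(train_set, batch_size=50):  # Steps 1 & 6: loop over batches
        preds = network(images)                                  # Step 2: pass batch to network
        loss = F.cross_entropy(preds, labels)                    # Step 3: calculate the loss
        optimizer.zero_grad()                                    # Clear gradients from the previous batch
        loss.backward()                                          # Step 4: calculate the gradients
        optimizer.step()                                         # Step 5: update the weights
```

Note the zero_grad() call between batches: PyTorch accumulates gradients by default, so they need to be cleared before each backward pass.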

The Training Process

Since we disabled PyTorch's gradient tracking feature in a previous episode, we need to be sure to turn it back on (it is on by default).

> torch.set_grad_enabled(True)
<torch.autograd.grad_mode.set_grad_enabled at 0x15b22d012b0>
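As a quick sanity check, we can see what this switch actually does: with gradient tracking disabled, operations on tensors that require gradients don't record a computation graph, so their results don't require gradients either. A minimal illustration:

```python
import torch

w = torch.ones(3, requires_grad=True)

torch.set_grad_enabled(False)
y = w * 2
print(y.requires_grad)  # False: no graph was recorded

torch.set_grad_enabled(True)
z = w * 2
print(z.requires_grad)  # True: tracking is back on
```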

Preparing for the Forward Pass

We already know how to get a batch and pass it forward through the network. Let's see what we do after the forward pass is complete.

We'll begin by:

  1. Creating an instance of our Network class.
  2. Creating a data loader that provides batches of size 100 from our training set.
  3. Unpacking the images and labels from one of these batches.
> network = Network()

> train_loader = torch.utils.data.DataLoader(train_set, batch_size=100)
> batch = next(iter(train_loader)) # Getting a batch
> images, labels = batch

Next, we are ready to pass our batch of images forward through the network and obtain the output predictions. Once we have the prediction tensor, we can use the predictions and the true labels to calculate the loss.

Calculating the loss

To do this, we will use the cross_entropy() loss function available in PyTorch's nn.functional API. Once we have the loss, we can print it, and also check the number of correct predictions using the function we created in a previous post.

> preds = network(images)
> loss = F.cross_entropy(preds, labels) # Calculating the loss

> loss.item()

> get_num_correct(preds, labels)

The cross_entropy() function returned a scalar-valued tensor, and so we used the item() method to print the loss as a Python number. We got 9 out of 100 correct, and since we have 10 prediction classes, this is what we'd expect by guessing at random.
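The get_num_correct() helper comes from an earlier post in the series; for reference, a version consistent with how it's used here takes the argmax of each prediction row (the predicted class) and counts how many match the labels:

```python
import torch

def get_num_correct(preds, labels):
    # For each sample, the predicted class is the index of the highest score
    return preds.argmax(dim=1).eq(labels).sum().item()

# Tiny check: two of these three predictions match the labels
preds = torch.tensor([[0.1, 0.9], [0.8, 0.2], [0.3, 0.7]])
labels = torch.tensor([1, 0, 0])
print(get_num_correct(preds, labels))  # 2
```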

Calculating the Gradients

Calculating the gradients is very easy using PyTorch. Since our network is a PyTorch nn.Module, PyTorch has created a computation graph under the hood. As our tensor flowed forward through our network, all of the computations were added to the graph. The computation graph is then used by PyTorch to calculate the gradients of the loss function with respect to the network's weights.

Before we calculate the gradients, let's verify that we currently have no gradients inside our conv1 layer. The gradients are tensors that are accessible in the grad (short for gradient) attribute of the weight tensor of each layer.

> network.conv1.weight.grad

To calculate the gradients, we call the backward() method on the loss tensor, like so:

> loss.backward() # Calculating the gradients

Now, the gradients of the loss function have been stored inside each weight tensor's grad attribute.

> network.conv1.weight.grad.shape
torch.Size([6, 1, 5, 5])
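The same behavior can be seen on a single tensor, outside of any network: grad is None until backward() runs, and afterwards it holds the gradient of the output with respect to that tensor. A minimal example:

```python
import torch

w = torch.tensor([2.0, 3.0], requires_grad=True)
loss = (w ** 2).sum()   # loss = w1^2 + w2^2

print(w.grad)           # None: backward() has not been called yet
loss.backward()         # Populate w.grad from the computation graph
print(w.grad)           # tensor([4., 6.]): d(loss)/dw = 2w
```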

These gradients are used by the optimizer to update the respective weights. To create our optimizer, we use the torch.optim package that has many optimization algorithm implementations that we can use. We'll use Adam for our example.

Updating the Weights

To the Adam class constructor, we pass the network's parameters (this is how the optimizer is able to access the gradients), and we pass the learning rate.

Finally, all we have to do to update the weights is to tell the optimizer to use the gradients to step in the direction of the loss function's minimum.

optimizer = optim.Adam(network.parameters(), lr=0.01)
optimizer.step() # Updating the weights

When the step() function is called, the optimizer updates the weights using the gradients that are stored in the network's parameters. This means that we should expect our loss to be reduced if we pass the same batch through the network again. Checking this, we can see that this is indeed the case:
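Under the hood, step() adjusts each parameter using its stored grad. Adam maintains adaptive per-parameter step sizes, so its update rule is more involved, but the core idea is easiest to see with plain SGD, where the update is simply w = w - lr * grad. A small hand-rolled comparison (illustrating SGD, not the Adam update itself):

```python
import torch
from torch import optim

lr = 0.1
w = torch.tensor([1.0, -2.0], requires_grad=True)
loss = (w ** 2).sum()
loss.backward()                              # w.grad is now 2w

# What we expect plain SGD to do: w - lr * grad
expected = (w - lr * w.grad).detach()

optimizer = optim.SGD([w], lr=lr)
optimizer.step()                             # Apply the update using the stored gradients

print(torch.allclose(w.detach(), expected))  # True
```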

> preds = network(images)
> loss = F.cross_entropy(preds, labels)

> loss.item()

> get_num_correct(preds, labels)

Train Using a Single Batch

We can summarize the code for training with a single batch in the following way:

network = Network()

train_loader = torch.utils.data.DataLoader(train_set, batch_size=100)
optimizer = optim.Adam(network.parameters(), lr=0.01)

batch = next(iter(train_loader)) # Get Batch
images, labels = batch

preds = network(images) # Pass Batch
loss = F.cross_entropy(preds, labels) # Calculate Loss

loss.backward() # Calculate Gradients
optimizer.step() # Update Weights

print('loss1:', loss.item())
preds = network(images)
loss = F.cross_entropy(preds, labels)
print('loss2:', loss.item())


loss1: 2.3034827709198
loss2: 2.2825052738189697
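As a preview of the full training loop, we can repeat the loss/backward/step cycle on the same batch a few times and watch the loss fall. This sketch uses a stand-in linear model and random data rather than the Network class and real training set, and it zeroes the gradients between iterations, which the real loop will also need since PyTorch accumulates gradients by default:

```python
import torch
import torch.nn.functional as F
from torch import optim

torch.manual_seed(0)
images = torch.randn(100, 4)                  # Stand-in batch of 100 samples
labels = torch.randint(0, 3, (100,))
network = torch.nn.Linear(4, 3)               # Stand-in for our Network class
optimizer = optim.Adam(network.parameters(), lr=0.01)

losses = []
for _ in range(5):
    preds = network(images)                   # Pass batch
    loss = F.cross_entropy(preds, labels)     # Calculate loss
    optimizer.zero_grad()                     # Clear accumulated gradients
    loss.backward()                           # Calculate gradients
    optimizer.step()                          # Update weights
    losses.append(loss.item())

print(losses[0] > losses[-1])                 # Repeated steps reduce the loss
```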

Building the Training Loop is Next

We should now have a good understanding of the training process. In the next episode, we'll complete the process by constructing the training loop. See you in the next one!
