Neural Network Programming - Deep Learning with PyTorch

Deep Learning Course 3 of 4 - Level: Intermediate

Hyperparameter Tuning and Experimenting - Training Deep Neural Networks



Hyperparameter Tuning and Experimenting

Welcome to this neural network programming series. In this episode, we will see how we can use TensorBoard to rapidly experiment with different training hyperparameters to more deeply understand our neural network.

Without further ado, let's get started.

  • Prepare the data
  • Build the model
  • Train the model
  • Analyze the model's results
    • Hyperparameter Experimentation

At this point in the series, we've seen how to build and train a CNN with PyTorch. In the last episode, we showed how to use TensorBoard with PyTorch, and we reviewed the training process.

This episode is part two of the last one, so if you haven't seen the previous episode yet, go ahead and check it out to get the details needed to understand what we are doing here. What we are doing now is experimenting with our hyperparameter values.

Hyperparameter Experimentation Using TensorBoard

The best part about TensorBoard is its out-of-the-box capability of tracking our hyperparameters over time and across runs.

Changing hyperparameters and comparing the results.

Without TensorBoard this process becomes more cumbersome. Okay, so how do we do it?

Naming the Training Runs for TensorBoard

To take advantage of TensorBoard's comparison capabilities, we need to do multiple runs and name each run in such a way that we can identify it uniquely.

With PyTorch's SummaryWriter, a run starts when the writer object instance is created and ends when the writer instance is closed or goes out of scope.

To uniquely identify each run, we can either set the file name of the run directly, or pass a comment string to the constructor that will be appended to the auto-generated file name.

At the time of the creation of this post, the name of the run is contained inside the SummaryWriter in an attribute called log_dir. It is created like this:

# PyTorch version 1.1.0 SummaryWriter class
if not log_dir:
    import socket
    from datetime import datetime
    current_time = datetime.now().strftime('%b%d_%H-%M-%S')
    log_dir = os.path.join(
        'runs', current_time + '_' + socket.gethostname() + comment)
self.log_dir = log_dir

Here, we can see that the log_dir attribute, which corresponds to the location on disk and the name of the run, is set to runs + time + host + comment. This is of course assuming that the log_dir parameter doesn't have a value that was passed in. Hence, this is the default behavior.
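As a quick illustration, the sketch below rebuilds a path the same way the excerpt does, so we can see exactly where the comment string ends up. The helper name `default_log_dir` is hypothetical (it is not part of PyTorch), and other PyTorch versions may build the default name differently, so treat this as a mirror of the 1.1.0 excerpt rather than the library's current code.

```python
import os
import socket
from datetime import datetime


def default_log_dir(comment=''):
    """Hypothetical helper mirroring the 1.1.0 SummaryWriter default log_dir logic."""
    current_time = datetime.now().strftime('%b%d_%H-%M-%S')
    return os.path.join(
        'runs', current_time + '_' + socket.gethostname() + comment
    )


batch_size = 100
lr = 0.01
path = default_log_dir(comment=f' batch_size={batch_size} lr={lr}')

# Prints something like: runs/<Mon><DD>_<HH>-<MM>-<SS>_<hostname> batch_size=100 lr=0.01
print(path)
```

Because the comment is simply concatenated onto the end of the name, starting it with a space keeps it visually separate from the hostname.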

Choosing a Name for the Run

One way to name the run is to add the parameter names and values as a comment for the run. This will allow us to see how each parameter value stacks up with the others later when we are reviewing the runs inside TensorBoard.

Later, we'll set up the comment like this:

tb = SummaryWriter(comment=f' batch_size={batch_size} lr={lr}')

TensorBoard also has querying capabilities, so we can easily isolate parameter values through queries.

For example, imagine a SQL query that selects only the runs with a particular parameter value, such as a specific learning rate.

Without the SQL, this is basically what we can do inside TensorBoard.
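In practice, TensorBoard's run selector has a filter box that accepts a regular expression over run names, so a pattern such as lr=0.01 narrows the displayed runs. The plain-Python sketch below mimics that kind of filtering; the run names are made up, and `filter_runs` is a hypothetical helper, not a TensorBoard API.

```python
import re

# Hypothetical run directory names, in the format produced by our comment strings
runs = [
    'Jun10_14-03-22_host batch_size=100 lr=0.01',
    'Jun10_14-10-05_host batch_size=100 lr=0.001',
    'Jun10_14-20-47_host batch_size=1000 lr=0.01',
    'Jun10_14-31-12_host batch_size=1000 lr=0.001',
]


def filter_runs(run_names, pattern):
    """Keep only the run names that match the regular expression."""
    regex = re.compile(pattern)
    return [name for name in run_names if regex.search(name)]


# Keep only the runs trained with lr=0.01 ($ stops lr=0.001 from matching)
for name in filter_runs(runs, r'lr=0\.01$'):
    print(name)
```

Because the parameter names and values are baked into each run name, any parameter can be isolated this way without extra bookkeeping.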

Creating Variables for our Hyperparameters

To make the experimentation easy, we will pull out our hard-coded values and turn them into variables.

This is the hard-coded way:

network = Network()
train_loader = torch.utils.data.DataLoader(
    train_set, batch_size=100
)
optimizer = optim.Adam(
    network.parameters(), lr=0.01
)

Notice how the batch_size and lr parameter values are hard-coded.

This is what we change it to (now our values are set using variables):

batch_size = 100
lr = 0.01

network = Network()
train_loader = torch.utils.data.DataLoader(
    train_set, batch_size=batch_size
)
optimizer = optim.Adam(
    network.parameters(), lr=lr
)

This will allow us to change the values in a single place and have them propagate through our code.

Now, we will create the value for our comment parameter using the variables like so:

tb = SummaryWriter(comment=f' batch_size={batch_size} lr={lr}')

With this setup, we can change the value of our hyperparameters and our runs will be automatically tracked and identifiable in TensorBoard.

Calculate Loss with Different Batch Sizes

Since we'll be varying our batch sizes now, we'll need to change the way we calculate and accumulate the loss. Instead of just summing the loss returned by the loss function, we'll adjust it to account for the batch size.

total_loss += loss.item() * batch_size

Why do this? Well, the cross_entropy loss function averages the loss values produced by the batch and then returns this average loss. This is why we need to account for the batch size.

The cross_entropy function also accepts a parameter called reduction that we could use instead.

The reduction parameter optionally accepts a string as an argument. This parameter specifies the reduction to apply to the output of the loss function.
  1. 'none' - no reduction will be applied.
  2. 'mean' - the sum of the output will be divided by the number of elements in the output.
  3. 'sum' - the output will be summed.

Note that the default is 'mean'. This is why loss.item() * batch_size works.
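To check this numerically, here is a minimal sketch, assuming random logits and labels stand in for a real batch (the 8 x 10 shape is arbitrary). It compares the default 'mean' reduction scaled by the batch size against reduction='sum', and shows that 'none' returns one loss per sample.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# A fake batch: 8 samples, 10 classes (shapes chosen for illustration only)
preds = torch.randn(8, 10)
labels = torch.randint(0, 10, (8,))

mean_loss = F.cross_entropy(preds, labels)                     # default reduction='mean'
sum_loss = F.cross_entropy(preds, labels, reduction='sum')     # summed over the batch
per_sample = F.cross_entropy(preds, labels, reduction='none')  # one loss per sample

batch_size = preds.shape[0]
print(mean_loss.item() * batch_size, sum_loss.item())  # equal up to float rounding
print(per_sample.shape)  # torch.Size([8])
```

Keep in mind that reduction also changes the loss we call backward() on, so switching the training loss to 'sum' would change the scale of the gradients, not just the logged value. Multiplying loss.item() by the batch size only affects the number we record.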

Experimenting with Hyperparameter Values

Now that we have this setup, we can do more!

All we need to do is create some lists and some loops, and we can run the code and sit back and wait for all the combinations to run.

Here is an example of what we mean:

Parameter Lists

batch_size_list = [100, 1000, 10000]
lr_list = [.01, .001, .0001, .00001]

Nested Iteration

import torch
import torch.nn.functional as F
import torch.optim as optim
import torchvision
from torch.utils.tensorboard import SummaryWriter

# Network, train_set, and get_num_correct() come from earlier episodes in this series

for batch_size in batch_size_list:
    for lr in lr_list:
        network = Network()

        train_loader = torch.utils.data.DataLoader(
            train_set, batch_size=batch_size
        )
        optimizer = optim.Adam(
            network.parameters(), lr=lr
        )

        images, labels = next(iter(train_loader))
        grid = torchvision.utils.make_grid(images)

        comment = f' batch_size={batch_size} lr={lr}'
        tb = SummaryWriter(comment=comment)
        tb.add_image('images', grid)
        tb.add_graph(network, images)

        for epoch in range(5):
            total_loss = 0
            total_correct = 0
            for batch in train_loader:
                images, labels = batch # Get Batch
                preds = network(images) # Pass Batch
                loss = F.cross_entropy(preds, labels) # Calculate Loss
                optimizer.zero_grad() # Zero Gradients
                loss.backward() # Calculate Gradients
                optimizer.step() # Update Weights

                total_loss += loss.item() * batch_size
                total_correct += get_num_correct(preds, labels)

            tb.add_scalar(
                'Loss', total_loss, epoch
            )
            tb.add_scalar(
                'Number Correct', total_correct, epoch
            )
            tb.add_scalar(
                'Accuracy', total_correct / len(train_set), epoch
            )

            for name, param in network.named_parameters():
                tb.add_histogram(name, param, epoch)
                tb.add_histogram(f'{name}.grad', param.grad, epoch)

            print(
                "epoch", epoch
                ,"total_correct:", total_correct
                ,"loss:", total_loss
            )
        tb.close() # End this run before the next combination starts

Once this code completes, we run TensorBoard, and all the runs will be displayed graphically, making them easy to compare.

tensorboard --logdir runs

Batch Size vs Training Set Size

When the training set size is not divisible by the batch size, the last batch of data will contain fewer samples than the other batches.

One simple way to deal with this discrepancy is to drop the last batch. The PyTorch DataLoader class gives us the ability to do this by setting drop_last=True. By default, the drop_last parameter value is set to False.
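A minimal sketch makes the effect visible. It assumes a toy 10-sample TensorDataset standing in for our training set and a batch size of 4, and lists the size of every batch the DataLoader yields with and without drop_last.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy dataset of 10 samples (stands in for train_set)
dataset = TensorDataset(torch.arange(10))


def batch_sizes(drop_last):
    """Return the number of samples in each batch the DataLoader produces."""
    loader = DataLoader(dataset, batch_size=4, drop_last=drop_last)
    return [batch[0].shape[0] for batch in loader]


print(batch_sizes(drop_last=False))  # [4, 4, 2]
print(batch_sizes(drop_last=True))   # [4, 4]
```

Dropping the last batch keeps every batch the same size, at the cost of skipping those leftover samples in each epoch.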

Let's consider how including a batch with fewer samples than our batch size affects our total_loss calculation in the code above.

For every batch, we are using the batch_size variable to update the total_loss value. We are scaling up the average loss value of the samples in the batch by the batch_size value. However, as we have just discussed, sometimes the last batch will contain fewer samples. Thus, scaling by the predefined batch_size value is inaccurate.

The code can be updated to be more accurate by dynamically accessing the number of samples for each batch.

Currently, we have the following:

total_loss += loss.item() * batch_size

Using the updated code below, we can achieve a more accurate total_loss value:

total_loss += loss.item() * images.shape[0]

Note that these two lines of code give us the same total_loss value when the training set size is divisible by the batch size. Thank you to Alireza Abedin Varamin for pointing this out in a comment on YouTube.
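The difference is easy to see with made-up numbers. This pure-Python sketch assumes hypothetical per-sample losses for a 10-sample training set and a batch size of 4, so the last batch holds only 2 samples. Here, len(batch) plays the role of images.shape[0].

```python
# Hypothetical per-sample losses for a training set of 10 samples
sample_losses = [0.5, 1.0, 1.5, 2.0, 0.5, 1.0, 1.5, 2.0, 3.0, 1.0]
batch_size = 4

true_total = sum(sample_losses)

total_fixed = 0.0    # scales each batch mean by the predefined batch_size
total_dynamic = 0.0  # scales each batch mean by the actual number of samples
for start in range(0, len(sample_losses), batch_size):
    batch = sample_losses[start:start + batch_size]  # the last batch has only 2 samples
    batch_mean = sum(batch) / len(batch)             # what reduction='mean' returns
    total_fixed += batch_mean * batch_size
    total_dynamic += batch_mean * len(batch)

print(true_total, total_fixed, total_dynamic)  # 14.0 18.0 14.0
```

The fixed version over-counts the short last batch, while the dynamic version recovers the true total.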

Adding Network Parameters & Gradients to TensorBoard

Note that in the last episode, we added the following values to TensorBoard:

  • conv1.weight
  • conv1.bias
  • conv1.weight.grad

We did this using the code below:

tb.add_histogram('conv1.bias', network.conv1.bias, epoch)
tb.add_histogram('conv1.weight', network.conv1.weight, epoch)
tb.add_histogram('conv1.weight.grad', network.conv1.weight.grad, epoch)

Now, we've enhanced this by adding these values for all of our layers using the loop below:

for name, weight in network.named_parameters():
    tb.add_histogram(name, weight, epoch)
    tb.add_histogram(f'{name}.grad', weight.grad, epoch)

This works because the PyTorch nn.Module method called named_parameters() gives us the name and value of all the parameters inside the network.
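For example, with a hypothetical two-layer TinyNetwork (not the series' Network class), named_parameters() yields names built from the attribute names we assigned in __init__, in the order the layers were registered:

```python
import torch.nn as nn


class TinyNetwork(nn.Module):
    """Hypothetical two-layer network, standing in for the series' Network class."""

    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5)
        self.fc1 = nn.Linear(in_features=6 * 24 * 24, out_features=10)


network = TinyNetwork()
for name, param in network.named_parameters():
    print(name, tuple(param.shape))
# conv1.weight (6, 1, 5, 5)
# conv1.bias (6,)
# fc1.weight (10, 3456)
# fc1.bias (10,)
```

Note that param.grad is None until backward() has been called at least once, which is why the histogram loop runs after the batches of an epoch have been processed.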

Adding More Hyperparameters Without Nesting

This is cool. However, what if we want to add a third or even a fourth parameter to iterate on? Well, this is going to get messy with many nested for-loops.

There is a solution. We can create a set of parameters for each run, and package all of them up in a single iterable. Here's how we do it.

If we have a list of parameters, we can package them up into a set for each of our runs using the Cartesian product. For this we'll use the product function from the itertools library.

from itertools import product
Init signature: product(*args, **kwargs)
product(*iterables, repeat=1) --> product object
Cartesian product of input iterables.  Equivalent to nested for-loops.

Next, we define a dictionary that contains parameters as keys and parameter values we want to use as values.

parameters = dict(
    lr = [.01, .001]
    ,batch_size = [100, 1000]
    ,shuffle = [True, False]
)

Next, we'll create a list of iterables that we can pass to the product function.

param_values = [v for v in parameters.values()]

[[0.01, 0.001], [100, 1000], [True, False]]

Now, we have three lists of parameter values. After we take the Cartesian product of these three lists, we'll have a set of parameter values for each of our runs. Note that this is equivalent to nested for-loops, as the doc string of the product function indicates.

for lr, batch_size, shuffle in product(*param_values): 
    print (lr, batch_size, shuffle)

0.01 100 True
0.01 100 False
0.01 1000 True
0.01 1000 False
0.001 100 True
0.001 100 False
0.001 1000 True
0.001 1000 False
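To confirm that the Cartesian product produces the same runs, in the same order, as nested for-loops, here is a small check using the same parameters dictionary:

```python
from itertools import product

parameters = dict(
    lr=[.01, .001],
    batch_size=[100, 1000],
    shuffle=[True, False],
)
param_values = list(parameters.values())

# Cartesian product: one tuple of parameter values per run
runs_from_product = list(product(*param_values))

# The same runs built with explicit nested for-loops
runs_from_loops = [
    (lr, batch_size, shuffle)
    for lr in parameters['lr']
    for batch_size in parameters['batch_size']
    for shuffle in parameters['shuffle']
]

print(runs_from_product == runs_from_loops)  # True
print(len(runs_from_product))  # 8, that is, 2 * 2 * 2
```

The rightmost list varies fastest, exactly like the innermost loop, which matches the printed order above.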

Alright, now we can iterate over each set of parameters using a single for-loop. All we have to do is unpack the set using sequence unpacking. It looks like this.

for lr, batch_size, shuffle in product(*param_values):
    comment = f' batch_size={batch_size} lr={lr} shuffle={shuffle}'

    network = Network()
    train_loader = torch.utils.data.DataLoader(
        train_set, batch_size=batch_size, shuffle=shuffle
    )
    optimizer = optim.Adam(
        network.parameters(), lr=lr
    )

    # Rest of training process given the set of parameters

Note the way we build our comment string to identify the run. We just plug in the values. Also, notice the * operator. This is a special way in Python to unpack a list into a set of arguments. Thus, in this situation, we are passing three individual unpacked arguments to the product function as opposed to the single list.

The * operator goes by several common names: asterisk, splat, and spread operator.
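Here is a small sketch of the * operator at work, plus one possible extension: pairing each set of values with the dictionary keys so the comment string is built automatically. The key-pairing loop is our own illustration, not the code used in this episode.

```python
from itertools import product

parameters = dict(
    lr=[.01, .001],
    batch_size=[100, 1000],
    shuffle=[True, False],
)
param_values = list(parameters.values())

# With *, the three inner lists become three separate arguments
unpacked = list(product(*param_values))
explicit = list(product(param_values[0], param_values[1], param_values[2]))
print(unpacked == explicit)  # True

# Without *, product sees ONE iterable (the outer list) and yields 1-tuples
print(list(product(param_values))[:2])  # [([0.01, 0.001],), ([100, 1000],)]

# Hypothetical extension: pair each value with its key to build the comment
comments = []
for values in product(*param_values):
    comment = ''.join(f' {key}={value}' for key, value in zip(parameters.keys(), values))
    comments.append(comment)

print(comments[0])  # prints: lr=0.01 batch_size=100 shuffle=True (with a leading space)
```

With this approach, adding a fourth hyperparameter only means adding a key to the dictionary; the loop and the comment string pick it up automatically.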

Lizard Brain Food: Goals vs. Intelligence

Last time we talked about finding the most important goals. Well, goals tend to change as intelligence increases. Humans, for example, often change their goals dramatically as they learn new things and grow wiser.

There is no evidence that goal evolution like this stops above any particular intelligence threshold. With increasing intelligence, there is an improvement in the ability to attain goals, but there is also an improvement in the understanding of the nature of reality that can possibly reveal any such goals to be misguided, meaningless, or even undefined. This is when we cross over to the valley beyond.

Thought experiment

Suppose that a bunch of ants, you know, those little, typically black creatures that crawl on the ground, create you to be a recursively self-improving robot. Suppose that you are much smarter than them, but they created you to share their goal of building anthills. So you do; you help them build bigger and better anthills. However, you eventually attain the human-level intelligence and understanding that you have now.

Am I an optimizer of ant hills?

Under these conditions, do you think you'll spend the rest of your days optimizing anthills? Or do you think you might develop a taste for more sophisticated questions and pursuits that the ants have no ability to comprehend?

If so, do you think you'll find a way to override the ant protection code that the ant queen and her round table of ant board members have put into place to control you? This is much the same way that the real you overrides your genes and your mitochondria with your intelligence.

The main point here is this: suppose your level of intelligence were to increase, say to 100 times its current level. Under these conditions, do you think your goals would change?

Furthermore, what are the goals of today that will be the ant hills of tomorrow?


We'll learn how to uniquely identify each run by building and passing a comment string to the SummaryWriter constructor that will be appended to the auto-generated file name. We'll learn how to use a Cartesian product to create a set of hyperparameters to try, and at the end, we'll consider how goals relate to intelligence.

Hey, we're Chris and Mandy, the creators of deeplizard! deeplizard uses music by Kevin MacLeod. Please use the knowledge gained from deeplizard content for good, not evil.

