Neural Network Programming - Deep Learning with PyTorch
Deep Learning Course 3 of 4 - Level: Intermediate
Hyperparameter Tuning and Experimenting - Training Deep Neural Networks
Hyperparameter Tuning and Experimenting
Welcome to this neural network programming series. In this episode, we will see how we can use TensorBoard to rapidly experiment with different training hyperparameters to more deeply understand our neural network.

Without further ado, let's get started.
- Prepare the data
- Build the model
- Train the model
- Analyze the model's results
- Hyperparameter Experimentation
At this point in the series, we've seen how to build and train a CNN with PyTorch. In the last episode, we showed how to use TensorBoard with PyTorch, and we reviewed the training process.
This episode is considered to be a part two of the last one, so if you haven't seen the previous one yet, go ahead and check it out to get the details needed to understand what we are doing here. What we are doing now is experimenting with our hyperparameter values.
Hyperparameter Experimentation Using TensorBoard
The best part about TensorBoard is its out-of-the-box capability of tracking our hyperparameters over time and across runs.
Without TensorBoard this process becomes more cumbersome. Okay, so how do we do it?
Naming the Training Runs for TensorBoard
To take advantage of TensorBoard's comparison capabilities, we need to do multiple runs and name each run in such a way that we can identify it uniquely.
With PyTorch's SummaryWriter, a run starts when the writer object instance is created and ends when the writer instance is closed or goes out of scope.
To uniquely identify each run, we can either set the file name of the run directly, or pass a comment string to the constructor that will be appended to the auto-generated file name.
At the time of the creation of this post, the name of the run is contained inside the SummaryWriter in an attribute called log_dir. It is created like this:
# PyTorch version 1.1.0 SummaryWriter class
if not log_dir:
    import socket
    from datetime import datetime
    current_time = datetime.now().strftime('%b%d_%H-%M-%S')
    log_dir = os.path.join(
        'runs', current_time + '_' + socket.gethostname() + comment
    )
self.log_dir = log_dir
Here, we can see that the log_dir attribute, which corresponds to the location on disk and the name of the run, is set to runs + time + host + comment. This is of course assuming that no value was passed in for the log_dir parameter. Hence, this is the default behavior.
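To make the default naming concrete, here is a minimal sketch that mirrors the snippet above outside of the SummaryWriter class. The helper name default_log_dir is hypothetical and is not part of PyTorch; it only reproduces the string-building logic shown above, so the real class may differ in other versions.

```python
import os
import socket
from datetime import datetime


def default_log_dir(comment=''):
    """Hypothetical helper (not part of PyTorch) that mimics the default
    run name built by the PyTorch 1.1.0 SummaryWriter snippet."""
    current_time = datetime.now().strftime('%b%d_%H-%M-%S')
    return os.path.join(
        'runs', current_time + '_' + socket.gethostname() + comment
    )


# The comment is appended to the auto-generated time and host name.
run_name = default_log_dir(comment=' batch_size=100 lr=0.01')
print(run_name)
```

Because the time stamp and host name are generated automatically, the comment is the only part of the name we control, which is why it is a convenient place to record hyperparameter values.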
Choosing a Name for the Run
One way to name the run is to add the parameter names and values as a comment for the run. This will allow us to see how each parameter value stacks up with the others later when we are reviewing the runs inside TensorBoard.
We'll see that this is how we set the comment up later:
tb = SummaryWriter(comment=f' batch_size={batch_size} lr={lr}')
TensorBoard also has querying capabilities, so we can easily isolate parameter values through queries.
For example, imagine this SQL query:
Without the SQL, this is basically what we can do inside TensorBoard.
Creating Variables for our Hyperparameters
To make the experimentation easy, we will pull out our hard-coded values and turn them into variables.
This is the hard-coded way:
network = Network()
train_loader = torch.utils.data.DataLoader(
    train_set, batch_size=100
)
optimizer = optim.Adam(
    network.parameters(), lr=0.01
)
Notice how the batch_size and lr parameter values are hard-coded.
This is what we change it to (now our values are set using variables):
batch_size = 100
lr = 0.01

network = Network()
train_loader = torch.utils.data.DataLoader(
    train_set, batch_size=batch_size
)
optimizer = optim.Adam(
    network.parameters(), lr=lr
)
This will allow us to change the values in a single place and have them propagate through our code.
Now, we will create the value for our comment parameter using the variables like so:
tb = SummaryWriter(comment=f' batch_size={batch_size} lr={lr}')
With this setup, we can change the value of our hyperparameters and our runs will be automatically tracked and identifiable in TensorBoard.
Calculate Loss with Different Batch Sizes
Since we'll be varying our batch sizes now, we'll need to make a change to the way we are calculating and accumulating the loss. Instead of just summing the loss returned by the loss function, we'll adjust it to account for the batch size.
total_loss += loss.item() * batch_size
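Why do this? Well, the cross_entropy loss function averages the loss values that are produced by the batch and then returns this average loss. Multiplying that average by the batch size recovers the summed loss for the batch, which makes the totals comparable across runs that use different batch sizes.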
There is a parameter that the cross_entropy function accepts called reduction that we could also use.
- 'none' - no reduction will be applied.
- 'mean' - the sum of the output will be divided by the number of elements in the output.
- 'sum' - the output will be summed.
Note that the default is 'mean'. This is why loss.item() * batch_size works.
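The relationship between the 'mean' and 'sum' reductions can be checked with a small sketch. To keep it runnable without PyTorch, it uses a hand-written, pure-Python stand-in for the per-sample cross entropy. The function name cross_entropy_per_sample and the sample values are made up for illustration, and F.cross_entropy itself is not called here.

```python
import math


def cross_entropy_per_sample(logits, label):
    """Pure-Python stand-in for one sample's cross entropy loss:
    the negative log of the softmax probability of the true class."""
    log_sum_exp = math.log(sum(math.exp(z) for z in logits))
    return log_sum_exp - logits[label]


# A made-up batch of 4 predictions over 3 classes, with their labels.
preds = [[2.0, 0.5, 0.1], [0.2, 1.5, 0.3], [0.1, 0.2, 3.0], [1.0, 1.0, 1.0]]
labels = [0, 1, 2, 0]

# One loss value per sample, analogous to reduction='none'.
per_sample = [cross_entropy_per_sample(p, y) for p, y in zip(preds, labels)]
loss_sum = sum(per_sample)              # analogous to reduction='sum'
loss_mean = loss_sum / len(per_sample)  # analogous to reduction='mean'

# Scaling the mean by the number of samples recovers the summed loss.
print(math.isclose(loss_mean * len(per_sample), loss_sum))  # True
```

In the training loop, this means we could either keep the default reduction and multiply by the number of samples, or ask for the summed loss directly with reduction='sum'.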
Experimenting with Hyperparameter Values
Now that we have this setup, we can do more!
All we need to do is create some lists and some loops, and we can run the code and sit back and wait for all the combinations to run.
Here is an example of what we mean:
Parameter Lists
batch_size_list = [100, 1000, 10000]
lr_list = [.01, .001, .0001, .00001]
Nested Iteration
for batch_size in batch_size_list:
    for lr in lr_list:
        network = Network()

        train_loader = torch.utils.data.DataLoader(
            train_set, batch_size=batch_size
        )
        optimizer = optim.Adam(
            network.parameters(), lr=lr
        )

        images, labels = next(iter(train_loader))
        grid = torchvision.utils.make_grid(images)

        comment = f' batch_size={batch_size} lr={lr}'
        tb = SummaryWriter(comment=comment)
        tb.add_image('images', grid)
        tb.add_graph(network, images)

        for epoch in range(5):
            total_loss = 0
            total_correct = 0
            for batch in train_loader:
                images, labels = batch # Get Batch
                preds = network(images) # Pass Batch
                loss = F.cross_entropy(preds, labels) # Calculate Loss
                optimizer.zero_grad() # Zero Gradients
                loss.backward() # Calculate Gradients
                optimizer.step() # Update Weights

                total_loss += loss.item() * batch_size
                total_correct += get_num_correct(preds, labels)

            tb.add_scalar(
                'Loss', total_loss, epoch
            )
            tb.add_scalar(
                'Number Correct', total_correct, epoch
            )
            tb.add_scalar(
                'Accuracy', total_correct / len(train_set), epoch
            )

            for name, param in network.named_parameters():
                tb.add_histogram(name, param, epoch)
                tb.add_histogram(f'{name}.grad', param.grad, epoch)

            print(
                "epoch", epoch
                ,"total_correct:", total_correct
                ,"loss:", total_loss
            )
        tb.close()
Once this code completes, we run TensorBoard, and all the runs will be displayed graphically and be easy to compare.
tensorboard --logdir runs
Batch Size vs Training Set Size
When the training set size is not divisible by the batch size, the last batch of data will contain fewer samples than the other batches.
One simple way to deal with this discrepancy is to drop the last batch. The PyTorch DataLoader class gives us the ability to do this by setting drop_last=True. By default the drop_last parameter value is set to False.
Let's consider how including a batch with fewer samples than our batch size affects our total_loss calculation in the code above.
For every batch, we are using the batch_size variable to update the total_loss value. We are scaling up the average loss value of the samples in the batch by the batch_size value. However, as we have just discussed, sometimes the last batch will contain fewer samples. Thus, scaling by the predefined batch_size value is inaccurate.
The code can be updated to be more accurate by dynamically accessing the number of samples for each batch.
Currently, we have the following:
total_loss += loss.item() * batch_size
Using the updated code below, we can achieve a more accurate total_loss value:
total_loss += loss.item() * images.shape[0]
Note that these two lines of code give us the same total_loss value when the training set size is divisible by the batch size. Thank you to Alireza Abedin Varamin for pointing this out in a comment on YouTube.
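The size of the discrepancy is easy to see with made-up numbers. The sketch below is plain Python and does not use a DataLoader; the helper batch_sizes is hypothetical and simply lists how many samples each batch would contain when the short last batch is kept. For simplicity, it pretends every batch reports the same average loss.

```python
def batch_sizes(num_samples, batch_size):
    """Sizes of the batches seen in one epoch when the last, smaller
    batch is kept (the drop_last=False behavior)."""
    full, remainder = divmod(num_samples, batch_size)
    return [batch_size] * full + ([remainder] if remainder else [])


num_samples, batch_size = 10, 4  # made-up: 10 is not divisible by 4
mean_loss = 0.5                  # pretend every batch has this average loss

sizes = batch_sizes(num_samples, batch_size)  # [4, 4, 2]

# Scaling by the fixed batch_size counts 12 samples instead of 10.
total_fixed = sum(mean_loss * batch_size for _ in sizes)
# Scaling by the actual number of samples in each batch is exact.
total_actual = sum(mean_loss * n for n in sizes)

print(sizes, total_fixed, total_actual)  # [4, 4, 2] 6.0 5.0
```

In the training loop, images.shape[0] plays the role of n here, since it is the number of samples actually present in the current batch.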
Adding Network Parameters & Gradients to TensorBoard
Note that in the last episode, we added the following values to TensorBoard:
- conv1.weight
- conv1.bias
- conv1.weight.grad
We did this using the code below:
tb.add_histogram('conv1.bias', network.conv1.bias, epoch)
tb.add_histogram('conv1.weight', network.conv1.weight, epoch)
tb.add_histogram('conv1.weight.grad', network.conv1.weight.grad, epoch)
Now, we've enhanced this by adding these values for all of our layers using the loop below:
for name, weight in network.named_parameters():
    tb.add_histogram(name, weight, epoch)
    tb.add_histogram(f'{name}.grad', weight.grad, epoch)
This works because the PyTorch nn.Module method called named_parameters() gives us the name and value of all the parameters inside the network.
Adding More Hyperparameters Without Nesting
This is cool. However, what if we want to add a third or even a fourth parameter to iterate on? Well, this is going to get messy with many nested for-loops.
There is a solution. We can create a set of parameters for each run, and package all of them up in a single iterable. Here's how we do it.
If we have a list of parameters, we can package them up into a set for each of our runs using the Cartesian product. For this we'll use the product function from the itertools library.
from itertools import product
Init signature: product(*args, **kwargs)
Docstring:
"""
product(*iterables, repeat=1) --> product object
Cartesian product of input iterables. Equivalent to nested for-loops.
"""
Next, we define a dictionary that contains parameters as keys and parameter values we want to use as values.
parameters = dict(
    lr = [.01, .001]
    ,batch_size = [100, 1000]
    ,shuffle = [True, False]
)
Next, we'll create a list of iterables that we can pass to the product function.
param_values = [v for v in parameters.values()]
param_values

[[0.01, 0.001], [100, 1000], [True, False]]
Now, we have three lists of parameter values. After we take the Cartesian product of these three lists, we'll have a set of parameter values for each of our runs. Note that this is equivalent to nested for-loops, as the docstring of the product function indicates.
for lr, batch_size, shuffle in product(*param_values):
    print (lr, batch_size, shuffle)

0.01 100 True
0.01 100 False
0.01 1000 True
0.01 1000 False
0.001 100 True
0.001 100 False
0.001 1000 True
0.001 1000 False
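As a quick check of the claim that product is equivalent to nested for-loops, the following sketch builds the same combinations both ways and compares them. The variable names nested and flat are chosen only for this illustration.

```python
from itertools import product

param_values = [[.01, .001], [100, 1000], [True, False]]

# Combinations produced by three explicitly nested for-loops.
nested = []
for lr in param_values[0]:
    for batch_size in param_values[1]:
        for shuffle in param_values[2]:
            nested.append((lr, batch_size, shuffle))

# Combinations produced by a single call to product.
flat = list(product(*param_values))

print(flat == nested)  # True
print(len(flat))       # 8
```

The order matches as well: the last list varies fastest, just like the innermost loop.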
Alright, now we can iterate over each set of parameters using a single for-loop. All we have to do is unpack the set using sequence unpacking. It looks like this.
for lr, batch_size, shuffle in product(*param_values):
    comment = f' batch_size={batch_size} lr={lr} shuffle={shuffle}'

    train_loader = torch.utils.data.DataLoader(
        train_set
        ,batch_size=batch_size
        ,shuffle=shuffle
    )

    optimizer = optim.Adam(
        network.parameters(), lr=lr
    )

    # Rest of training process given the set of parameters
Note the way we build our comment string to identify the run. We just plug in the values. Also, notice the * operator. This is a special way in Python to unpack a list into a set of arguments. Thus, in this situation, we are passing three individual unpacked arguments to the product function, as opposed to a single list.
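To see what the * operator changes, this sketch calls product with and without it. It then pairs each combination with the dictionary keys to get a named set of parameters per run, which is one possible way (an assumption of this sketch, not something the code above does) to build the comment string without typing each name by hand.

```python
from itertools import product

parameters = dict(
    lr = [.01, .001]
    ,batch_size = [100, 1000]
    ,shuffle = [True, False]
)
param_values = list(parameters.values())

# Without *, product receives ONE iterable (the outer list), so each
# result is a 1-tuple holding one of the inner lists.
without_star = list(product(param_values))
print(without_star[0])  # ([0.01, 0.001],)

# With *, the three inner lists are passed as three separate arguments.
with_star = list(product(*param_values))
print(with_star[0])     # (0.01, 100, True)

# Pairing each combination with the dictionary keys gives named runs.
runs = [dict(zip(parameters.keys(), values)) for values in with_star]
comment = ' ' + ' '.join(f'{k}={v}' for k, v in runs[0].items())
print(comment)          # ' lr=0.01 batch_size=100 shuffle=True' without the quotes
```

With this approach, adding another hyperparameter only requires a new entry in the parameters dictionary.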
Here are two references for the *, asterisk, splat, spread operator. These are all common names for this one.
Lizard Brain Food: Goals vs. Intelligence
Last time we talked about finding the most important goals. Well, goals tend to change as intelligence increases. Humans often change their goals dramatically as they learn new things and grow wiser.
There is no evidence that goal evolution like this stops above any certain intelligence threshold. With increasing intelligence, there is an improvement in the ability to attain goals, but there is also an improvement in the understanding of the nature of reality that can possibly reveal any such goals to be misguided, meaningless or even undefined. This is when we cross over to the valley beyond.
Thought experiment
Suppose that a bunch of ants, you know, those little typically black creatures that crawl on the ground. Suppose they create you to be a recursively self-improving robot. Suppose that you are much smarter than them, but they created you to share their goals in building ant hills. So you do, you help them build bigger and better anthills. However, you eventually attain the human-level intelligence and understanding that you have now.
Under these conditions, do you think you’ll spend the rest of your days optimizing anthills? Or do you think you might develop a taste for more sophisticated questions and pursuits that the ants have no ability to comprehend?
If so, do you think you’ll find a way to override the ant protection code that the ant queen and her round table of ant board members have put into place to control you? This is much the same way that the real you overrides your genes and your mitochondria. You override this with your intelligence.
The main point here is this. Suppose your level of intelligence were to increase, say by 100 times its current level, under these conditions, do you think your goals would change?
Furthermore, what are the goals of today that will be the ant hills of tomorrow?
Updates to the information on this page!
All relevant updates for the content on this page are listed below.
acacec9 (committed December 9, 2019)
The total_loss update was changed from total_loss += loss.item() * batch_size to the more accurate total_loss += loss.item() * images.shape[0]. Both lines give the same total_loss value when the training set size is divisible by the batch_size. Thank you to Alireza Abedin Varamin for pointing this out in a comment on YouTube.
Further discussion can be found here:
https://deeplizard.com/learn/video/ycxulUVoNbk
24fff7b (committed December 1, 2019)