# Neural Network Programming - Deep Learning with PyTorch

with deeplizard.

## Hyperparameter Tuning and Experimenting - Training Deep Neural Networks

July 14, 2019 by

### Hyperparameter Tuning and Experimenting

Welcome to this neural network programming series. In this episode, we will see how we can use TensorBoard to rapidly experiment with different training hyperparameters to more deeply understand our neural network.

Without further ado, let's get started.

• Prepare the data
• Build the model
• Train the model
• Analyze the model's results
• Hyperparameter Experimentation

At this point in the series, we've seen how to build and train a CNN with PyTorch. In the last episode, we showed how to use TensorBoard with PyTorch, and we reviewed the training process.

This episode is effectively part two of the last one, so if you haven't seen the previous one yet, go ahead and check it out to get the details needed to understand what we are doing here. What we are doing now is experimenting with our hyperparameter values.

### Hyperparameter Experimentation Using TensorBoard

The best part about TensorBoard is its out-of-the-box capability of tracking our hyperparameters over time and across runs.

This lets us change hyperparameters and compare the results from one run to the next.

Without TensorBoard this process becomes more cumbersome. Okay, so how do we do it?

### Naming the Training Runs for TensorBoard

To take advantage of TensorBoard's comparison capabilities, we need to do multiple runs and name each run in such a way that we can identify it uniquely.

With PyTorch's SummaryWriter, a run starts when the writer object instance is created and ends when the writer instance is closed or goes out of scope.

To uniquely identify each run, we can either set the file name of the run directly, or pass a comment string to the constructor that will be appended to the auto-generated file name.

At the time of the creation of this post, the name of the run is contained inside the SummaryWriter in an attribute called log_dir. It is created like this:

# PyTorch version 1.1.0 SummaryWriter class
if not log_dir:
    import socket
    from datetime import datetime
    current_time = datetime.now().strftime('%b%d_%H-%M-%S')
    log_dir = os.path.join(
        'runs',
        current_time + '_' + socket.gethostname() + comment
    )
self.log_dir = log_dir


Here, we can see that the log_dir attribute, which corresponds to the location on disk and the name of the run, is set to runs + time + host + comment. This is of course assuming that the log_dir parameter doesn't have a value that was passed in. Hence, this is the default behavior.
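To see what the default naming produces, the logic above can be reproduced outside of SummaryWriter. This is a minimal sketch that uses only the standard library. The helper name default_log_dir is our own and is not part of PyTorch.

```python
import os
import socket
from datetime import datetime


def default_log_dir(comment=''):
    # Hypothetical helper (not a PyTorch function) that mirrors the
    # default naming logic shown above: runs/<time>_<host><comment>
    current_time = datetime.now().strftime('%b%d_%H-%M-%S')
    return os.path.join(
        'runs',
        current_time + '_' + socket.gethostname() + comment
    )


log_dir = default_log_dir(comment=' batch_size=100 lr=0.01')
print(log_dir)
```

Since the comment is appended to the end of the name, whatever we put in it shows up in the run name that TensorBoard displays.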

#### Choosing a Name for the Run

One way to name the run is to add the parameter names and values as a comment for the run. This will allow us to see how each parameter value stacks up with the others later when we are reviewing the runs inside TensorBoard.

We'll see that this is how we set the comment up later:

tb = SummaryWriter(comment=f' batch_size={batch_size} lr={lr}')


TensorBoard also has querying capabilities, so we can easily isolate parameter values through queries.

For example, imagine this SQL query:

SELECT * FROM TBL_RUNS WHERE lr = 0.01

Without the SQL, this is basically what we can do inside TensorBoard.
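The runs filter in TensorBoard's sidebar accepts a regular expression that is matched against the run names. The sketch below mimics that idea in plain Python so we can see which runs a given pattern would select. The run names are made up for illustration, and the filter_runs helper is our own approximation, not TensorBoard's code.

```python
import re

# Hypothetical run names that follow the <time>_<host><comment> pattern
run_names = [
    'Jul14_10-00-00_host batch_size=100 lr=0.01',
    'Jul14_10-05-00_host batch_size=100 lr=0.001',
    'Jul14_10-10-00_host batch_size=1000 lr=0.01',
    'Jul14_10-15-00_host batch_size=1000 lr=0.001',
]


def filter_runs(names, pattern):
    # Keep the runs whose name contains a match for the regular expression
    regex = re.compile(pattern)
    return [name for name in names if regex.search(name)]


# Roughly: SELECT * FROM TBL_RUNS WHERE lr = 0.01
selected_lr = filter_runs(run_names, r'lr=0\.01$')

# The \b word boundary keeps batch_size=100 from also matching batch_size=1000
selected_bs = filter_runs(run_names, r'batch_size=100\b')

for name in selected_lr:
    print(name)
```

One thing to watch for is that a plain pattern like batch_size=100 also matches batch_size=1000, which is why the second pattern ends with a word boundary.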

### Creating Variables for our Hyperparameters

To make the experimentation easy, we will pull out our hard-coded values and turn them into variables.

This is the hard-coded way:

network = Network()
train_loader = torch.utils.data.DataLoader(
    train_set, batch_size=100
)
optimizer = optim.Adam(
    network.parameters(), lr=0.01
)


Notice how the batch_size and lr parameter values are hard-coded.

This is what we change it to (now our values are set using variables):

batch_size = 100
lr = 0.01

network = Network()
train_loader = torch.utils.data.DataLoader(
    train_set, batch_size=batch_size
)
optimizer = optim.Adam(
    network.parameters(), lr=lr
)


This will allow us to change the values in a single place and have them propagate through our code.

Now, we will create the value for our comment parameter using the variables like so:

tb = SummaryWriter(comment=f' batch_size={batch_size} lr={lr}')


With this setup, we can change the value of our hyperparameters and our runs will be automatically tracked and identifiable in TensorBoard.

### Calculate Loss with Different Batch Sizes

Since we'll be varying our batch sizes now, we'll need to make a change to the way we are calculating and accumulating the loss. Instead of just summing the loss returned by the loss function, we'll adjust it to account for the batch size.

total_loss += loss.item() * batch_size


Why do this? Well, the cross_entropy loss function averages the loss values that are produced by the batch and then returns this average loss. This is why we need to account for the batch size.

There is a parameter that the cross_entropy function accepts called reduction that we could also use.

The reduction parameter optionally accepts a string as an argument. This parameter specifies the reduction to apply to the output of the loss function.
1. 'none' - no reduction will be applied.
2. 'mean' - the sum of the output will be divided by the number of elements in the output.
3. 'sum' - the output will be summed.

Note that the default is 'mean'. This is why loss.item() * batch_size works.
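The arithmetic can be checked with a small example in plain Python. The per-sample loss values below are made up, and mean_loss is a stand-in for what a loss function using reduction='mean' would return for a batch. The sketch also shows one caveat: multiplying by batch_size is exact only when every batch contains exactly batch_size samples. When the last batch is smaller, multiplying by the actual number of samples in the batch is the safer choice.

```python
# Hypothetical per-sample losses for a dataset of 5 samples
sample_losses = [0.5, 1.5, 1.0, 2.0, 3.0]
batch_size = 2


def mean_loss(batch):
    # Stand-in for a loss function that uses reduction='mean'
    return sum(batch) / len(batch)


# Split the samples into batches of 2, 2, and 1
batches = [
    sample_losses[i:i + batch_size]
    for i in range(0, len(sample_losses), batch_size)
]

total_fixed = 0.0   # weights every batch by the configured batch_size
total_actual = 0.0  # weights every batch by its actual length
for batch in batches:
    total_fixed += mean_loss(batch) * batch_size
    total_actual += mean_loss(batch) * len(batch)

print(sum(sample_losses))  # true summed loss
print(total_actual)        # matches the true sum
print(total_fixed)         # overcounts the smaller last batch
```

If the batch size divides the number of training samples evenly, the two totals are the same.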

### Experimenting with Hyperparameter Values

Now that we have this setup, we can do more!

All we need to do is create some lists and some loops, and we can run the code and sit back and wait for all the combinations to run.

Here is an example of what we mean:

#### Parameter Lists

batch_size_list = [100, 1000, 10000]
lr_list = [.01, .001, .0001, .00001]


#### Nested Iteration

for batch_size in batch_size_list:
    for lr in lr_list:
        network = Network()

        train_loader = torch.utils.data.DataLoader(
            train_set, batch_size=batch_size
        )
        optimizer = optim.Adam(
            network.parameters(), lr=lr
        )

        images, labels = next(iter(train_loader))
        grid = torchvision.utils.make_grid(images)

        comment=f' batch_size={batch_size} lr={lr}'
        tb = SummaryWriter(comment=comment)
        tb.add_image('images', grid)
        tb.add_graph(network, images)

        for epoch in range(5):
            total_loss = 0
            total_correct = 0
            for batch in train_loader:
                images, labels = batch # Get Batch
                preds = network(images) # Pass Batch
                loss = F.cross_entropy(preds, labels) # Calculate Loss
                optimizer.zero_grad() # Zero Gradients
                loss.backward() # Calculate Gradients
                optimizer.step() # Update Weights

                total_loss += loss.item() * batch_size
                total_correct += get_num_correct(preds, labels)

            tb.add_scalar(
                'Loss', total_loss, epoch
            )
            tb.add_scalar(
                'Number Correct', total_correct, epoch
            )
            tb.add_scalar(
                'Accuracy', total_correct / len(train_set), epoch
            )

            for name, param in network.named_parameters():
                tb.add_histogram(name, param, epoch)
                tb.add_histogram(f'{name}.grad', param.grad, epoch)

            print(
                "epoch", epoch
                ,"total_correct:", total_correct
                ,"loss:", total_loss
            )
        tb.close()


Once this code completes, we run TensorBoard, and all the runs will be displayed graphically so that they are easy to compare.

tensorboard --logdir runs


Note that in the last episode, we added the following values to TensorBoard:

• conv1.weight
• conv1.bias
• conv1.weight.grad

We did this using calls like the one below, one for each value:

tb.add_histogram('conv1.bias', network.conv1.bias, epoch)


Now, we've enhanced this by adding these values for all of our layers using the loop below:

for name, weight in network.named_parameters():
    tb.add_histogram(name, weight, epoch)
    tb.add_histogram(f'{name}.grad', weight.grad, epoch)


This works because the PyTorch nn.Module method called named_parameters() gives us the name and value of all the parameters inside the network.

### Adding More Hyperparameters Without Nesting

This is cool. However, what if we want to add a third or even a fourth parameter to iterate on? Well, this is going to get messy with many nested for-loops.

There is a solution. We can create a set of parameters for each run, and package all of them up in a single iterable. Here's how we do it.

If we have a list of parameters, we can package them up into a set for each of our runs using the Cartesian product. For this we'll use the product function from the itertools library.

from itertools import product

Init signature: product(*args, **kwargs)
Docstring:
"""
product(*iterables, repeat=1) --> product object
Cartesian product of input iterables.  Equivalent to nested for-loops.
"""


Next, we define a dictionary that contains parameters as keys and parameter values we want to use as values.

parameters = dict(
    lr = [.01, .001]
    ,batch_size = [100, 1000]
    ,shuffle = [True, False]
)


Next, we'll create a list of iterables that we can pass to the product function.

param_values = [v for v in parameters.values()]
param_values

[[0.01, 0.001], [100, 1000], [True, False]]


Now, we have three lists of parameter values. After we take the Cartesian product of these three lists, we'll have a set of parameter values for each of our runs. Note that this is equivalent to nested for-loops, as the doc string of the product function indicates.

for lr, batch_size, shuffle in product(*param_values):
    print(lr, batch_size, shuffle)

0.01 100 True
0.01 100 False
0.01 1000 True
0.01 1000 False
0.001 100 True
0.001 100 False
0.001 1000 True
0.001 1000 False
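
The equivalence with nested for-loops can be verified directly. The sketch below builds the same combinations both ways and compares them. It uses only the standard library, and the names nested and flat are our own.

```python
from itertools import product

param_values = [[0.01, 0.001], [100, 1000], [True, False]]

# The nested for-loop version
nested = []
for lr in param_values[0]:
    for batch_size in param_values[1]:
        for shuffle in param_values[2]:
            nested.append((lr, batch_size, shuffle))

# The single-loop version using the Cartesian product
flat = list(product(*param_values))

print(nested == flat)  # True
print(len(flat))       # 8 runs: 2 * 2 * 2
```

The number of runs is the product of the list lengths, so it grows quickly as more hyperparameters or values are added.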


Alright, now we can iterate over each set of parameters using a single for-loop. All we have to do is unpack the set using sequence unpacking. It looks like this.

for lr, batch_size, shuffle in product(*param_values):
    comment = f' batch_size={batch_size} lr={lr} shuffle={shuffle}'

    # Training process given the set of parameters


Note the way we build our comment string to identify the run. We just plug in the values. Also, notice the * operator. This is a special way in Python to unpack a list into a set of arguments. Thus, in this situation, we are passing three individual unpacked arguments to the product function as opposed to a single list.
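
As a small illustration of the unpacking, the sketch below checks that product(*param_values) is the same call as passing each list as its own argument. It also pairs each value with its parameter name so that the comment string can be built without typing the names by hand. The names run and comments are our own choices for this example, and the key order relies on dictionaries preserving insertion order (Python 3.7 and later).

```python
from itertools import product

parameters = dict(
    lr = [.01, .001]
    ,batch_size = [100, 1000]
    ,shuffle = [True, False]
)
param_values = list(parameters.values())

# Unpacking with * is the same as passing each list separately
unpacked = list(product(*param_values))
explicit = list(product(param_values[0], param_values[1], param_values[2]))
print(unpacked == explicit)  # True

# Without *, product receives a single iterable and yields one tuple per list
print(len(list(product(param_values))))  # 3

# Pair names with values to build the comment for each run
comments = []
for values in product(*param_values):
    run = dict(zip(parameters.keys(), values))
    comment = ' ' + ' '.join(f'{name}={value}' for name, value in run.items())
    comments.append(comment)

print(comments[0])
```

Building the comment from the dictionary means that adding a new hyperparameter only requires one new entry in parameters.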

The * operator goes by several common names: asterisk, splat, and spread.

### Lizard Brain Food: Goals vs. Intelligence

Last time we talked about finding the most important goals. Well, goals tend to change as intelligence increases. Humans often change their goals dramatically as they learn new things and grow wiser.

There is no evidence that goal evolution like this stops above any certain intelligence threshold. With increasing intelligence, there is an improvement in the ability to attain goals, but there is also an improvement in the understanding of the nature of reality that can possibly reveal any such goals to be misguided, meaningless or even undefined. This is when we cross over to the valley beyond.

#### Thought experiment

Suppose that a bunch of ants (you know, those little, typically black creatures that crawl on the ground) create you to be a recursively self-improving robot. Suppose that you are much smarter than them, but they created you to share their goal of building anthills. So you do: you help them build bigger and better anthills. However, you eventually attain the human-level intelligence and understanding that you have now.

Am I an optimizer of ant hills?

Under these conditions, do you think you’ll spend the rest of your days optimizing anthills? Or do you think you might develop a taste for more sophisticated questions and pursuits that the ants have no ability to comprehend?

If so, do you think you’ll find a way to override the ant protection code that the ant queen and her round table of ant board members have put into place to control you? This is much the same way that the real you overrides your genes and your mitochondria. You override this with your intelligence.

The main point here is this. Suppose your level of intelligence were to increase to, say, 100 times its current level. Under these conditions, do you think your goals would change?

Furthermore, what are the goals of today that will be the ant hills of tomorrow?
