TensorBoard with PyTorch - Visualize Deep Learning Metrics
Using TensorBoard with PyTorch
Welcome to this neural network programming series. In this episode, we will learn how to use TensorBoard to visualize metrics of our CNN during the neural network training process.

Without further ado, let's get started.
At this point in the series, we've just finished running our network through the training process. Now, we want to get more metrics about this process to better understand just what's going on under the hood.
Bird's eye view of where we are in the training process.
- Prepare the data
- Build the model
- Train the model
- Analyze the model's results
  - Using TensorBoard for this
TensorBoard: TensorFlow's Visualization Toolkit
TensorBoard provides the visualization and tooling needed for machine learning experimentation:
- Tracking and visualizing metrics such as loss and accuracy
- Visualizing the model graph (ops and layers)
- Viewing histograms of weights, biases, or other tensors as they change over time
- Projecting embeddings to a lower dimensional space
- Displaying images, text, and audio data
- Profiling TensorFlow programs
- And much more
As of PyTorch version 1.1.0, PyTorch has added a tensorboard utility package that enables us to use TensorBoard with PyTorch.
import torch
print(torch.__version__)
# Output: 1.1.0
from torch.utils.tensorboard import SummaryWriter
Installing TensorBoard for PyTorch
To install TensorBoard for PyTorch, use the following steps:
- Verify that you are running PyTorch version 1.1.0 or greater.
- Verify that you are running TensorBoard version 1.15 or greater.
- Note that the TensorBoard that PyTorch uses is the same TensorBoard that was created for TensorFlow. Check the version of TensorBoard installed on your system using this command:
tensorboard --version
- Install TensorBoard using the following command:
pip install tensorboard
- After getting TensorBoard version 1.15 or greater installed, we're ready to go!
Note that the PyTorch docs say that TensorBoard version 1.14 is the requirement. However, I was unable to get the full functionality to work on the 1.14 release. This is why the nightly build is being used in the video.
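The version checks above can also be done in code. The following is a minimal sketch, not part of the original episode: the helper names version_tuple and meets_minimum are hypothetical, and the comparison deliberately ignores pre-release and build suffixes, which is a simplification.

```python
import re


def version_tuple(version):
    """Return the leading numeric components of a version string as a tuple.

    Suffixes such as '+cu118' or 'a20190806' are ignored (a simplification).
    """
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Check whether an installed version is at least the minimum version."""
    have, need = version_tuple(installed), version_tuple(minimum)
    # Pad the shorter tuple with zeros so '1.15' compares like '1.15.0'.
    width = max(len(have), len(need))
    have += (0,) * (width - len(have))
    need += (0,) * (width - len(need))
    return have >= need


print(meets_minimum("1.1.0", "1.1.0"))   # PyTorch requirement from this episode
print(meets_minimum("1.14.0", "1.15"))   # TensorBoard 1.14 against the 1.15 minimum
```

With PyTorch installed, the same check could be written as meets_minimum(torch.__version__, "1.1.0").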
Getting Started with TensorBoard for PyTorch
TensorBoard is a front-end web interface that essentially reads data from a file and displays it. To use TensorBoard, our task is to save the data we want displayed to a file that TensorBoard can read.
To make this easy for us, PyTorch has created a utility class called SummaryWriter. To get access to this class, we use the following import:
from torch.utils.tensorboard import SummaryWriter
Once we have imported the class, we can create an instance of the class that we'll then use to get the data out of our program and onto the file system where it can then be consumed by TensorBoard.
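As a minimal sketch of that idea, the snippet below creates a writer, logs a single value, and then looks at what landed on disk. It assumes torch and tensorboard are installed. The temporary directory and the tag name 'example/value' are illustrative choices, not something from the episode.

```python
import os
import tempfile

from torch.utils.tensorboard import SummaryWriter

# Write to a throwaway directory so the example leaves nothing behind.
with tempfile.TemporaryDirectory() as log_dir:
    # Passing log_dir overrides the default location under ./runs.
    tb = SummaryWriter(log_dir=log_dir)
    tb.add_scalar('example/value', 0.5, 0)  # tag, value, step
    tb.close()  # flush pending events to disk

    # The writer stores its data in files whose names start with this prefix.
    event_files = [
        name for name in os.listdir(log_dir)
        if name.startswith('events.out.tfevents')
    ]
    print(event_files)
```

The event file is what TensorBoard reads later. The writer only produces it and never talks to the TensorBoard server directly.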
Network Graph and Training Set Images
The SummaryWriter class comes with a number of methods that we can call to choose which data we want to make available to TensorBoard. We'll start by passing our network and a batch of images to the writer.
tb = SummaryWriter()
network = Network()
images, labels = next(iter(train_loader))
grid = torchvision.utils.make_grid(images)
tb.add_image('images', grid)
tb.add_graph(network, images)
tb.close()
This code creates a SummaryWriter instance called tb for TensorBoard. Then, it creates an instance of our PyTorch network and unpacks a batch of images and labels from our PyTorch data loader object.
Then, the images and the network are added to the file that TensorBoard will consume. Effectively, we can say that the network graph and the batch of images have both been added to TensorBoard.
Running TensorBoard
To launch TensorBoard, we need to run the tensorboard command at our terminal. This will launch a local server that will serve the TensorBoard UI and the data our SummaryWriter wrote to disk.
By default, the PyTorch SummaryWriter object writes the data to disk in a directory called ./runs that is created in the current working directory.
When we run the tensorboard command, we pass an argument that tells tensorboard where the data is:
tensorboard --logdir=runs
The TensorBoard server will launch and be listening for http requests on port 6006. These details will be displayed in the console.
Access the TensorBoard UI by browsing to:
http://localhost:6006
Here, we will be able to see our network graph and our image data. At the current moment, this does provide us with a visual, but it's not as useful as what comes next.
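If the UI comes up empty, a common cause is that the logdir argument does not point at the directory holding the event files. The sketch below is one way to check what a given directory contains. The helper name find_event_files is hypothetical, and the demonstration uses made-up file names in a temporary directory rather than real training output.

```python
import tempfile
from pathlib import Path


def find_event_files(logdir):
    """Map each run directory under logdir to the event files it contains."""
    runs = {}
    for path in sorted(Path(logdir).rglob('events.out.tfevents.*')):
        run_name = str(path.parent.relative_to(logdir))
        runs.setdefault(run_name, []).append(path.name)
    return runs


# Demonstration on a fake directory layout (the file names are made up).
with tempfile.TemporaryDirectory() as logdir:
    for run in ('run_a', 'run_b'):
        run_dir = Path(logdir) / run
        run_dir.mkdir()
        (run_dir / 'events.out.tfevents.0000000000.example-host').touch()
    found = find_event_files(logdir)

print(found)
```

Calling find_event_files('runs') from the directory where training was started should list one entry per SummaryWriter that was created.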
TensorBoard Histograms and Scalars
The next important type of data we can add to TensorBoard is numerical data. We can add scalar values that will be displayed over time or over epochs. We can also add values to histograms to see frequency distributions of values.
To add scalars and histograms, we use the corresponding methods provided by the PyTorch SummaryWriter class.
Here is an example of the calls:
tb.add_scalar('Loss', total_loss, epoch)
tb.add_scalar('Number Correct', total_correct, epoch)
tb.add_scalar('Accuracy', total_correct / len(train_set), epoch)
tb.add_histogram('conv1.bias', network.conv1.bias, epoch)
tb.add_histogram('conv1.weight', network.conv1.weight, epoch)
tb.add_histogram('conv1.weight.grad', network.conv1.weight.grad, epoch)
And here is an example of where we would place these calls inside our training loop:
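import torch
import torch.nn.functional as F
import torch.optim as optim
import torchvision
from torch.utils.tensorboard import SummaryWriter

# Network and train_set are defined earlier in the series.
network = Network()
train_loader = torch.utils.data.DataLoader(train_set, batch_size=100)
optimizer = optim.Adam(network.parameters(), lr=0.01)

images, labels = next(iter(train_loader))
grid = torchvision.utils.make_grid(images)

tb = SummaryWriter()
tb.add_image('images', grid)
tb.add_graph(network, images)

for epoch in range(1):
    total_loss = 0
    total_correct = 0

    for batch in train_loader:  # Get Batch
        images, labels = batch
        preds = network(images)  # Pass Batch
        # Calculate Loss (cross-entropy is assumed here for classification)
        loss = F.cross_entropy(preds, labels)
        optimizer.zero_grad()
        loss.backward()  # Calculate Gradient
        optimizer.step()  # Update Weights

        total_loss += loss.item()
        total_correct += preds.argmax(dim=1).eq(labels).sum().item()

    # Log once per epoch, after all batches have been processed.
    tb.add_scalar('Loss', total_loss, epoch)
    tb.add_scalar('Number Correct', total_correct, epoch)
    tb.add_scalar('Accuracy', total_correct / len(train_set), epoch)

    tb.add_histogram('conv1.bias', network.conv1.bias, epoch)
    tb.add_histogram('conv1.weight', network.conv1.weight, epoch)
    tb.add_histogram('conv1.weight.grad', network.conv1.weight.grad, epoch)

    print(
        "epoch", epoch,
        "total_correct:", total_correct,
        "loss:", total_loss
    )

tb.close()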
This will add these values to TensorBoard. The values even update in real-time as the network trains.
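To try the scalar and histogram calls without the network and data from earlier in the series, something like the following sketch may help. Everything model-related here is an assumption made for illustration: a single convolutional layer stands in for the network, the inputs are random, and the regression-style target and mean squared error loss are arbitrary.

```python
import tempfile

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.tensorboard import SummaryWriter

torch.manual_seed(0)

# Hypothetical single-layer stand-in for the series' network.
conv1 = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5)
optimizer = torch.optim.Adam(conv1.parameters(), lr=0.01)

images = torch.rand(8, 1, 28, 28)     # random stand-in batch
targets = torch.zeros(8, 6, 24, 24)   # arbitrary target, same shape as the output

losses = []
with tempfile.TemporaryDirectory() as log_dir:
    tb = SummaryWriter(log_dir=log_dir)
    for epoch in range(3):
        loss = F.mse_loss(conv1(images), targets)
        optimizer.zero_grad()
        loss.backward()   # populates conv1.weight.grad
        optimizer.step()

        losses.append(loss.item())
        # The third argument is the step shown on the x-axis in TensorBoard.
        tb.add_scalar('Loss', loss.item(), epoch)
        tb.add_histogram('conv1.bias', conv1.bias, epoch)
        tb.add_histogram('conv1.weight', conv1.weight, epoch)
        tb.add_histogram('conv1.weight.grad', conv1.weight.grad, epoch)
    tb.close()

print(losses)
```

One detail worth noting is that a parameter's .grad attribute is None until backward() has been called at least once, so the gradient histogram has to be logged after the backward pass.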
It is helpful to see the loss and accuracy values over time. However, we might need to admit that TensorBoard really isn't needed for this.
The real power of TensorBoard is its out-of-the-box capability of comparing multiple runs. This allows us to rapidly experiment by varying the hyperparameter values and comparing runs to see which parameters are working best.
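Comparing runs works best when each run is stored under a name that says which hyperparameters produced it. The sketch below shows one possible way to generate such names from a grid of values. The helper run_names and the specific values are hypothetical and only meant to illustrate the idea.

```python
from itertools import product


def run_names(param_grid):
    """Yield (params, name) pairs, one per combination of hyperparameter values."""
    keys = list(param_grid)
    for values in product(*(param_grid[key] for key in keys)):
        params = dict(zip(keys, values))
        name = ' '.join(f'{key}={value}' for key, value in params.items())
        yield params, name


# Illustrative values only.
grid = {'lr': [0.01, 0.001], 'batch_size': [100, 1000]}
names = [name for _, name in run_names(grid)]
print(names)
```

Each name could then be passed to the writer, for example as SummaryWriter(comment=f' {name}'), since the comment argument is appended to the default run directory name. That way every combination shows up as its own run in the TensorBoard UI.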
Hyperparameter Experimentation is Next
We should now have a good understanding of what TensorBoard is and how we can use it. In the next episode, we'll see how to leverage TensorBoard to better evaluate different network training runs. See you in the next one!