Computational Graphs for Neural Networks Code Demo
In this episode, we'll examine the computational graphs used to compute gradients via backpropagation during neural network training.
We'll use PyTorch for demo purposes, but the concepts we cover here are fundamental to neural networks in general. Beyond gaining an understanding of this core piece of neural network training, there is a specific reason to examine it in the GAN course.
When we trained DCGAN with PyTorch, we saw that during the discriminator training portion, we called detach() on the fake images generated by the generator when we passed them to the discriminator for classification. When we passed this same batch of images to the discriminator a second time during generator training, however, we did not call detach().
There, we explained that calling detach() returns a new tensor that is detached from the current graph. To understand what this means, we need to first understand computational graphs in general.
Computational Graph Code Demo
In our demo, we'll use the torchviz and graphviz visualization libraries to plot a visual representation of neural network computational graphs. We need to first install torchviz and graphviz with the commands below in order to import them in the next step.
conda install graphviz
conda install python-graphviz
pip install torchviz==0.0.2
Now, we'll import PyTorch and the nn and functional modules we'll make use of, as well as torchviz and the Digraph class from graphviz. We'll also set PyTorch's random seed so that we can generate the same random values each time we run this code.
import torch
import torch.nn as nn
import torch.nn.functional as F
torch.manual_seed(0)
torch.set_printoptions(linewidth=120)
from graphviz import Digraph
import torchviz
Now, we'll create two tensors a and b, which each contain just one float value.
a = torch.tensor(1.0, requires_grad=True)
b = torch.tensor(2.0, requires_grad=True)
By setting requires_grad=True, we are specifying that we want to track the operations that occur on the tensor. This tracking will allow us to later calculate gradients with respect to the tensor.
Generally, we set the requires_grad parameter of a given tensor to True if gradients will later be computed with respect to the tensor. When this parameter is set to True, we also say that the tensor has gradient tracking turned on or enabled.
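As a quick aside that's not part of the original demo, we can check this flag on any tensor and enable it with the in-place method requires_grad_(). The tensor x below is just an illustrative example.
x = torch.tensor(5.0)
print(x.requires_grad)  # False - gradient tracking is off by default
x.requires_grad_(True)  # enable gradient tracking in place
print(x.requires_grad)  # True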
The record of operations that occur on a tensor is what is referred to as the tensor's computational graph, or just graph for short. We can think of a graph as being attached to a tensor.
In a neural network, the learnable parameters like weights and biases have requires_grad set to True since we'll later want to compute the gradient of the network's loss with respect to these parameters.
For a given graph, tensors that are created by the user with requires_grad set to True are referred to as leaf tensors or leaf nodes, meaning that they are not the result of another operation in the graph.
Leaf nodes in a graph are nodes where the graph starts or stops. They are sometimes called terminal nodes of a graph.
Consider the weights in a network. They are not the result of any operation in the graph and therefore are leaf tensors. We'll understand why these tensors are referred to as leaves when we plot a visualization of a tensor's graph.
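As a small verification that's not in the original demo, PyTorch exposes an is_leaf attribute that we can inspect directly:
w = torch.randn(3, requires_grad=True)  # created by the user, so it's a leaf
y = w * 2                               # result of an operation, so it's not
print(w.is_leaf, y.is_leaf)             # True False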
Now we'll create a new tensor c as the sum of a and b.
c = a + b
By printing c, we get the value of the tensor, as well as its gradient function grad_fn.
tensor(3., grad_fn=<AddBackward0>)
When a tensor is created as a result of an operation performed on one or more tensors that have gradient tracking turned on, it will have a gradient function grad_fn, which specifies how the tensor was created.
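Conversely, and as a quick aside that's not part of the original demo, a tensor created only from inputs without gradient tracking has no grad_fn at all:
x = torch.tensor(1.0)  # requires_grad defaults to False
y = x + 2
print(y.grad_fn)       # None - no operations are being recorded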
In the case of c, its grad_fn is AddBackward0. This indicates that c is the result of a sum. The referenced function AddBackward0 is the function that will be called when backpropagation occurs on this tensor.
Since c has a grad_fn, this means that we can see how c will be affected if we change either of its inputs, a or b. In other words, we can calculate the derivatives or gradients of c with respect to a and b.
This is analogous to the loss of a network being a function of all the network's weights, and then calculating the gradients of the loss with respect to these weights. The weights will have gradient tracking enabled, and therefore, the loss will have a grad_fn that discloses how to compute its gradients.
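As a minimal sketch of this analogy, and not part of the original demo, we can build a single linear layer, reduce its output to a scalar as a stand-in for a real loss, and observe that the result carries a grad_fn just like c does:
layer = nn.Linear(2, 1)  # the layer's weight and bias are leaf tensors with requires_grad=True
out = layer(torch.tensor([1.0, 2.0]))
loss = out.sum()         # a stand-in for a real loss function
print(loss.grad_fn)      # something like <SumBackward0 object at ...>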
Now let's create one more tensor.
d = c * a
Here, d is the product of the leaf tensor a with the tensor c, which itself is the sum of the two leaf tensors a and b. Let's print d.
tensor(3., grad_fn=<MulBackward0>)
We can see the value of d is 3 as a result of the multiplication of a and c, and we also see its grad_fn is MulBackward0.
This tells us that d was created as a result of the multiplication operation, and also that somewhere in the history of computations that resulted in d's creation, we have at least one tensor that has gradient tracking enabled.
Now, we know d is a result of the multiplication operation with one of the inputs being c, which itself was the result of the addition operation of the two leaf tensors a and b.
If we want to calculate the gradients of d with respect to a and b, we have to first call backward() on d. The backward() method computes the gradient of the given tensor with respect to any leaf tensors. In our case, this will be with respect to a and b. The resulting gradients will be accessible via the grad attribute of the leaf tensors.
For example, let's look at the gradient of d with respect to a and b by accessing grad on these leaf tensors.
print(a.grad, b.grad)
We get the following output.
None None
This is because backward() has not been called yet on a tensor that was derived from a and b. Without this call, no gradients will have been computed with respect to a and b.
So now let's call backward() on d to compute d's gradients with respect to a and b, and then inspect the corresponding gradients again.
d.backward()
print(a.grad, b.grad)
Now we have the following results.
tensor(4.) tensor(1.)
Now we can see that the gradient of d with respect to a is 4, and the gradient of d with respect to b is 1.
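As a sanity check that's not spelled out in the original demo, these values match what we'd compute by hand. Since d = c * a and c = a + b, the product rule gives ∂d/∂a = a * ∂c/∂a + c = a + c = 1 + 3 = 4, while ∂d/∂b = a * ∂c/∂b = a = 1.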
Visualizing a Computational Graph
Now that we're well acquainted with the relationships of these tensors and how gradients are tracked via the use of a computational graph, let's visualize these results using torchviz.
torchviz.make_dot(d, params = {
    "a": a, "b": b, "c": c, "d": d
})
Passing d to torchviz.make_dot() plots a visual of d's graph.
This is a graph representation of the operations we did that resulted in d. This gives us a visual representation of how backward() uses this graph to calculate d's gradients with respect to a and b.
As we know, we can inspect a tensor's grad_fn, and we earlier saw that d's grad_fn is MulBackward0 since it is the direct result of the multiplication operation. In addition to seeing the operation that led directly to d's creation, we can also see previous operations by accessing next_functions on the grad_fn.
print(d.grad_fn)
print(d.grad_fn.next_functions)
We can now see the previous operations that occurred before the multiplication that led to d's creation.
<MulBackward0 object at 0x000001F229DFBC50>
(
    (<AddBackward0 object at 0x000001F229DFBA20>, 0),
    (<AccumulateGrad object at 0x000001F229DFBCF8>, 0)
)
This shows us that the multiplication operation took two inputs. One input was the result of the addition operation, and the other input was a leaf tensor.
Going a step further, we can go all the way back to the creation of the leaf tensors a and b.
print(d.grad_fn.next_functions[0][0])
print(d.grad_fn.next_functions[0][0].next_functions)
This gives us the following result.
<AddBackward0 object at 0x000001F229DFBA20>
(
    (<AccumulateGrad object at 0x000001F229DFBCF8>, 0),
    (<AccumulateGrad object at 0x000001F229DFBEB8>, 0)
)
This shows us that tensor c, which was created as a result of the addition operation, had two leaf inputs.
This process is representative of "walking through" the graph of d. This is further illustrated with the function below, which prints the next_functions in an organized manner that corresponds directly to the graph we plotted with torchviz.
def walk_graph(g, step=0):
    if g is not None:
        print(step, step * ' ', type(g))
        for f in g.next_functions:
            walk_graph(f[0], step + 1)

walk_graph(d.grad_fn, 0)
Calling this function as shown above gives us the following output.
0  <class 'MulBackward0'>
1   <class 'AddBackward0'>
2    <class 'AccumulateGrad'>
2    <class 'AccumulateGrad'>
1   <class 'AccumulateGrad'>
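A brief aside on the AccumulateGrad entries: these nodes correspond to the leaf tensors, and when backward() reaches one, the computed gradient is accumulated into that leaf's grad attribute. This is standard PyTorch behavior rather than something shown in the original demo, and it's why gradients add up across repeated backward passes unless we zero them:
x = torch.tensor(1.0, requires_grad=True)
(x * 2).backward()
print(x.grad)   # tensor(2.)
(x * 2).backward()
print(x.grad)   # tensor(4.) - the new gradient was added to the old one
x.grad = None   # reset before the next backward pass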
Detaching a Tensor's Graph
Now we're going to discuss what happens when we detach a graph from a tensor. This is directly applicable to what we saw when we implemented DCGAN training with PyTorch. We'll tie this specific example back in towards the end.
Let's create three leaf tensors.
a = torch.tensor(1.0, requires_grad=True)
b = torch.tensor(2.0, requires_grad=True)
c = torch.tensor(3.0, requires_grad=True)
Now let's create another tensor d, which is the sum of a and b.
d = a + b
Now we'll create another tensor e, which is the result of c summed with the relu operation applied to d. Note that we've called detach() on d when creating e.
e = F.relu(d.detach()) + c
The last tensor f is the same as tensor e, except we're not detaching d's graph when we create f.
f = F.relu(d) + c
Let's now look at e's graph using torchviz.
torchviz.make_dot(e, params = {
    "a": a, "b": b, "c": c, "d": d, "e": e, "f": f
})
We can see that e is the result of the addition that occurred between the leaf tensor c and some other tensor that is not having its operations recorded. As we know, this other tensor is the result of relu(d.detach()).
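We can confirm this with a quick check that isn't part of the original demo:
d_detached = d.detach()
print(d_detached.requires_grad)  # False - operations on it won't be recorded
print(d_detached.grad_fn)        # None - its history was left behind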
Let's now look at f's graph.
torchviz.make_dot(f, params = {
    "a": a, "b": b, "c": c, "d": d, "e": e, "f": f
})
Recall that e and f were created in the exact same manner, except that when creating e, we detached d's graph.
Tensor f's graph looks more comprehensive because now we have d's computational history included as well.
From the graph, we can see that f is the result of the addition that occurred between the leaf tensor c and relu(d).
We can then see the additional information that d is the result of the addition that occurred between leaf tensors a and b. Since d's history was detached when we created e, we did not see this additional piece of history in e's graph.
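As another aside not in the original demo, we can see the same difference without plots by walking both graphs with the walk_graph function from earlier (the exact branch ordering may vary):
walk_graph(e.grad_fn)  # AddBackward0, then AccumulateGrad for c only
walk_graph(f.grad_fn)  # AddBackward0, ReluBackward0, AddBackward0, and AccumulateGrad for a, b, and c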
Now let's circle back to the use of detach() during GAN training.
The use of detach() during GAN Training
Below is a code snippet from the PyTorch training loop for training the discriminator of the DCGAN we implemented earlier. We're specifically focusing on the step of training the discriminator with an all-fake batch.
# TRAIN DISCRIMINATOR
# Train with all-real batch
...
# Train with all-fake batch
noise = torch.randn(real_images.size(0), z_size, 1, 1).to(device)
fake_images = netG(noise)
fake_output = netD(fake_images.detach())
d_loss = discriminator_loss(real_output, fake_output)
d_loss.backward()
d_optimizer.step()
We first generate a batch of fake images by passing a batch of random noise vectors to the generator netG and store the results in fake_images.
We then pass fake_images to the discriminator netD to get predictions for this fake batch.
When doing this step, we use a copy of fake_images that has been detached from its graph. The result of these predictions, fake_output, is then used to calculate the discriminator's loss d_loss, which we ultimately call backward() on.
The network netG is made up of many leaf tensors that lead to the creation of fake_images, and so fake_images has a history of operations already attached to its graph from its creation. We'll need access to this history whenever we train netG.
Note that calling backward() clears the graph used to compute the gradients. Therefore, if we didn't detach fake_images from its graph, then its graph would be cleared in the upcoming backward() pass for netD and would not be accessible when training netG afterwards.
Computational graphs can become large, and this is why, by default, PyTorch frees a graph after using it in a backward() pass.
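As an aside, backward() also accepts a retain_graph=True argument that keeps the graph alive for additional backward passes. The commented sketch below shows that hypothetical alternative, which is not what this demo does: retaining netG's entire graph wastes memory, and the discriminator's backward pass would also compute unneeded gradients for netG's parameters.
# hypothetical alternative to detaching (not the approach used in this demo):
# fake_output = netD(fake_images)
# d_loss.backward(retain_graph=True)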
Recall that after training the discriminator, we use this same tensor when training netG. There, we pass this tensor to netD again during netG's training, and it is at this point that we'll need access to fake_images' graph to compute netG's gradients.
# TRAIN GENERATOR
netG.zero_grad()
fake_output = netD(fake_images)
g_loss = generator_loss(fake_output)
g_loss.backward()
g_optimizer.step()
This is precisely why we take the crucial step of detaching the graph from fake_images when training the discriminator. If we didn't, we wouldn't have access to fake_images' graph at this point, as it would have been cleared when we called backward() during the discriminator training.