Neural Network Programming - Deep Learning with PyTorch

with deeplizard.

Callable Neural Networks - Linear Layers in Depth

April 15, 2019

PyTorch Callable Neural Networks - Deep Learning in Python

Welcome to this series on neural network programming with PyTorch. In this one, we'll learn about how PyTorch neural network modules are callable, what this means, and how it informs us about how our network and layer forward methods are called.

Without further ado, let's get started.

How Linear Layers Work

In the last post of this series, we learned about how linear layers use matrix multiplication to transform their in features to out features.

When the input features are received by a linear layer, they are passed in the form of a flattened 1-dimensional tensor and are then multiplied by the weight matrix. This matrix multiplication produces the output features.

Let's see an example of this in code.

Transform Using a Matrix

import torch

in_features = torch.tensor([1,2,3,4], dtype=torch.float32)

weight_matrix = torch.tensor([
    [1,2,3,4],
    [2,3,4,5],
    [3,4,5,6]
], dtype=torch.float32)

> weight_matrix.matmul(in_features)
tensor([30., 40., 50.])

Here, we have created a 1-dimensional tensor called in_features. We have also created a weight matrix, which of course is a 2-dimensional tensor. Then, we've used the matmul() function to perform the matrix multiplication operation that produces a 1-dimensional tensor.

In general, the weight matrix defines a linear function that maps a 1-dimensional tensor with four elements to a 1-dimensional tensor that has three elements. We can think of this function as a mapping from 4-dimensional Euclidean space to 3-dimensional Euclidean space.

This is how linear layers work as well. They map an in_feature space to an out_feature space using a weight matrix.
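Concretely, each output feature is just the dot product of one row of the weight matrix with the input tensor; the first row, for example, gives 1*1 + 2*2 + 3*3 + 4*4 = 30. Here is a quick check of that claim (a minimal sketch, assuming the weight_matrix and in_features tensors from above are still in scope):

> weight_matrix[0].dot(in_features)
tensor(30.)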

Transform Using a PyTorch Linear Layer

Let's see how to create a PyTorch linear layer that will do this same operation.

import torch.nn as nn

fc = nn.Linear(in_features=4, out_features=3)

Here, we have it. We've defined a linear layer that accepts 4 in features and transforms these into 3 out features, so we go from 4-dimensional space to 3-dimensional space. We know that a weight matrix is used to perform this operation, but where is the weight matrix in this example?

Well, the weight matrix lives inside the PyTorch Linear class and is created by PyTorch. The Linear class uses the numbers 4 and 3 that are passed to the constructor to create a 3 x 4 weight matrix. Let's verify this by taking a look at the PyTorch source code.

# torch/nn/modules/linear.py (version 1.0.1)

def __init__(self, in_features, out_features, bias=True):
    super(Linear, self).__init__()
    self.in_features = in_features
    self.out_features = out_features
    self.weight = Parameter(torch.Tensor(out_features, in_features))
    if bias:
        self.bias = Parameter(torch.Tensor(out_features))
    else:
        self.register_parameter('bias', None)
    self.reset_parameters()

As we have seen, when we multiply a 3 x 4 matrix with a 4 x 1 matrix, the result is a 3 x 1 matrix. This is why PyTorch builds the weight matrix in this way. These are linear algebra rules for matrix multiplication.
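We can also confirm the shape of the generated weight matrix directly on the layer instance (assuming the fc layer defined above):

> fc.weight.shape
torch.Size([3, 4])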

Let's see how we can call our layer now by passing the in_features tensor.

> fc(in_features)
tensor([-0.8877,  1.4250,  0.8370], grad_fn=<AddBackward0>)

We can call the object instance like this because PyTorch neural network modules are callable Python objects. We'll look at this important detail more closely in a minute, but first, check out this output. We did indeed get a 1-dimensional tensor with three elements. However, different values were produced.

This is because PyTorch creates a weight matrix and initializes it with random values. This means that the linear functions from the two examples are different, so we are using a different function to produce these outputs.

Remember, the values inside the weight matrix define the linear function. This demonstrates how the network's mapping changes as the weights are updated during the training process.
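As a quick illustration of that last point (a minimal sketch, not part of the original example; the 0.1 learning rate is arbitrary), a single hand-rolled gradient step changes the weights, and therefore the linear function the layer computes:

out = fc(in_features)
out.sum().backward()                    # populates fc.weight.grad
with torch.no_grad():
    fc.weight -= 0.1 * fc.weight.grad   # the layer now computes a different linear function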

Let's explicitly set the weight matrix of the linear layer to be the same as the one we used in our other example.

fc.weight = nn.Parameter(weight_matrix)  

PyTorch module weights need to be parameters. This is why we wrap the weight matrix tensor inside a Parameter class instance. Let's see now how this layer transforms the input using the new weight matrix. We hope to see the same results as in our previous example.

> fc(in_features)
tensor([30.0261, 40.1404, 49.7643], grad_fn=<AddBackward0>)

This time we are much closer to the 30, 40, and 50 values. However, we're not exact. Why is this? Well, this is not exact because the linear layer is adding a bias tensor to the output. Watch what happens when we turn the bias off. We do this by passing a False flag to the constructor.

fc = nn.Linear(in_features=4, out_features=3, bias=False)
fc.weight = nn.Parameter(weight_matrix)
> fc(in_features)
tensor([30., 40., 50.], grad_fn=<SqueezeBackward3>)

There, now we have an exact match. This is how linear layers work.

Mathematical Notation of the Linear Transformation

Sometimes we'll see the linear layer operation referred to as \[y=Ax + b.\] In this equation, we have the following:

Variable Definition
\(A\) Weight matrix tensor
\(x\) Input tensor
\(b\) Bias tensor
\(y\) Output tensor

We'll note that this is similar to the equation for a line \[y=mx+b.\]
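We can check this equation numerically. In the sketch below, fc_bias is a throwaway layer (a name introduced just for this check) with the bias left on, using the weight_matrix and in_features tensors from earlier:

fc_bias = nn.Linear(in_features=4, out_features=3, bias=True)
fc_bias.weight = nn.Parameter(weight_matrix)

> torch.allclose(fc_bias(in_features), weight_matrix.matmul(in_features) + fc_bias.bias)
True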

Callable Layers and Neural Networks

We pointed out before how it was kind of strange that we called the layer object instance as if it were a function.

> fc(in_features)
tensor([30.0261, 40.1404, 49.7643], grad_fn=<AddBackward0>)

What makes this possible is that PyTorch module classes implement a special Python method called __call__(). If a class implements the __call__() method, it is invoked any time the object instance is called.

This fact is an important PyTorch concept because of the way the __call__() method interacts with the forward() method for our layers and networks.

Instead of calling the forward() method directly, we call the object instance. After the object instance is called, the __call__() method is invoked under the hood, and the __call__() in turn invokes the forward() method. This applies to all PyTorch neural network modules, namely, networks and layers.
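A tiny example makes this concrete. Below, Double is a hypothetical module (not part of PyTorch) whose forward() method we never invoke directly; calling the instance routes through __call__(), which calls forward() for us:

class Double(nn.Module):
    def forward(self, t):
        print('forward() was called')
        return t * 2

d = Double()

> d(torch.tensor([1., 2., 3.]))
forward() was called
tensor([2., 4., 6.])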

Let's see this in the PyTorch source code.

# torch/nn/modules/module.py (version 1.0.1)

def __call__(self, *input, **kwargs):
    for hook in self._forward_pre_hooks.values():
        hook(self, input)
    if torch._C._get_tracing_state():
        result = self._slow_forward(*input, **kwargs)
    else:
        result = self.forward(*input, **kwargs)
    for hook in self._forward_hooks.values():
        hook_result = hook(self, input, result)
        if hook_result is not None:
            raise RuntimeError(
                "forward hooks should never return any values, but '{}'"
                "didn't return None".format(hook))
    if len(self._backward_hooks) > 0:
        var = result
        while not isinstance(var, torch.Tensor):
            if isinstance(var, dict):
                var = next((v for v in var.values() if isinstance(v, torch.Tensor)))
            else:
                var = var[0]
        grad_fn = var.grad_fn
        if grad_fn is not None:
            for hook in self._backward_hooks.values():
                wrapper = functools.partial(hook, self)
                functools.update_wrapper(wrapper, hook)
                grad_fn.register_hook(wrapper)
    return result

The extra code that PyTorch runs inside the __call__() method is why we never invoke the forward() method directly. If we did, the additional PyTorch code would not be executed. As a result, any time we want to invoke our forward() method, we call the object instance. This applies to both layers and networks because they are both PyTorch neural network modules.
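For example, a forward hook registered on a layer only runs when we call the object instance; calling forward() directly skips it. A small sketch, assuming the bias-free fc layer from above:

fc.register_forward_hook(lambda module, input, output: print('forward hook ran'))

> fc(in_features)
forward hook ran
tensor([30., 40., 50.], grad_fn=<SqueezeBackward3>)

> fc.forward(in_features)
tensor([30., 40., 50.], grad_fn=<SqueezeBackward3>)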

We are now ready to implement our network's forward() method. I'll see you in the next one!
