Neural Network Programming - Deep Learning with PyTorch

with deeplizard.

PyTorch Callable Neural Networks - Linear Layers in Depth

April 15, 2019 by


PyTorch Callable Neural Networks - Deep Learning in Python

Welcome to this series on neural network programming with PyTorch. In this one, we'll learn about how PyTorch neural network modules are callable, what this means, and how it informs us about how our network and layer forward methods are called.

Without further ado, let's get started.

How Linear Layers Work

In the last post of this series, we learned about how linear layers use matrix multiplication to transform their in features to out features.

When the input features are received by a linear layer, they are passed in the form of a flattened 1-dimensional tensor and are then multiplied by the weight matrix. This matrix multiplication produces the output features.

Let's see an example of this in code.

Transform Using a Matrix

in_features = torch.tensor([1,2,3,4], dtype=torch.float32)

weight_matrix = torch.tensor([
    [1,2,3,4],
    [2,3,4,5],
    [3,4,5,6]
], dtype=torch.float32)

> weight_matrix.matmul(in_features)
tensor([30., 40., 50.])

Here, we have created a 1-dimensional tensor called in_features. We have also created a weight matrix, which of course is a 2-dimensional tensor. Then, we've used the matmul() function to perform the matrix multiplication operation that produces a 1-dimensional tensor.

In general, the weight matrix defines a linear function that maps a 1-dimensional tensor with four elements to a 1-dimensional tensor that has three elements. We can think of this function as a mapping from 4-dimensional Euclidean space to 3-dimensional Euclidean space.

This is how linear layers work as well. They map an in_feature space to an out_feature space using a weight matrix.
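To make the mapping concrete, here is a minimal sketch in plain Python (no PyTorch) of what the matrix multiplication computes: each output element is the dot product of one row of the weight matrix with the input. The helper name matvec is ours, and the matrix rows are an illustrative assumption chosen so that the result matches the 30, 40, 50 output above.

```python
def matvec(matrix, vector):
    """Multiply a 2-D list (rows x cols) by a 1-D list of length cols."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


in_features = [1.0, 2.0, 3.0, 4.0]

# Illustrative 3 x 4 weight matrix (an assumption, picked to give 30, 40, 50).
weight_matrix = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 3.0, 4.0, 5.0],
    [3.0, 4.0, 5.0, 6.0],
]

# Four inputs go in, three outputs come out: one per row of the matrix.
out_features = matvec(weight_matrix, in_features)
print(out_features)  # [30.0, 40.0, 50.0]
```

Changing any value in a row changes the corresponding output element, which is why the weights are said to define the function.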

Transform Using a PyTorch Linear Layer

Let's see how to create a PyTorch linear layer that will do this same operation.

fc = nn.Linear(in_features=4, out_features=3)

There we have it. We've defined a linear layer that accepts 4 in features and transforms these into 3 out features, so we go from 4-dimensional space to 3-dimensional space. We know that a weight matrix is used to perform this operation, but where is the weight matrix in this example?

Well, the weight matrix lives inside the PyTorch nn.Linear class and is created by PyTorch. The nn.Linear class uses the numbers 4 and 3 that are passed to the constructor to create a 3 x 4 weight matrix. Let's verify this by taking a look at the PyTorch source code.

# torch/nn/modules/ (version 1.0.1)

def __init__(self, in_features, out_features, bias=True):
    super(Linear, self).__init__()
    self.in_features = in_features
    self.out_features = out_features
    self.weight = Parameter(torch.Tensor(out_features, in_features))
    if bias:
        self.bias = Parameter(torch.Tensor(out_features))
    else:
        self.register_parameter('bias', None)

As we have seen, when we multiply a 3 x 4 matrix with a 4 x 1 matrix, the result is a 3 x 1 matrix. This is why PyTorch builds the weight matrix in this way. These are linear algebra rules for matrix multiplication.
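As a quick check that does not rely on reading the source, we can inspect the parameter shapes on a freshly constructed layer. This is a small sketch that assumes torch is installed; the variable names layer and no_bias_layer are ours.

```python
import torch
import torch.nn as nn

layer = nn.Linear(in_features=4, out_features=3)

# The weight is stored as (out_features, in_features), i.e. 3 x 4.
weight_shape = tuple(layer.weight.shape)
# The bias, when enabled, has one element per out feature.
bias_shape = tuple(layer.bias.shape)

print(weight_shape)  # (3, 4)
print(bias_shape)    # (3,)

# With bias=False, the bias attribute exists but is set to None.
no_bias_layer = nn.Linear(in_features=4, out_features=3, bias=False)
print(no_bias_layer.bias is None)  # True
```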

Let's see how we can call our layer now by passing the in_features tensor.

> fc(in_features)
tensor([-0.8877,  1.4250,  0.8370], grad_fn=<SqueezeBackward3>)

We can call the object instance like this because PyTorch neural network modules are callable Python objects. We'll look at this important detail more closely in a minute, but first, check out this output. We did indeed get a 1-dimensional tensor with three elements. However, different values were produced.

This is because PyTorch creates a weight matrix and initializes it with random values. This means that the linear functions from the two examples are different, so we are using different functions to produce these outputs.

Remember the values inside the weight matrix define the linear function. This demonstrates how the network's mapping changes as the weights are updated during the training process.

Let's explicitly set the weight matrix of the linear layer to be the same as the one we used in our other example.

fc.weight = nn.Parameter(weight_matrix)  

PyTorch module weights need to be parameters. This is why we wrap the weight matrix tensor inside a parameter class instance. Let's see now how this layer transforms the input using the new weight matrix. We hope to see the same results as in our previous example.

> fc(in_features)
tensor([30.0261, 40.1404, 49.7643], grad_fn=<AddBackward0>)

This time we are much closer to the 30, 40, and 50 values. However, we're not exact. Why is this? Well, the result is not exact because the linear layer is adding a bias tensor to the output. Watch what happens when we turn the bias off. We do this by passing bias=False to the constructor.

fc = nn.Linear(in_features=4, out_features=3, bias=False)
fc.weight = nn.Parameter(weight_matrix)
> fc(in_features)
tensor([30., 40., 50.], grad_fn=<SqueezeBackward3>)

There, now we have an exact match. This is how linear layers work.

Mathematical Notation of the Linear Transformation

Sometimes we'll see the linear layer operation written as \[y=Ax + b.\] In this equation, we have the following:

Variable    Definition
\(A\)       Weight matrix tensor
\(x\)       Input tensor
\(b\)       Bias tensor
\(y\)       Output tensor

We'll note that this is similar to the equation for a line \[y=mx+b.\]
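To connect the notation back to code, the sketch below checks that a layer's output agrees with computing \(Ax + b\) by hand from the layer's own weight and bias. It assumes torch is installed; the seed value and the names y_layer, y_manual, and matches are ours.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # arbitrary seed so the random initialization is repeatable

fc = nn.Linear(in_features=4, out_features=3)
x = torch.tensor([1, 2, 3, 4], dtype=torch.float32)

with torch.no_grad():
    y_layer = fc(x)                             # what the layer computes
    y_manual = fc.weight.matmul(x) + fc.bias    # A x + b written out by hand

matches = torch.allclose(y_layer, y_manual)
print(matches)  # True
```

Here fc.weight plays the role of \(A\) and fc.bias the role of \(b\).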

Callable Layers and Neural Networks

We pointed out before how it was kind of strange that we called the layer object instance as if it were a function.

> fc(in_features)
tensor([30.0261, 40.1404, 49.7643], grad_fn=<AddBackward0>)

What makes this possible is that PyTorch module classes implement a special Python method called __call__(). If a class implements the __call__() method, that method will be invoked any time the object instance is called.

This fact is an important PyTorch concept because of the way the __call__() method interacts with the forward() method for our layers and networks.

Instead of calling the forward() method directly, we call the object instance. After the object instance is called, the __call__() method is invoked under the hood, and the __call__() in turn invokes the forward() method. This applies to all PyTorch neural network modules, namely, networks and layers.
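The same pattern can be sketched in plain Python without PyTorch. The Doubler class below is a hypothetical stand-in for a module, not PyTorch code: its __call__() method simply delegates to forward(), which is the relationship described above.

```python
class Doubler:
    """A plain Python class whose instances can be called like functions."""

    def __call__(self, value):
        # Python runs this method whenever the instance is called: doubler(3).
        return self.forward(value)

    def forward(self, value):
        # The actual work lives here, as it does in a PyTorch module.
        return value * 2


doubler = Doubler()
print(callable(doubler))  # True
print(doubler(3))         # 6
```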

Let's see this in the PyTorch source code.

# torch/nn/modules/ (version 1.0.1)

def __call__(self, *input, **kwargs):
    for hook in self._forward_pre_hooks.values():
        hook(self, input)
    if torch._C._get_tracing_state():
        result = self._slow_forward(*input, **kwargs)
    else:
        result = self.forward(*input, **kwargs)
    for hook in self._forward_hooks.values():
        hook_result = hook(self, input, result)
        if hook_result is not None:
            raise RuntimeError(
                "forward hooks should never return any values, but '{}'"
                "didn't return None".format(hook))
    if len(self._backward_hooks) > 0:
        var = result
        while not isinstance(var, torch.Tensor):
            if isinstance(var, dict):
                var = next((v for v in var.values() if isinstance(v, torch.Tensor)))
            else:
                var = var[0]
        grad_fn = var.grad_fn
        if grad_fn is not None:
            for hook in self._backward_hooks.values():
                wrapper = functools.partial(hook, self)
                functools.update_wrapper(wrapper, hook)
                grad_fn.register_hook(wrapper)
    return result

The extra code that PyTorch runs inside the __call__() method is why we never invoke the forward() method directly. If we did, the additional PyTorch code would not be executed. As a result, any time we want to invoke our forward() method, we call the object instance. This applies to both layers and networks because they are both PyTorch neural network modules.
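One way to observe this difference is with a forward hook. The sketch below assumes torch is installed and uses register_forward_hook() to count how often the hook runs; the names record_call and hook_calls are ours. Calling the instance triggers the hook, while calling forward() directly skips it.

```python
import torch
import torch.nn as nn

fc = nn.Linear(in_features=4, out_features=3, bias=False)
x = torch.tensor([1, 2, 3, 4], dtype=torch.float32)

hook_calls = []


def record_call(module, inputs, output):
    # Forward hooks receive the module, a tuple of inputs, and the output.
    hook_calls.append(tuple(output.shape))


handle = fc.register_forward_hook(record_call)

fc(x)          # goes through __call__(), so the hook runs
calls_after_instance_call = len(hook_calls)

fc.forward(x)  # bypasses __call__(), so the hook is skipped
calls_after_forward_call = len(hook_calls)

handle.remove()  # detach the hook once we are done
print(calls_after_instance_call, calls_after_forward_call)  # 1 1
```

The count stays at 1 after the direct forward() call, which is the behavior the paragraph above warns about.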

We are now ready to implement our network's forward() method. I'll see you in the next one!

