### PyTorch Callable Neural Networks - Deep Learning in Python

Welcome to this series on neural network programming with PyTorch. In this one, we'll learn about how PyTorch neural network modules are callable, what this means, and how it informs us about how our network and layer forward methods are called.

Without further ado, let's get started.

### How Linear Layers Work

In the last post of this series, we learned about how linear layers use matrix multiplication to transform their in features to out features.

When the input features are received by a linear layer, they are received in the form of a flattened 1-dimensional tensor and are then multiplied by the weight matrix. This matrix multiplication produces the output features.

Let's see an example of this in code.

#### Transform Using a Matrix

    import torch
    import torch.nn as nn

    in_features = torch.tensor([1,2,3,4], dtype=torch.float32)

    weight_matrix = torch.tensor([
        [1,2,3,4],
        [2,3,4,5],
        [3,4,5,6]
    ], dtype=torch.float32)

    > weight_matrix.matmul(in_features)
    tensor([30., 40., 50.])

Here, we have created a 1-dimensional tensor called `in_features`. We have also created a weight matrix, which is a 2-dimensional tensor. Then, we've used the `matmul()` function to perform the matrix multiplication that produces a 1-dimensional tensor.

In general, the weight matrix defines a linear function that maps a 1-dimensional tensor with four elements to a 1-dimensional tensor that has three elements. We can think of this function as a mapping from 4-dimensional Euclidean space to 3-dimensional Euclidean space.
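
To make the mapping concrete, here is a minimal sketch that reuses the tensors from the example above. The name `row_dots` is introduced only for illustration. It shows that each of the three output elements is the dot product of one row of the weight matrix with the four-element input.

```python
import torch

in_features = torch.tensor([1, 2, 3, 4], dtype=torch.float32)

weight_matrix = torch.tensor([
    [1, 2, 3, 4],
    [2, 3, 4, 5],
    [3, 4, 5, 6]
], dtype=torch.float32)

# Each output element is the dot product of one weight row with the input,
# e.g. the first row gives 1*1 + 2*2 + 3*3 + 4*4 = 30.
row_dots = torch.stack([row.dot(in_features) for row in weight_matrix])

out_features = weight_matrix.matmul(in_features)

# A (3, 4) matrix times a 4-element vector yields a 3-element vector.
print(weight_matrix.shape, in_features.shape, out_features.shape)
print(torch.equal(row_dots, out_features))  # True
```

The number of rows in the weight matrix sets the number of out features, and the number of columns must match the number of in features.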

This is how linear layers work as well. They map an `in_feature` space to an `out_feature` space using a weight matrix.

#### Transform Using a PyTorch Linear Layer

Let's see how to create a PyTorch linear layer that will do this same operation.

    fc = nn.Linear(in_features=4, out_features=3)

Here we have it. We've defined a linear layer that accepts `4` in features and transforms these into `3` out features, so we go from 4-dimensional space to 3-dimensional space. We know that a weight matrix is used to perform this operation, but where is the weight matrix in this example?

Well, the weight matrix lives inside the PyTorch `Linear` class and is created by PyTorch. The `Linear` class uses the numbers `4` and `3` that are passed to the constructor to create a `3 x 4` weight matrix. Let's verify this by taking a look at the PyTorch source code.

    # torch/nn/modules/linear.py (version 1.0.1)

    def __init__(self, in_features, out_features, bias=True):
        super(Linear, self).__init__()
        self.in_features = in_features
        self.out_features = out_features
        self.weight = Parameter(torch.Tensor(out_features, in_features))
        if bias:
            self.bias = Parameter(torch.Tensor(out_features))
        else:
            self.register_parameter('bias', None)
        self.reset_parameters()

As we have seen, when we multiply a `3 x 4` matrix with a `4 x 1` matrix, the result is a `3 x 1` matrix. This is why PyTorch builds the weight matrix in this way. These are the linear algebra rules for matrix multiplication.
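
As a quick check that doesn't require reading the source, we can inspect the parameters of a freshly constructed layer. This is a small sketch using the default `bias=True`. The shapes are what matter here, since the values themselves are random.

```python
import torch
import torch.nn as nn

fc = nn.Linear(in_features=4, out_features=3)

# The weight is stored as (out_features, in_features).
print(fc.weight.shape)  # torch.Size([3, 4])

# With the default bias=True, there is one bias value per out feature.
print(fc.bias.shape)    # torch.Size([3])

# Both are registered as parameters of the module.
print(isinstance(fc.weight, nn.Parameter))  # True
```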

Let's see how we can call our layer now by passing the `in_features` tensor.

    > fc(in_features)
    tensor([-0.8877, 1.4250, 0.8370], grad_fn=<SqueezeBackward3>)

We can call the object instance like this because PyTorch neural network modules are callable Python objects. We'll look at this important detail more closely in a minute, but first, check out this output. We did indeed get a 1-dimensional tensor with three elements. However, different values were produced.

This is because PyTorch creates a weight matrix and initializes it with random values. This means that the linear functions from the two examples are different, so we are using different functions to produce these outputs.

Remember the values inside the weight matrix define the linear function. This demonstrates how the network's mapping changes as the weights are updated during the training process.
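
One way to see this is to build two layers independently and compare them. In the sketch below, the names `fc_a` and `fc_b` are illustrative only. The two layers start with different random weights, and once the weights of one are copied into the other, both layers compute the same function.

```python
import torch
import torch.nn as nn

x = torch.tensor([1, 2, 3, 4], dtype=torch.float32)

# Two layers with the same shape but independently initialized weights.
fc_a = nn.Linear(in_features=4, out_features=3, bias=False)
fc_b = nn.Linear(in_features=4, out_features=3, bias=False)

# Random initialization makes identical weights extremely unlikely.
same_before = torch.equal(fc_a.weight, fc_b.weight)

# Copy the parameters of fc_a into fc_b.
fc_b.load_state_dict(fc_a.state_dict())

# With identical weights, the layers produce identical outputs.
same_after = torch.equal(fc_a(x), fc_b(x))

print(same_before, same_after)
```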

Let's explicitly set the weight matrix of the linear layer to be the same as the one we used in our other example.

    fc.weight = nn.Parameter(weight_matrix)

PyTorch module weights need to be parameters. This is why we wrap the weight matrix tensor inside a parameter class instance. Let's see now how this layer transforms the input using the new weight matrix. We hope to see the same results as in our previous example.

    > fc(in_features)
    tensor([30.0261, 40.1404, 49.7643], grad_fn=<AddBackward0>)

This time we are much closer to the `30`, `40`, and `50` values. However, the match is not exact. Why is this? Well, the linear layer is adding a bias tensor to the output. Watch what happens when we turn the bias off. We do this by passing `bias=False` to the constructor.

    fc = nn.Linear(in_features=4, out_features=3, bias=False)
    fc.weight = nn.Parameter(weight_matrix)

    > fc(in_features)
    tensor([30., 40., 50.], grad_fn=<SqueezeBackward3>)

There, now we have an exact match. This is how linear layers work.

#### Mathematical Notation of the Linear Transformation

Sometimes we'll see the linear layer operation written as \[y=Ax + b.\] In this equation, we have the following:

| Variable | Definition |
|---|---|
| \(A\) | Weight matrix tensor |
| \(x\) | Input tensor |
| \(b\) | Bias tensor |
| \(y\) | Output tensor |

We'll note that this is similar to the equation for a line \[y=mx+b.\]
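
The equation can be checked against a layer directly. The sketch below uses the default `bias=True`, and the seed is an arbitrary choice made only for repeatability. It computes \(Ax + b\) by hand from the layer's own parameters and compares the result with the layer's output. The two should agree up to floating-point rounding.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # arbitrary seed, only so the run is repeatable

fc = nn.Linear(in_features=4, out_features=3)  # bias=True by default
x = torch.tensor([1, 2, 3, 4], dtype=torch.float32)

A = fc.weight  # weight matrix tensor, shape (3, 4)
b = fc.bias    # bias tensor, shape (3,)

# y = Ax + b, computed manually.
y_manual = A.matmul(x) + b

# The same transformation, computed by the layer.
y_layer = fc(x)

print(torch.allclose(y_manual, y_layer))
```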

### Callable Layers and Neural Networks

We pointed out before how it was kind of strange that we called the layer object instance as if it were a function.

    > fc(in_features)
    tensor([30.0261, 40.1404, 49.7643], grad_fn=<AddBackward0>)

What makes this possible is that PyTorch module classes implement another special Python method called `__call__()`. If a class implements the `__call__()` method, that method will be invoked anytime the object instance is called.
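
This behavior is plain Python and is not specific to PyTorch. Here is a minimal sketch using a hypothetical `Scaler` class, which is not part of any library, to show an object instance being called like a function.

```python
class Scaler:
    """Hypothetical class, used only to illustrate __call__."""

    def __init__(self, factor):
        self.factor = factor

    def __call__(self, value):
        # Runs whenever the instance itself is called.
        return self.factor * value


double = Scaler(2)

print(double(21))           # 42, Python translates this into double.__call__(21)
print(double.__call__(21))  # 42, the same method invoked explicitly
print(callable(double))     # True
```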

This fact is an important PyTorch concept because of the way the `__call__()` method interacts with the `forward()` method for our layers and networks.

Instead of calling the `forward()` method directly, we call the object instance. After the object instance is called, the `__call__()` method is invoked under the hood, and `__call__()` in turn invokes the `forward()` method. This applies to all PyTorch neural network modules, namely, networks and layers.

Let's see this in the PyTorch source code.

    # torch/nn/modules/module.py (version 1.0.1)

    def __call__(self, *input, **kwargs):
        for hook in self._forward_pre_hooks.values():
            hook(self, input)
        if torch._C._get_tracing_state():
            result = self._slow_forward(*input, **kwargs)
        else:
            result = self.forward(*input, **kwargs)
        for hook in self._forward_hooks.values():
            hook_result = hook(self, input, result)
            if hook_result is not None:
                raise RuntimeError(
                    "forward hooks should never return any values, but '{}'"
                    "didn't return None".format(hook))
        if len(self._backward_hooks) > 0:
            var = result
            while not isinstance(var, torch.Tensor):
                if isinstance(var, dict):
                    var = next((v for v in var.values() if isinstance(v, torch.Tensor)))
                else:
                    var = var[0]
            grad_fn = var.grad_fn
            if grad_fn is not None:
                for hook in self._backward_hooks.values():
                    wrapper = functools.partial(hook, self)
                    functools.update_wrapper(wrapper, hook)
                    grad_fn.register_hook(wrapper)
        return result

The extra code that PyTorch runs inside the `__call__()` method is why we never invoke the `forward()` method directly. If we did, the additional PyTorch code would not be executed. As a result, any time we want to invoke our `forward()` method, we call the object instance. This applies to both layers and networks because they are both PyTorch neural network modules.
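
A forward hook gives a simple way to observe the difference. In this sketch, the `record_call` function and the `calls` list are illustrative names, not part of PyTorch. The hook fires when the layer instance is called but is skipped when `forward()` is invoked directly.

```python
import torch
import torch.nn as nn

fc = nn.Linear(in_features=4, out_features=3, bias=False)
x = torch.tensor([1, 2, 3, 4], dtype=torch.float32)

calls = []

def record_call(module, inputs, output):
    # Record that the hook ran; returning None leaves the output unchanged.
    calls.append(type(module).__name__)

handle = fc.register_forward_hook(record_call)

fc(x)          # goes through __call__(), so the hook runs
fc.forward(x)  # bypasses __call__(), so the hook does not run

handle.remove()  # detach the hook when finished

print(calls)  # ['Linear']
```

Only one entry is recorded even though the layer computed its output twice, which is the kind of silent difference that calling the instance avoids.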

We are now ready to implement our network's `forward()` method. I'll see you in the next one!