Callable Neural Networks - Linear Layers in Depth
PyTorch Callable Neural Networks - Deep Learning in Python
Welcome to this series on neural network programming with PyTorch. In this one, we'll learn about how PyTorch neural network modules are callable, what this means, and how it informs us about how our network and layer forward methods are called.
Without further ado, let's get started.
How Linear Layers Work
In the last post of this series, we learned about how linear layers use matrix multiplication to transform their in features to out features.
When the input features are received by a linear layer, they are received in the form of a flattened 1-dimensional tensor and are then multiplied by the weight matrix. This matrix multiplication produces the output features.
Let's see an example of this in code.
Transform Using a Matrix
import torch

in_features = torch.tensor([1,2,3,4], dtype=torch.float32)

weight_matrix = torch.tensor([
    [1,2,3,4],
    [2,3,4,5],
    [3,4,5,6]
], dtype=torch.float32)

> weight_matrix.matmul(in_features)
tensor([30., 40., 50.])
Here, we have created a 1-dimensional tensor called in_features. We have also created a weight matrix, which of course is a 2-dimensional tensor. Then, we've used the matmul() function to perform the matrix multiplication operation that produces a 1-dimensional tensor.
In general, the weight matrix defines a linear function that maps a 1-dimensional tensor with four elements to a 1-dimensional tensor that has three elements. We can think of this function as a mapping from 4-dimensional Euclidean space to 3-dimensional Euclidean space.
This is how linear layers work as well. They map an in_features space to an out_features space using a weight matrix.
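As a quick check of this mapping, we can look at the shapes involved, using the weight_matrix and in_features tensors defined above:
> weight_matrix.shape
torch.Size([3, 4])

> weight_matrix.matmul(in_features).shape
torch.Size([3])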
Transform Using a PyTorch Linear Layer
Let's see how to create a PyTorch linear layer that will do this same operation.
import torch.nn as nn

fc = nn.Linear(in_features=4, out_features=3)
Here, we have it. We've defined a linear layer that accepts 4 in features and transforms these into 3 out features, so we go from 4-dimensional space to 3-dimensional space.
We know that a weight matrix is used to perform this operation, but where is the weight matrix in this example?
Well, the weight matrix lives inside the PyTorch nn.Linear class and is created by PyTorch. The nn.Linear class uses the numbers 4 and 3 that are passed to the constructor to create a 3 x 4 weight matrix. Let's verify this by taking a look at the PyTorch source code.
# torch/nn/modules/linear.py (version 1.0.1)
def __init__(self, in_features, out_features, bias=True):
    super(Linear, self).__init__()
    self.in_features = in_features
    self.out_features = out_features
    self.weight = Parameter(torch.Tensor(out_features, in_features))
    if bias:
        self.bias = Parameter(torch.Tensor(out_features))
    else:
        self.register_parameter('bias', None)
    self.reset_parameters()
As we have seen, when we multiply a 3 x 4 matrix with a 4 x 1 matrix, the result is a 3 x 1 matrix. This is why PyTorch builds the weight matrix in this way. These are linear algebra rules for matrix multiplication.
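We can confirm the shape of the weight matrix that PyTorch created, using the fc layer we defined above:
> fc.weight.shape
torch.Size([3, 4])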
Let's see how we can call our layer now by passing the in_features tensor.
> fc(in_features)
tensor([-0.8877, 1.4250, 0.8370], grad_fn=)
We can call the object instance like this because PyTorch neural network modules are callable Python objects. We'll look at this important detail more closely in a minute, but first, check out this output. We did indeed get a 1-dimensional tensor with three elements. However, different values were produced.
This is because PyTorch creates a weight matrix and initializes it with random values. This means that the linear functions from the two examples are different, so we are using a different function to produce these outputs.

Remember, the values inside the weight matrix define the linear function. This demonstrates how the network's mapping changes as the weights are updated during the training process.
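Because the weight matrix is a learnable parameter, PyTorch also tracks gradients for it. A quick way to see this, using the fc layer from above:
> fc.weight.requires_grad
True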
Let's explicitly set the weight matrix of the linear layer to be the same as the one we used in our other example.
fc.weight = nn.Parameter(weight_matrix)
PyTorch module weights need to be parameters. This is why we wrap the weight matrix tensor inside a Parameter class instance. Let's see now how this layer transforms the input using the new weight matrix. We hope to see the same results as in our previous example.
> fc(in_features)
tensor([30.0261, 40.1404, 49.7643], grad_fn=)
This time we are much closer to the 30, 40, and 50 values. However, the match isn't exact. Why is this? Well, it's not exact because the linear layer is adding a bias tensor to the output. Watch what happens when we turn the bias off. We do this by passing a False flag to the constructor.
fc = nn.Linear(in_features=4, out_features=3, bias=False)
fc.weight = nn.Parameter(weight_matrix)
> fc(in_features)
tensor([30., 40., 50.], grad_fn=)
There, now we have an exact match. This is how linear layers work.
Mathematical Notation of the Linear Transformation
Sometimes we'll see the linear layer operation referred to as \[y=Ax + b.\] In this equation, we have the following:
Variable | Definition
---|---
\(A\) | Weight matrix tensor
\(x\) | Input tensor
\(b\) | Bias tensor
\(y\) | Output tensor
We'll note that this is similar to the equation for a line \[y=mx+b.\]
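As a minimal sketch of this equation in code, we can apply it directly with the tensors from earlier. Note that the bias values here are made up purely for illustration:
A = weight_matrix                    # 3 x 4 weight matrix tensor
x = in_features                      # input tensor with 4 elements
b = torch.tensor([0.1, 0.2, 0.3])    # hypothetical bias tensor with 3 elements

> A.matmul(x) + b
tensor([30.1000, 40.2000, 50.3000])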
Callable Layers and Neural Networks
We pointed out before how it was kind of strange that we called the layer object instance as if it were a function.
> fc(in_features)
tensor([30.0261, 40.1404, 49.7643], grad_fn=)
What makes this possible is that PyTorch module classes implement another special Python function called __call__(). If a class implements the __call__() method, it will be invoked any time the object instance is called.
This fact is an important PyTorch concept because of the way the __call__() method interacts with the forward() method for our layers and networks.
Instead of calling the forward() method directly, we call the object instance. After the object instance is called, the __call__() method is invoked under the hood, and __call__() in turn invokes the forward() method. This applies to all PyTorch neural network modules, namely, networks and layers.
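To make the mechanics concrete, here is a small, PyTorch-free illustration of how __call__() dispatches to forward(). The class and values are made up for this example:
class ExampleLayer:
    def forward(self, x):
        return x * 2

    def __call__(self, *args, **kwargs):
        # extra work could happen here (like running hooks), then forward() is invoked
        return self.forward(*args, **kwargs)

layer = ExampleLayer()

> layer(3)
6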
Let's see this in the PyTorch source code.
# torch/nn/modules/module.py (version 1.0.1)
def __call__(self, *input, **kwargs):
    for hook in self._forward_pre_hooks.values():
        hook(self, input)
    if torch._C._get_tracing_state():
        result = self._slow_forward(*input, **kwargs)
    else:
        result = self.forward(*input, **kwargs)
    for hook in self._forward_hooks.values():
        hook_result = hook(self, input, result)
        if hook_result is not None:
            raise RuntimeError(
                "forward hooks should never return any values, but '{}'"
                "didn't return None".format(hook))
    if len(self._backward_hooks) > 0:
        var = result
        while not isinstance(var, torch.Tensor):
            if isinstance(var, dict):
                var = next((v for v in var.values() if isinstance(v, torch.Tensor)))
            else:
                var = var[0]
        grad_fn = var.grad_fn
        if grad_fn is not None:
            for hook in self._backward_hooks.values():
                wrapper = functools.partial(hook, self)
                functools.update_wrapper(wrapper, hook)
                grad_fn.register_hook(wrapper)
    return result
The extra code that PyTorch runs inside the __call__() method is why we never invoke the forward() method directly. If we did, the additional PyTorch code would not be executed.
As a result, any time we want to invoke our forward() method, we call the object instance. This applies to both layers and networks because they are both PyTorch neural network modules.
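For example, calling the layer instance and calling forward() directly produce the same values, but only the instance call runs PyTorch's extra __call__() logic. This snippet assumes the bias-free fc layer from above:
out_call    = fc(in_features)          # recommended: runs __call__(), which invokes forward()
out_forward = fc.forward(in_features)  # bypasses the hook machinery in __call__()

> torch.equal(out_call, out_forward)
True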
We are now ready to implement our network's forward() method. I'll see you in the next one!