Bias in an Artificial Neural Network explained | How bias impacts training
Bias in an Artificial Neural Network
Hey, what's going on everyone? In this episode, we're going to talk about bias.
Don't worry, we're not changing focus to discuss bias from a social or political standpoint. We'll save that for another series. Instead, we'll be specifically discussing the bias present within artificial neural networks.
Without further ado, let's get to it.
Background
When reading up on artificial neural networks, you may have come across the term bias. It's sometimes referred to simply as bias, and other times you may see it referenced as bias nodes, bias neurons, or bias units within a neural network.
We're going to break this bias down and see what it's all about.
We'll first start out by discussing the most obvious question of, well, what is bias in an artificial neural network?
We'll then see, within a network, how bias is implemented.
Then, to hit the point home, we'll explore a simple example to illustrate the impact that bias has when introduced to a neural network.
Understanding Bias Inside Neural Networks
Let's get started by working to understand what exactly bias is inside neural networks.
What is Bias?
So, what is bias in an artificial neural network?
Well, first, when we talk about bias, we're talking about it on a per-neuron basis. We can think of each neuron as having its own bias term, and so the entire network will be made up of multiple biases.
Now, the values assigned to these biases are learnable, just like the weights. Just as stochastic gradient descent learns and updates the weights via backpropagation during training, SGD is also learning and updating the biases as well.
Now, conceptually, we can think of the bias at each neuron as having a role similar to that of a threshold. This is because the bias value is what's going to determine whether or not the activation output from a neuron is going to be propagated forward through the network.
In other words, the bias is determining whether or not, or by how much, a neuron will fire. It's letting us know when a neuron is meaningfully activated. As we'll see in a few moments, the addition of these biases ends up increasing the flexibility of a model to fit the given data.
Where Bias Fits In
Alright, so we have an idea of what bias is now, but where exactly does it fit into the scheme of things? As we've discussed in past episodes, we know how each neuron receives a weighted sum of input from the previous layer, and then that weighted sum gets passed to an activation function.
Well, the bias for a neuron is going to fit right in here within this process. What we do is, rather than pass the weighted sum directly to the activation function, we instead pass the weighted sum plus the bias term to the activation function.
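Written as a formula, if \(x_1, \ldots, x_n\) are the inputs to a neuron, \(w_1, \ldots, w_n\) are the corresponding weights, \(b\) is the neuron's bias, and \(f\) is the activation function, then the neuron's output is

\[
\text{output} = f\left(\sum_{i=1}^{n} w_i x_i + b\right).
\]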
Ok, well, what good is that?
Let's look at a simple example that illustrates the role of this bias term in action.
Example that Shows Bias in Action
Suppose we have a neural network that has an input layer with just two nodes. Suppose the first node has a value of \(1\), and the second node has a value of \(2\).
Now we're going to focus our attention on a single neuron within the first hidden layer that directly follows the input layer.
The activation function we will use for this first hidden layer is relu, and we're going to assign some randomly generated weights to our connections.
Now, let's see what the output of this node would be without introducing any bias.
The weighted sum that this node receives is given by \((1)(w_1) + (2)(w_2)\), which, with the randomly generated weights in our example, comes out to \(-0.35\).
We pass this result to relu. We know that the value of relu at any given input will be the maximum of either zero or the input itself, and in our case, we have \(relu(-0.35) = \max(0, -0.35) = 0\).
With an activation output of zero, the neuron is considered to not be activated, or not firing. In fact, with relu, any neuron whose weighted sum of input is less than or equal to zero will not fire, and so no information from these non-activated neurons will be passed forward through the rest of the network.
Essentially, zero is the threshold here for the weighted sum in determining whether a neuron is firing or not.
Well, what if we want to shift our threshold? What if instead of zero, we determined that a neuron should fire if its input is greater than or equal to \(-1\)?
This is where bias comes into play.
Remember, we said earlier that the bias gets added to the weighted sum before being passed to the activation function. The value we assign to our bias is the negative of this so-called threshold value.
Continuing with our example, we want the threshold to move from \(0\) to \(-1\), right? The bias will then be the opposite of \(-1\), which is just \(1\).
The weighted sum of \(-0.35\) plus our bias of \(1\) equals \(0.65\).
Passing this value to relu, we can see that \(relu(0.65) = \max(0, 0.65) = 0.65\).
The neuron is now considered to be firing.
The model now has a bit of increased flexibility in fitting the data, since there is now a broader range of weighted sums for which this neuron will activate.
We could also move the threshold in the opposite direction to narrow the range of values for which we consider a neuron to be activated. For example, if we determined that a neuron should be considered activated only when its weighted sum is greater than or equal to five, then our bias would be \(-5\).
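To make the arithmetic concrete, here is a minimal Python sketch of this example. It starts from the weighted sum of \(-0.35\) given above rather than from specific weight values, since the example only tells us the sum, not the individual weights.

```python
def relu(x):
    # relu outputs the maximum of zero and its input
    return max(0.0, x)

weighted_sum = -0.35  # weighted sum of inputs from the example above

# Without a bias, the neuron does not fire
print(relu(weighted_sum))         # 0.0

# With a bias of 1 (threshold shifted from 0 to -1), the neuron fires
bias = 1.0
print(relu(weighted_sum + bias))  # 0.65

# With a bias of -5 (threshold shifted from 0 to 5), the neuron would
# only fire for weighted sums of 5 or more
bias = -5.0
print(relu(weighted_sum + bias))  # 0.0
```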
Conclusion
Now, we explicitly chose and set the bias in our example. In practice, this isn't the case. Just as we don't explicitly choose and control the weights in a network, we don't explicitly choose and control the biases either.
Remember, the biases are learnable parameters within the network, just like the weights. After the biases are initialized, whether with random numbers, zeros, or really any other values, they will be updated during training, allowing our model to learn when each neuron should and shouldn't activate.
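As a quick illustration, here is a minimal Keras sketch showing that each Dense layer carries a bias vector alongside its weight matrix. The layer sizes and initializers here are just placeholder choices, not values from this episode.

```python
from tensorflow import keras
from tensorflow.keras.layers import Dense

# A tiny model: 2 inputs -> one hidden relu layer -> 1 output
model = keras.Sequential([
    Dense(units=3, activation='relu', input_shape=(2,),
          use_bias=True, bias_initializer='zeros'),
    Dense(units=1, activation='sigmoid'),
])

# Each Dense layer holds a weight matrix and a bias vector,
# and both are updated by the optimizer during training.
weights, biases = model.layers[0].get_weights()
print(weights.shape)  # (2, 3) - one weight per input-to-neuron connection
print(biases.shape)   # (3,)   - one bias per neuron in the layer
```

During training, the optimizer adjusts these bias values right alongside the weights.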