Learning in artificial neural networks
In this post, we’ll investigate what it means for an artificial neural network to learn.
In a previous post, we learned about the training process and saw that each data point used for training is passed through the network. This pass through the network from input to output is called a forward pass, and the resulting output depends on the weights at each connection inside the network.
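To make this concrete, here is a minimal sketch of a forward pass through a single dense layer, written in plain numpy. The input x, weights W, and biases b are made-up values for illustration only:

import numpy as np

# one input value passing through a dense layer with two units
x = np.array([0.5])              # input
W = np.array([[0.1], [-0.3]])    # one weight per connection (made-up values)
b = np.array([0.2, 0.4])         # one bias per unit (made-up values)

z = W @ x + b                    # weighted sum at each unit
output = np.maximum(0, z)        # relu activation
print(output)

If you change the values in W or b, the same input produces a different output, which is exactly why training focuses on adjusting the weights.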
Once all of the data points in our dataset have been passed through the network, we say that an epoch is complete.
Note that many epochs occur throughout the training process as the model learns.
What does it mean to learn?
So what exactly does it mean for the model to learn?
Well, remember, when the model is initialized, the network weights are set to arbitrary values. We've also seen that, at the end of a forward pass, the network produces an output for the given input.
Once the output is obtained, the loss (or the error) can be computed for that specific output by looking at what the model predicted versus the true label. The loss computation depends on the chosen loss function, which we'll cover in more detail in a later post.
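As a toy illustration (the exact formula depends on the loss function you choose), here is how a cross-entropy loss could be computed for a single two-class prediction; the probabilities and label below are made up:

import numpy as np

prediction = np.array([0.3, 0.7])   # predicted probabilities for two classes
true_label = 1                      # index of the correct class

# cross-entropy: the negative log probability assigned to the true class
loss = -np.log(prediction[true_label])
print(loss)                         # ~0.357; a perfect prediction would give 0

The closer the predicted probability for the true class is to 1, the smaller the loss.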
Gradient of the loss function
After the loss is calculated, the gradient of this loss function is computed with respect to each of the weights within the network. Note that the gradient is just the generalization of the derivative to a function of several variables: a vector containing one partial derivative per variable.
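For intuition, we can approximate the derivative of a toy loss with respect to a single weight numerically. This finite-difference check is only an illustration; backpropagation computes the same quantity analytically and far more efficiently:

# toy setup: one weight w, one sample (x, y), squared-error loss
def loss(w, x=2.0, y=1.0):
    return (w * x - y) ** 2

w, eps = 0.8, 1e-6

# finite-difference approximation of d(loss)/dw
grad = (loss(w + eps) - loss(w - eps)) / (2 * eps)
print(grad)   # ~2.4, matching the analytic gradient 2*x*(w*x - y)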
Continuing with this explanation, let's focus on just one of the weights in the model.
At this point, we've calculated the loss of a single output, and we calculate the gradient of that loss with respect to our single chosen weight. This calculation is done using a technique called backpropagation, which we cover in full detail in later posts.
Once we have the value for the gradient of the loss function, we can use this value to update the model's weight. The gradient tells us the direction of steepest increase of the loss, so to move the loss towards its minimum, we step the weight in the opposite direction.
We then multiply the gradient value by something called a learning rate. A learning rate is a small number, usually ranging between 0.0001 and 0.01, but the actual value can vary.
Just keep this in mind for now, and we’ll look more closely at learning rates in a future post.
Updating the weights
Alright, so we multiply the gradient by the learning rate, and we subtract this product from the weight, which gives us the new, updated value for this weight.
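In code, this update is a single line. The numbers below are made up purely for illustration:

weight = 0.8       # current value of our chosen weight (made-up)
gradient = 2.4     # gradient of the loss w.r.t. this weight (made-up)
lr = 0.01          # learning rate

weight = weight - lr * gradient   # the update just described
print(weight)                     # 0.776: the weight stepped against the gradient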
In this discussion, we focused on a single weight to explain the concept, but this same process happens for each of the weights in the model every time data passes through it.
The only difference is that the value of the gradient will differ from weight to weight, because the gradient is calculated with respect to each individual weight.
So now imagine all of these weights being iteratively updated with each epoch. The weights get incrementally closer and closer to their optimized values as stochastic gradient descent (SGD) works to minimize the loss function.
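Putting the pieces together, here is a minimal sketch of this loop for a single weight on a toy problem. This is plain Python rather than Keras, and every value is illustrative:

# fit w so that w * x approximates y, minimizing squared error
x, y = 2.0, 1.0
w = 0.0      # arbitrary initial weight
lr = 0.1     # learning rate

for epoch in range(10):
    prediction = w * x
    loss = (prediction - y) ** 2
    grad = 2 * x * (prediction - y)   # analytic gradient of the loss w.r.t. w
    w = w - lr * grad                 # update the weight
    print(f"epoch {epoch}: loss={loss:.4f}, w={w:.4f}")

Run this and you'll see the loss shrink each epoch as w approaches its optimal value of 0.5, a miniature version of what happens across all of the weights in a real network.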
The model is learning
This updating of the weights is essentially what we mean when we say that the model is learning. It's learning what values to assign to each weight based on how those incremental changes affect the loss function. As the weights change, the network gets better and better at accurately mapping inputs to the correct outputs.
Alright, with this explanation along with our last post on training, we should now have a general idea of what it means to train a model and how the model learns through this training process.
Let’s look now at how this training is done with code in Keras.
Training in code with Keras
In order to train the model, the first thing we need to do is build the model.
Let's begin by importing the required classes:
import keras
from keras import backend as K
from keras.models import Sequential
from keras.layers import Activation
from keras.layers.core import Dense
from keras.optimizers import Adam
from keras.metrics import categorical_crossentropy
Next, we define our model:
model = Sequential([
    Dense(16, input_shape=(1,), activation='relu'),
    Dense(32, activation='relu'),
    Dense(2, activation='sigmoid')
])
Before we can train our model, we must compile it like so:
model.compile(
    Adam(lr=.0001),
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)
Inside the compile() function, we are passing the optimizer, the loss function, and the metrics that we would like to see. Notice that the optimizer we have specified is called Adam. Adam is just a variant of SGD. The Adam constructor is where we specify the learning rate, and in this case, with Adam(lr=.0001), we have chosen 0.0001.
Finally, we fit our model to the data. Fitting the model to the data means to train the model on the data. We do this with the following code:
model.fit(
    scaled_train_samples,
    train_labels,
    batch_size=10,
    epochs=20,
    shuffle=True,
    verbose=2
)
- scaled_train_samples is a numpy array consisting of the training samples (see the sketch after this list for a hypothetical stand-in).
- train_labels is a numpy array consisting of the corresponding labels for the training samples.
- batch_size=10 specifies how many training samples are passed to the model at once before the weights are updated.
- epochs=20 means that the complete training set (all of the samples) will be passed to the model a total of 20 times.
- shuffle=True indicates that the data should be shuffled before each epoch is passed to the model.
- verbose=2 specifies how much logging we see as the model trains; level 2 prints one line per epoch.
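The actual scaled_train_samples and train_labels arrays were prepared in an earlier post. As a hypothetical stand-in, you could generate dummy arrays with the right shapes like this (random data for shape illustration only; it won't produce meaningful training results):

import numpy as np

# 1000 samples, each one scaled feature, matching input_shape=(1,)
scaled_train_samples = np.random.rand(1000, 1)

# one 0 or 1 label per sample, matching the two-unit output layer
train_labels = np.random.randint(0, 2, size=1000)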
Running this code gives us the following output:
Epoch 1/20
0s - loss: 0.6400 - acc: 0.5576
Epoch 2/20
0s - loss: 0.6061 - acc: 0.6310
Epoch 3/20
0s - loss: 0.5748 - acc: 0.7010
Epoch 4/20
0s - loss: 0.5401 - acc: 0.7633
Epoch 5/20
0s - loss: 0.5050 - acc: 0.7990
Epoch 6/20
0s - loss: 0.4702 - acc: 0.8300
Epoch 7/20
0s - loss: 0.4366 - acc: 0.8495
Epoch 8/20
0s - loss: 0.4066 - acc: 0.8767
Epoch 9/20
0s - loss: 0.3808 - acc: 0.8814
Epoch 10/20
0s - loss: 0.3596 - acc: 0.8962
Epoch 11/20
0s - loss: 0.3420 - acc: 0.9043
Epoch 12/20
0s - loss: 0.3282 - acc: 0.9090
Epoch 13/20
0s - loss: 0.3170 - acc: 0.9129
Epoch 14/20
0s - loss: 0.3081 - acc: 0.9210
Epoch 15/20
0s - loss: 0.3014 - acc: 0.9190
Epoch 16/20
0s - loss: 0.2959 - acc: 0.9205
Epoch 17/20
0s - loss: 0.2916 - acc: 0.9238
Epoch 18/20
0s - loss: 0.2879 - acc: 0.9267
Epoch 19/20
0s - loss: 0.2848 - acc: 0.9252
Epoch 20/20
0s - loss: 0.2824 - acc: 0.9286
The output gives us the following values for each epoch:
- Epoch number
- Duration in seconds
- Loss
- Accuracy (shown as acc)
Notice that the loss goes down and the accuracy goes up as the epochs progress, which is exactly the learning behavior we walked through above.
This is the general method for training models in Keras. I hope you now have a general understanding of the training process, how our models learn, and how this can be done in code with Keras. I'll see you in the next one!