num_workers Test - Speed Things Up
Welcome to this neural network programming series. In this episode, we will see how we can speed up the neural network training process by utilizing the multiple process capabilities of the PyTorch DataLoader class.
Without further ado, let's get started.
Speeding Up the Training Process
To speed up the training process, we will make use of the num_workers optional attribute of the DataLoader class. The num_workers attribute tells the data loader instance how many sub-processes to use for data loading. By default, the num_workers value is set to zero, and a value of zero tells the loader to load the data inside the main process.
This means that the training process will work sequentially inside the main process. After a batch is used during the training process and another one is needed, we read the batch data from disk.
Now, if we have a worker process, we can make use of the fact that our machine has multiple cores. This means that the next batch can already be loaded and ready to go by the time the main process is ready for another batch. This is where the speed up comes from. The batches are loaded using additional worker processes and are queued up in memory.
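To make this concrete, here's a minimal sketch of creating a loader with and without a worker process. Fashion-MNIST is assumed here just to keep the sketch self-contained; any Dataset works the same way.

```python
import torch
import torchvision
import torchvision.transforms as transforms

# A dataset to load; Fashion-MNIST is assumed here for illustration.
train_set = torchvision.datasets.FashionMNIST(
    root='./data',
    train=True,
    download=True,
    transform=transforms.Compose([transforms.ToTensor()])
)

# num_workers=0 (the default): batches are loaded in the main process.
main_process_loader = torch.utils.data.DataLoader(train_set, batch_size=1000)

# num_workers=1: one sub-process loads batches in the background and
# queues them up for the main process.
worker_loader = torch.utils.data.DataLoader(
    train_set, batch_size=1000, num_workers=1
)
```

One caveat: on platforms that spawn sub-processes instead of forking (Windows, for example), code that iterates a multi-worker loader needs to run under an if __name__ == '__main__': guard.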
Optimal Value for the num_workers Attribute
The natural question that arises is, how many worker processes should we add? There are a lot of factors that can affect the optimal number here, so the best way to find out is to test.
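Since the optimal value depends on the machine, one number worth knowing before testing is how many CPU cores are available; it's a natural upper bound for the values worth trying.

```python
import os

# Logical CPU core count; a reasonable upper bound for num_workers values.
print(os.cpu_count())
```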
Testing Values for the num_workers Attribute
To set up this test, we'll create a list of num_workers values to try. We'll try the following values:

- 0 (default)
- 1
- 2
- 4
- 8
- 16
For each of these values, we'll vary the batch size by trying the following values:

- 100
- 1000
- 10000
For the learning rate, we'll keep it at a constant value of .01 for all of the runs.
The last thing to mention about the setup is that we're only doing a single epoch for each of the runs.
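Here's a minimal sketch of a test loop along these lines. It times a single epoch for every combination of the values above. The train_set is the dataset from the earlier sketch, and the small feed-forward network is just a stand-in to keep the sketch runnable, not the CNN built earlier in this series.

```python
import time
from itertools import product

import torch
import torch.nn as nn
import torch.nn.functional as F

batch_sizes = [100, 1000, 10000]        # values from the setup above
num_workers_values = [0, 1, 2, 4, 8, 16]
lr = .01

if __name__ == '__main__':  # needed for multi-worker loading on spawn platforms
    for batch_size, num_workers in product(batch_sizes, num_workers_values):
        # A small stand-in network so the sketch is self-contained.
        network = nn.Sequential(
            nn.Flatten(),
            nn.Linear(28 * 28, 120), nn.ReLU(),
            nn.Linear(120, 60), nn.ReLU(),
            nn.Linear(60, 10),
        )
        optimizer = torch.optim.Adam(network.parameters(), lr=lr)
        loader = torch.utils.data.DataLoader(
            train_set, batch_size=batch_size, num_workers=num_workers
        )

        start = time.time()
        for images, labels in loader:  # a single epoch
            preds = network(images)
            loss = F.cross_entropy(preds, labels)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        print(f'batch_size={batch_size} num_workers={num_workers} '
              f'epoch duration: {time.time() - start:.1f}s')
```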
Alright, let's see what we get.
num_workers Values: Results
Alright, we can see the results below. We completed a total of eighteen runs. We have three groups of differing batch sizes, and inside each group, we varied the number of worker processes.
| run | epoch | loss | accuracy | epoch duration | run duration | lr | batch_size | num_workers |
|-----|-------|------|----------|----------------|--------------|----|------------|-------------|
The main take-away from these results is that, across all three batch sizes, having a single worker process in addition to the main process resulted in a speed up of about twenty percent.
Additionally, adding more worker processes after the first one didn't show any further improvements.
Interpreting the Results
The twenty percent speed up that we see after adding a single worker process makes sense because the main process had less work to do.
While the main process is busy performing the forward and backward passes, the worker process is loading the next batch. By the time the main process is ready for another batch, the worker process already has it queued up in memory.
As a result, the main process doesn't have to read the data from disk. Instead, the data is already in memory, and this gives us the twenty percent speed up.
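As a side note, the size of that in-memory queue is itself configurable: recent PyTorch versions expose a prefetch_factor argument on the DataLoader that sets how many batches each worker keeps loaded ahead of time (two per worker by default).

```python
# Each worker keeps prefetch_factor batches queued ahead of time
# (the default is 2 per worker; only valid when num_workers > 0).
loader = torch.utils.data.DataLoader(
    train_set, batch_size=1000, num_workers=1, prefetch_factor=2
)
```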
Now, why are we not seeing additional speed ups after adding more workers?
Make It Go Faster with More Workers?
Well, if one worker is enough to keep the queue full of data for the main process, then adding more batches of data to the queue isn't going to do anything. This is what I think we are seeing here.
Just because we are adding more batches to the queue doesn't mean the batches are being processed faster. Thus, we are bounded by the time it takes to forward and backward propagate a given batch.
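One way to sanity-check this interpretation is to time the loader with no training work at all. If iterating the loader alone is much faster than a full training epoch, then loading isn't the bottleneck, and extra workers have nothing to speed up.

```python
import time

loader = torch.utils.data.DataLoader(train_set, batch_size=1000, num_workers=1)

# Iterate without any forward/backward work to isolate pure loading time.
start = time.time()
for images, labels in loader:
    pass
print(f'loading-only epoch: {time.time() - start:.1f}s')
```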
We can even see that things start to bog down as we get to the higher num_workers values, which likely reflects the overhead of creating and coordinating the extra worker processes.
Hope this helps speed you up!