PyTorch - Python Deep Learning Neural Network API

Deep Learning Course 4 of 6 - Level: Intermediate

Training Multiple Networks - Deep Learning Course

Training Multiple PyTorch Networks at Once

Welcome to deeplizard. My name is Chris. In this lesson, we're going to see how we can train multiple PyTorch networks using the testing framework we've built throughout this course.

Without further ado, let's get started.

Single Network vs Multiple Networks

Up to this point, we have only trained a single network with multiple runs. At the start of each run, the network's weights were re-initialized when we called the Network() class constructor.

network = Network().to(device)

This allowed us to start fresh with each run and then compare the runs with one another. However, now that we have introduced multiple networks stored in a dictionary, we have created a problem.

The problem is that each network starts subsequent runs without resetting its respective weights. This is due to the fact that we are not re-initializing the networks when the runs begin. Instead, we are accessing each network stored inside the networks dictionary.

network = networks[run.network].to(device)

When we do this, we are getting the same network instance, with weights that have already been updated in the training loop of the previous run.
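
We can see this concretely with a quick check. The snippet below is just an illustration, assuming a networks dictionary keyed by name like the one we build in the next section; both lookups hand back the very same object, so weight updates from run one are still present when run two begins.

net_first_run = networks['network']     # lookup at the start of run one
net_second_run = networks['network']    # lookup at the start of run two
print(net_first_run is net_second_run)  # True: same instance, same weights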

Training Multiple Networks with Multiple Runs

To fix this problem, we need to ensure that, for each network, the network weights are reset at the start of each run.

Reproducing the Issue

To reproduce the issue, we need to create a network and add it to the networks dictionary. This gives us the code below.

# Imports assumed from earlier lessons in this course
from collections import OrderedDict

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms
from torch.utils.data import DataLoader

# RunBuilder and RunManager are the helper classes built earlier in this course

torch.manual_seed(50)
network = nn.Sequential(
      nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5)
    , nn.ReLU()
    , nn.MaxPool2d(kernel_size=2, stride=2)
    , nn.Conv2d(in_channels=6, out_channels=12, kernel_size=5)
    , nn.ReLU()
    , nn.MaxPool2d(kernel_size=2, stride=2)
    , nn.Flatten(start_dim=1)  
    , nn.Linear(in_features=12*4*4, out_features=120)
    , nn.ReLU()
    , nn.Linear(in_features=120, out_features=60)
    , nn.ReLU()
    , nn.Linear(in_features=60, out_features=10)
)

train_set = torchvision.datasets.FashionMNIST(
      root='./data'
    , train=True
    , download=True
    , transform=transforms.Compose([transforms.ToTensor()])
)

networks = {
    'network': network
}

Furthermore, in the run parameters, we need to reference the network by its key in the networks dictionary. Finally, we need to ensure that at least two runs will be performed.

To do this, we'll add two learning rates.

params = OrderedDict(
    lr = [.01, .001]
    , batch_size = [1000]
    , num_workers = [1]
    , device = ['cuda']
    , network = list(networks.keys())
)
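
Before kicking things off, we can sanity check that these parameters expand to exactly two runs, one per learning rate. This assumes the RunBuilder class we built earlier in the course, which yields one run for every combination of parameter values.

runs = RunBuilder.get_runs(params)
print(len(runs))  # 2 -> one run for lr=.01 and one for lr=.001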

Now, we're ready to kick off the runs.

m = RunManager()
for run in RunBuilder.get_runs(params):

    device = torch.device(run.device)
    network = networks[run.network].to(device)
    loader = DataLoader(
          train_set
        , batch_size=run.batch_size
        , num_workers=run.num_workers
    )
    optimizer = optim.Adam(network.parameters(), lr=run.lr)
    
    m.begin_run(run, network, loader)
    for epoch in range(5):
        m.begin_epoch()
        for batch in loader:
            
            images = batch[0].to(device)
            labels = batch[1].to(device)
            preds = network(images) # Pass Batch
            loss = F.cross_entropy(preds, labels) # Calculate Loss
            optimizer.zero_grad() # Zero Gradients
            loss.backward() # Calculate Gradients
            optimizer.step() # Update Weights
            
            m.track_loss(loss, batch)
            m.track_num_correct(preds, labels)
        m.end_epoch()
    m.end_run()
m.save('results')

After the runs complete, we have the following results:

run  epoch  loss   accuracy  epoch duration  run duration  lr     batch_size  num_workers  device  network
1    1      1.010  0.607     17.980          41.323        0.010  1000        1            cuda    network
1    2      0.545  0.788     6.073           47.828        0.010  1000        1            cuda    network
1    3      0.465  0.828     5.914           53.809        0.010  1000        1            cuda    network
1    4      0.413  0.848     6.379           60.270        0.010  1000        1            cuda    network
1    5      0.371  0.863     5.953           66.298        0.010  1000        1            cuda    network
2    1      0.336  0.876     6.211           7.550         0.001  1000        1            cuda    network
2    2      0.326  0.879     7.779           15.397        0.001  1000        1            cuda    network
2    3      0.320  0.882     6.902           22.374        0.001  1000        1            cuda    network
2    4      0.315  0.883     7.678           30.118        0.001  1000        1            cuda    network
2    5      0.310  0.885     5.864           36.057        0.001  1000        1            cuda    network

Looking at run two, epoch one, we can see that the accuracy just continued increasing from where run one left off. This is because the network instance in the dictionary is the same for both runs. To fix this issue, our task is to reset the weights at the start of each run.

Fixing the Issue

To fix the issue, ensuring each network starts each run with fresh weights, we will swap out the dictionary for a network factory class. This factory class will have a get_network() function that we can call, passing in the desired network's name. Then, in the run configurations, we'll simply specify a network name for each run.

Weight initialization will be handled inside the network factory class. This means we are using re-initialization of the network instance as our solution for resetting the network weights. Note that there are other solutions, one of which is sketched below.
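
For reference, one alternative is to keep the dictionary and manually reset every layer at the start of a run. The helper below is a minimal sketch of that approach, not the solution we use here; it relies on the fact that layers like nn.Conv2d and nn.Linear define a reset_parameters() method.

def reset_weights(network):
    # Walk all submodules and re-initialize any layer that
    # provides reset_parameters() (Conv2d, Linear, etc.).
    for module in network.modules():
        if hasattr(module, 'reset_parameters'):
            module.reset_parameters()

Calling reset_weights(networks[run.network]) at the top of each run would also give fresh weights, though without re-seeding, each run would start from a different random initialization. The factory approach keeps the creation logic, including the seed, in one place.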

The NetworkFactory() class is a simple class. It has a single function called get_network() that returns a network based on its name, so the logic is a conditional block of if statements. The network factory is a class that handles the production of networks. It's like a factory that produces networks, hence its name.

class NetworkFactory():
    @staticmethod
    def get_network(name):
        if name == 'network':
            torch.manual_seed(50)
            return nn.Sequential(
                nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5)
                , nn.ReLU()
                , nn.MaxPool2d(kernel_size=2, stride=2)
                , nn.Conv2d(in_channels=6, out_channels=12, kernel_size=5)
                , nn.ReLU()
                , nn.MaxPool2d(kernel_size=2, stride=2)
                , nn.Flatten(start_dim=1)  
                , nn.Linear(in_features=12*4*4, out_features=120)
                , nn.ReLU()
                , nn.Linear(in_features=120, out_features=60)
                , nn.ReLU()
                , nn.Linear(in_features=60, out_features=10)
            )
        else:
            return None

With this factory class, we can create as many network variations as needed. The creation logic will be organized and contained inside the factory class.
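
For example, another branch can be added to get_network() to return a variation. The branch below is purely illustrative; the name 'network_wide' and its architecture are hypothetical, but it shows how a new option slots into the conditional.

# Hypothetical extra branch inside NetworkFactory.get_network(),
# placed after the existing 'network' branch:
        elif name == 'network_wide':
            torch.manual_seed(50)
            return nn.Sequential(
                nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5)
                , nn.ReLU()
                , nn.MaxPool2d(kernel_size=2, stride=2)
                , nn.Flatten(start_dim=1)
                , nn.Linear(in_features=6*12*12, out_features=240)
                , nn.ReLU()
                , nn.Linear(in_features=240, out_features=10)
            )

Then, 'network_wide' could be added to the network list in the run parameters to test it alongside the original.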

To make use of the factory, we'll update our run configurations by replacing the list of networks dictionary keys with network names that exist inside the factory. For testing purposes, we'll choose to work with a single network.

params = OrderedDict(
      lr = [.01, .001]
    , batch_size = [1000]
    , num_workers = [1]
    , device = ['cuda']
    , network = ['network']
)

Now, the last update we need to make is to swap the networks dictionary with a call to the NetworkFactory() inside the run loop. We'll call the get_network() function passing in the network's name from the run configuration.

network = NetworkFactory.get_network(run.network).to(device)

The key difference here is that the network will be re-initialized inside the factory at the start of each run, and this results in a fresh set of weights.
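
We can verify the fresh start directly. Since get_network() sets the manual seed before building the network, each call returns a brand new instance whose initial weights match the previous call's initial weights. Here's a small check against the factory defined above; net_a and net_b are just throwaway names for illustration.

net_a = NetworkFactory.get_network('network')
net_b = NetworkFactory.get_network('network')

print(net_a is net_b)                                 # False: two separate instances
print(torch.equal(net_a[0].weight, net_b[0].weight))  # True: identical seeded initialization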

The fully updated loop looks like this:

m = RunManager()
for run in RunBuilder.get_runs(params):

    device = torch.device(run.device)
    network = NetworkFactory.get_network(run.network).to(device) # Factory call
    loader = DataLoader(
          train_set
        , batch_size=run.batch_size
        , num_workers=run.num_workers
    )
    optimizer = optim.Adam(network.parameters(), lr=run.lr)
    
    m.begin_run(run, network, loader)
    for epoch in range(5):
        m.begin_epoch()
        for batch in loader:
            
            images = batch[0].to(device)
            labels = batch[1].to(device)
            preds = network(images) # Pass Batch
            loss = F.cross_entropy(preds, labels) # Calculate Loss
            optimizer.zero_grad() # Zero Gradients
            loss.backward() # Calculate Gradients
            optimizer.step() # Update Weights
            
            m.track_loss(loss, batch)
            m.track_num_correct(preds, labels)
        m.end_epoch()
    m.end_run()
m.save('results')

This time, the results confirm the fix. Run two starts over with a higher loss and lower accuracy in epoch one, because it begins from a fresh set of weights:

run  epoch  loss   accuracy  epoch duration  run duration  lr     batch_size  num_workers  device  network
1    1      1.01   0.60      6.56            8.04          0.010  1000        1            cuda    network
1    2      0.54   0.78      5.94            14.06         0.010  1000        1            cuda    network
1    3      0.46   0.82      5.85            19.98         0.010  1000        1            cuda    network
1    4      0.41   0.84      5.78            25.83         0.010  1000        1            cuda    network
1    5      0.37   0.86      5.89            31.79         0.010  1000        1            cuda    network
2    1      1.56   0.49      6.22            7.54          0.001  1000        1            cuda    network
2    2      0.73   0.71      5.80            13.41         0.001  1000        1            cuda    network
2    3      0.63   0.75      5.80            19.28         0.001  1000        1            cuda    network
2    4      0.57   0.77      5.88            25.24         0.001  1000        1            cuda    network
2    5      0.54   0.79      6.06            31.38         0.001  1000        1            cuda    network
