Max Pooling vs No Max Pooling
Welcome to deeplizard. My name is Chris. In this lesson, we're going to see how a neural network performs with and without max pooling.
Without further ado, let's get started.
Testing with and without Max Pooling
So far in this course, we've built a convolutional neural network with max pooling, and we've been training it on the Fashion MNIST dataset.
Since the testing framework we've built makes it easy to try out different networks, we can train with and without max pooling operations and see how the results compare.
Let's start by creating the two network variants inside our NetworkFactory class.
class NetworkFactory():
    @staticmethod
    def get_network(name):
        if name == 'max_pool':
            torch.manual_seed(50)
            return nn.Sequential(
                  nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5)
                , nn.ReLU()
                , nn.MaxPool2d(kernel_size=2, stride=2)
                , nn.BatchNorm2d(6)
                , nn.Conv2d(in_channels=6, out_channels=12, kernel_size=5)
                , nn.ReLU()
                , nn.MaxPool2d(kernel_size=2, stride=2)
                , nn.Flatten(start_dim=1)
                , nn.Linear(in_features=12*4*4, out_features=120)
                , nn.ReLU()
                , nn.BatchNorm1d(120)
                , nn.Linear(in_features=120, out_features=60)
                , nn.ReLU()
                , nn.Linear(in_features=60, out_features=10)
            )
        elif name == 'no_max_pool':
            torch.manual_seed(50)
            return nn.Sequential(
                  nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5)
                , nn.ReLU()
                # , nn.MaxPool2d(kernel_size=2, stride=2)
                , nn.BatchNorm2d(6)
                , nn.Conv2d(in_channels=6, out_channels=12, kernel_size=5)
                , nn.ReLU()
                # , nn.MaxPool2d(kernel_size=2, stride=2)
                , nn.Flatten(start_dim=1)
                , nn.Linear(in_features=12*20*20, out_features=120)
                , nn.ReLU()
                , nn.BatchNorm1d(120)
                , nn.Linear(in_features=120, out_features=60)
                , nn.ReLU()
                , nn.Linear(in_features=60, out_features=10)
            )
        else:
            return None
Note that everything is the same in the 'no_max_pool' network except for the number of input features coming into the first linear layer. This is due to the removal of the max pooling operations, which reduce the number of features flowing through the network. We now have 12*20*20 input features as opposed to 12*4*4, so the 'no_max_pool' network has more output features coming from the convolutional part of the network.
We're ready now to test both of these networks with the following run configurations.
params = OrderedDict(
      lr = [.01]
    , batch_size = [1000]
    , num_workers = [1]
    , device = ['cuda']
    , network = ['max_pool', 'no_max_pool']
)
Note that we are simply specifying the names of the two networks in the network section of the run configurations. These names will be passed to the NetworkFactory inside the run loop.
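As a refresher from earlier in the course, RunBuilder.get_runs() builds one run for every combination of the parameter values, so the two network names give us two separate runs. A minimal version looks roughly like this:

from collections import namedtuple
from itertools import product

class RunBuilder():
    @staticmethod
    def get_runs(params):
        # Create a Run namedtuple whose fields match the parameter names,
        # then build one Run for every combination of parameter values.
        Run = namedtuple('Run', params.keys())
        return [Run(*values) for values in product(*params.values())]

With that in place, we train both networks with the full run loop: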
m = RunManager()
for run in RunBuilder.get_runs(params):
    device = torch.device(run.device)
    network = NetworkFactory.get_network(run.network).to(device)
    loader = DataLoader(
          train_set
        , batch_size=run.batch_size
        , num_workers=run.num_workers
    )
    optimizer = optim.Adam(network.parameters(), lr=run.lr)

    m.begin_run(run, network, loader)
    for epoch in range(20):
        m.begin_epoch()
        for batch in loader:
            images = batch[0].to(device)
            labels = batch[1].to(device)
            preds = network(images) # Pass Batch
            loss = F.cross_entropy(preds, labels) # Calculate Loss
            optimizer.zero_grad() # Zero Gradients
            loss.backward() # Calculate Gradients
            optimizer.step() # Update Weights

            m.track_loss(loss, batch)
            m.track_num_correct(preds, labels)
        m.end_epoch()
    m.end_run()
m.save('results')
run | epoch | loss | accuracy | epoch duration (s) | run duration (s) | lr | batch_size | num_workers | device | network |
---|---|---|---|---|---|---|---|---|---|---|
2 | 20 | 0.053 | 0.979 | 5.6 | 118.3 | 0.01 | 1000 | 1 | cuda | no_max_pool |
2 | 19 | 0.056 | 0.978 | 5.6 | 112.4 | 0.01 | 1000 | 1 | cuda | no_max_pool |
2 | 18 | 0.058 | 0.977 | 5.6 | 106.6 | 0.01 | 1000 | 1 | cuda | no_max_pool |
2 | 17 | 0.066 | 0.974 | 5.6 | 100.9 | 0.01 | 1000 | 1 | cuda | no_max_pool |
2 | 16 | 0.070 | 0.972 | 5.8 | 95.0 | 0.01 | 1000 | 1 | cuda | no_max_pool |
2 | 15 | 0.088 | 0.966 | 6.0 | 89.0 | 0.01 | 1000 | 1 | cuda | no_max_pool |
2 | 13 | 0.094 | 0.963 | 5.6 | 76.8 | 0.01 | 1000 | 1 | cuda | no_max_pool |
2 | 14 | 0.096 | 0.963 | 5.8 | 82.8 | 0.01 | 1000 | 1 | cuda | no_max_pool |
2 | 12 | 0.103 | 0.961 | 5.6 | 71.0 | 0.01 | 1000 | 1 | cuda | no_max_pool |
2 | 11 | 0.120 | 0.954 | 5.6 | 65.2 | 0.01 | 1000 | 1 | cuda | no_max_pool |
2 | 10 | 0.133 | 0.949 | 5.6 | 59.4 | 0.01 | 1000 | 1 | cuda | no_max_pool |
2 | 9 | 0.144 | 0.946 | 5.6 | 53.6 | 0.01 | 1000 | 1 | cuda | no_max_pool |
2 | 8 | 0.160 | 0.940 | 5.6 | 47.9 | 0.01 | 1000 | 1 | cuda | no_max_pool |
2 | 7 | 0.175 | 0.934 | 5.6 | 42.1 | 0.01 | 1000 | 1 | cuda | no_max_pool |
1 | 20 | 0.173 | 0.934 | 5.5 | 119.7 | 0.01 | 1000 | 1 | cuda | max_pool |
1 | 19 | 0.178 | 0.932 | 5.5 | 113.7 | 0.01 | 1000 | 1 | cuda | max_pool |
1 | 16 | 0.182 | 0.932 | 5.6 | 96.4 | 0.01 | 1000 | 1 | cuda | max_pool |
1 | 18 | 0.177 | 0.932 | 5.5 | 107.0 | 0.01 | 1000 | 1 | cuda | max_pool |
Alright, we have the results ordered by accuracy. We can see that the 'no_max_pool' network greatly outperformed the 'max_pool' network.
Network | Best Accuracy |
---|---|
'no_max_pool' | 0.979 |
'max_pool' | 0.934 |
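As a side note, the sorted results above can be recreated from the saved run data. Here is a rough sketch, assuming m.save('results') wrote a results.csv file as it did in earlier episodes:

import pandas as pd

# Load the saved run data and sort it the same way as the table above.
results = pd.read_csv('results.csv')
print(results.sort_values('accuracy', ascending=False).head(15))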
In order to understand why simply removing max pooling increased our network's performance so much, we need to think about the characteristics of the dataset we are training on.
To help discover clues about how max pooling impacts our data, use the Max Pooling Demo Application with some of the samples from the Fashion MNIST dataset.
Furthermore, while using the demo app, keep in mind that our network performed two max pooling operations, as opposed to the single operation used in the app.
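You can also get a rough feel for this locally. Here is a small sketch (assuming the train_set from this course is already loaded) that applies a single 2x2 max pool to one Fashion MNIST sample and shows the image before and after:

import torch.nn.functional as F
import matplotlib.pyplot as plt

image, label = train_set[0]                # image shape: [1, 28, 28]
pooled = F.max_pool2d(image.unsqueeze(0), kernel_size=2, stride=2)  # -> [1, 1, 14, 14]

fig, (ax1, ax2) = plt.subplots(1, 2)
ax1.imshow(image.squeeze(), cmap='gray')
ax1.set_title('original 28x28')
ax2.imshow(pooled.squeeze(), cmap='gray')
ax2.set_title('max pooled 14x14')
plt.show()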