# Neural Network Programming - Deep Learning with PyTorch

Deep Learning Course 3 of 5 - Level: Intermediate

## Batch Norm in PyTorch - Add Normalization to Conv Net Layers

### Batch Normalization in PyTorch

Welcome to deeplizard. My name is Chris. In this episode, we're going to see how we can add batch normalization to a PyTorch CNN.

Without further ado, let's get started.

### What is Batch Normalization?

In order to understand batch normalization, we need to first understand what data normalization is in general, and we learned about this concept in the episode on dataset normalization.

When we normalize a dataset, we are normalizing the input data that will be passed to the network, and when we add batch normalization to our network, we are normalizing the data again after it has passed through one or more layers.

One question that may come to mind is the following:

Why normalize again if the input is already normalized?

Well, as the data begins moving through the layers, the values will begin to shift as the layer transformations are performed. Normalizing the outputs from a layer ensures that the scale of the values stays within a specific range as the data flows through the network from input to output.

The specific normalization technique that is typically used is called standardization. This is where we calculate a z-score using the mean and standard deviation.

$z = \frac{x - \text{mean}}{\text{std}}$

#### How Batch Norm Works

When using batch norm, the mean and standard deviation values are calculated with respect to the batch at the time normalization is applied. This is in contrast to dataset normalization, where the values are computed over the entire dataset.

Additionally, there are two learnable parameters that allow the data to be scaled and shifted. We saw this in the paper: *Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift*.

Note that the scaling given by $\gamma$ corresponds to the multiplication operation, and the shifting given by $\beta$ corresponds to the addition operation.

The scale and shift operations sound fancy, but they simply mean multiply and add.

These learnable parameters give the distribution of values more freedom to move around, adjusting to the right fit.

The scale and shift values can be thought of as the slope and y-intercept of a line, both of which allow the line to be adjusted to fit various locations on the 2D plane.
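The scale and shift are easy to see in code. Here is a minimal plain-Python sketch of the batch norm computation for a single feature across a batch; the `gamma` and `beta` values below are illustrative stand-ins for the learned parameters, and `eps` is the small constant added for numerical stability:

```python
def batch_norm(values, gamma=1.0, beta=0.0, eps=1e-5):
    # standardize using the batch's own mean and standard deviation (z-score)
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    std = (var + eps) ** 0.5
    # then scale (multiply by gamma) and shift (add beta)
    return [gamma * (v - mean) / std + beta for v in values]

batch = [2.0, 4.0, 6.0, 8.0]
normalized = batch_norm(batch)                   # mean ~0, std ~1
scaled = batch_norm(batch, gamma=2.0, beta=1.0)  # mean ~1, std ~2
```

With `gamma=1` and `beta=0` this reduces to plain standardization; the learnable parameters let the network adjust or even undo the normalization if that helps training.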

### Adding Batch Norm to a CNN

Alright, let's create two networks, one with batch norm and one without. Then, we'll test these setups using the testing framework we've developed so far in the course. To do this, we'll make use of the nn.Sequential class.

Our first network will be called network1:

```python
import torch
import torch.nn as nn

torch.manual_seed(50)
network1 = nn.Sequential(
      nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5)
    , nn.ReLU()
    , nn.MaxPool2d(kernel_size=2, stride=2)
    , nn.Conv2d(in_channels=6, out_channels=12, kernel_size=5)
    , nn.ReLU()
    , nn.MaxPool2d(kernel_size=2, stride=2)
    , nn.Flatten(start_dim=1)
    , nn.Linear(in_features=12*4*4, out_features=120)
    , nn.ReLU()
    , nn.Linear(in_features=120, out_features=60)
    , nn.ReLU()
    , nn.Linear(in_features=60, out_features=10)
)
```
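If you're wondering where `in_features=12*4*4` comes from, a quick sketch traces the spatial size through the conv and pooling layers, assuming the 28x28 Fashion-MNIST inputs used earlier in the course:

```python
# Standard conv/pool output-size formula: (size + 2*padding - kernel) // stride + 1
def conv_out(size, kernel, stride=1, padding=0):
    return (size + 2 * padding - kernel) // stride + 1

size = 28
size = conv_out(size, kernel=5)             # conv1:   28 -> 24
size = conv_out(size, kernel=2, stride=2)   # maxpool: 24 -> 12
size = conv_out(size, kernel=5)             # conv2:   12 -> 8
size = conv_out(size, kernel=2, stride=2)   # maxpool:  8 -> 4
# final feature maps: 12 channels of 4x4 -> 12*4*4 = 192 flattened features
```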


Our second network will be called network2:

```python
torch.manual_seed(50)
network2 = nn.Sequential(
      nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5)
    , nn.ReLU()
    , nn.MaxPool2d(kernel_size=2, stride=2)
    , nn.BatchNorm2d(6)
    , nn.Conv2d(in_channels=6, out_channels=12, kernel_size=5)
    , nn.ReLU()
    , nn.MaxPool2d(kernel_size=2, stride=2)
    , nn.Flatten(start_dim=1)
    , nn.Linear(in_features=12*4*4, out_features=120)
    , nn.ReLU()
    , nn.BatchNorm1d(120)
    , nn.Linear(in_features=120, out_features=60)
    , nn.ReLU()
    , nn.Linear(in_features=60, out_features=10)
)
```
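Note that the `num_features` argument to each batch norm layer must match the number of features being normalized: `BatchNorm2d(6)` normalizes the 6 channels coming out of the first conv/pool stage, and `BatchNorm1d(120)` normalizes the 120 outputs of the first linear layer. A back-of-the-envelope count of the extra learnable parameters these two layers add (one $\gamma$ and one $\beta$ per feature; the running mean/variance are buffers, not learnable):

```python
# Each normalized feature gets its own learnable gamma (scale) and beta (shift).
def batchnorm_params(num_features):
    return 2 * num_features

extra = batchnorm_params(6) + batchnorm_params(120)
# BatchNorm2d(6) adds 12 parameters, BatchNorm1d(120) adds 240: 252 total
```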


Now, we'll create a networks dictionary that we'll use to store the two networks.

```python
networks = {
      'no_batch_norm': network1
    , 'batch_norm': network2
}
```


The names, or keys, of this dictionary will be used inside our run loop to access each network. To configure our runs, we can use the keys of the dictionary as opposed to writing out each value explicitly. This is pretty cool because it allows us to easily test different networks against one another simply by adding more networks to the dictionary. 😎

```python
from collections import OrderedDict

params = OrderedDict(
      lr = [.01]
    , batch_size = [1000]
    , num_workers = [1]
    , device = ['cuda']
    , trainset = ['normal']
    , network = list(networks.keys())
)
```


Now, inside our run loop, we simply access the network from the dictionary using the run object's network attribute. It's like this:

```python
network = networks[run.network].to(device)
```
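As a reminder of what `run` provides, here is a hypothetical, simplified version of the RunBuilder-style helper developed earlier in the course: it expands an `OrderedDict` of parameter lists into one `Run` namedtuple per combination of values (the function name `build_runs` and variable names here are illustrative):

```python
from collections import OrderedDict, namedtuple
from itertools import product

def build_runs(params):
    # one namedtuple class whose fields match the parameter names
    Run = namedtuple('Run', params.keys())
    # cartesian product of all parameter value lists -> one Run per combination
    return [Run(*values) for values in product(*params.values())]

example_params = OrderedDict(
      lr = [.01]
    , batch_size = [1000]
    , network = ['no_batch_norm', 'batch_norm']
)
runs = build_runs(example_params)
# two runs, differing only in the network key; run.network indexes the dictionary
```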


Boom! We're ready to test. The results look like this:

| run | epoch | loss | accuracy | epoch duration | run duration | lr | batch_size | num_workers | device | trainset | network |
|-----|-------|--------|----------|----------------|--------------|--------|------------|-------------|--------|----------|---------------|
| 2 | 20 | 0.1636 | 0.9377 | 9.5200 | 196.9300 | 0.0100 | 1000 | 1 | cuda | normal | batch_norm |
| 2 | 19 | 0.1716 | 0.9335 | 9.5300 | 187.2900 | 0.0100 | 1000 | 1 | cuda | normal | batch_norm |
| 2 | 18 | 0.1757 | 0.9315 | 9.6400 | 177.6500 | 0.0100 | 1000 | 1 | cuda | normal | batch_norm |
| 2 | 17 | 0.1799 | 0.9311 | 9.5700 | 167.8900 | 0.0100 | 1000 | 1 | cuda | normal | batch_norm |
| 2 | 16 | 0.1865 | 0.9285 | 9.6200 | 158.2000 | 0.0100 | 1000 | 1 | cuda | normal | batch_norm |
| 2 | 15 | 0.1932 | 0.9266 | 9.6100 | 148.4700 | 0.0100 | 1000 | 1 | cuda | normal | batch_norm |
| 2 | 14 | 0.1978 | 0.9252 | 9.6800 | 138.7500 | 0.0100 | 1000 | 1 | cuda | normal | batch_norm |
| 2 | 13 | 0.2075 | 0.9214 | 9.5400 | 128.9700 | 0.0100 | 1000 | 1 | cuda | normal | batch_norm |
| 2 | 12 | 0.2087 | 0.9209 | 9.5500 | 119.3200 | 0.0100 | 1000 | 1 | cuda | normal | batch_norm |
| 2 | 11 | 0.2151 | 0.9197 | 9.5800 | 109.6600 | 0.0100 | 1000 | 1 | cuda | normal | batch_norm |
| 2 | 10 | 0.2240 | 0.9156 | 9.7100 | 99.9800 | 0.0100 | 1000 | 1 | cuda | normal | batch_norm |
| 1 | 20 | 0.2254 | 0.9150 | 9.6600 | 196.2000 | 0.0100 | 1000 | 1 | cuda | normal | no_batch_norm |
| 2 | 9 | 0.2304 | 0.9133 | 9.6600 | 90.1600 | 0.0100 | 1000 | 1 | cuda | normal | batch_norm |
| 1 | 19 | 0.2315 | 0.9130 | 9.6700 | 186.4500 | 0.0100 | 1000 | 1 | cuda | normal | no_batch_norm |

Batch norm smoked the competition and gave us the highest accuracy we've seen yet.
