Regularization in a Neural Network explained

Regularization in a neural network

In this post, we'll discuss what regularization is, and when and why it may be helpful to add it to our model.

In our previous post on overfitting, we briefly introduced dropout and stated that it is a regularization technique. We hadn't yet discussed what regularization is, so let's do that now.

In general, regularization is a technique that helps reduce overfitting or reduce variance in our network by penalizing for complexity. The idea is that certain complexities in our model may make our model unlikely to generalize well, even though the model fits the training data.

Given this, if we add regularization to our model, we're essentially trading some of our model's ability to fit the training data for a better ability to generalize to data it hasn't seen before.

Implementing regularization simply means adding a term to our loss function that penalizes large weights. We'll expand on this idea in just a moment.

L2 regularization

One of the most common regularization techniques is L2 regularization. Recall that regularization involves adding a term to our loss function that penalizes large weights.

L2 regularization term

With L2 regularization, the term we're adding to the loss is the sum of the squared norms of the weight matrices

$$\sum_{j=1}^{n}\left\Vert w^{[j]}\right\Vert ^{2},$$

multiplied by a small constant

$$\frac{\lambda }{2m}.$$

Norms are non-negative

If you're not familiar with norms in general, understand that a norm is just a function that assigns a non-negative length or size to each vector in a vector space. Only the zero vector has a norm of zero. The vector space we're working with here depends on the sizes of our weight matrices.

Rather than going on a linear algebra tangent about norms right now, we'll continue with the general idea of regularization. Norms are a fundamental concept of linear algebra, so there is a lot of information available on the web that explains them in detail if you need a better grasp.

To oversimplify, know for now that the norm of each of our weight matrices is just going to be a non-negative number.

Suppose that $v$ is a vector in a vector space. The norm of $v$ is denoted as $\left\Vert v\right\Vert,$ and it is required that

$$\left\Vert v\right\Vert \geq 0.$$
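
As a small worked example, suppose the norm in question is the Frobenius norm. This is an assumption, since the post does not name a specific norm, but it is a common choice for weight matrices. Under that assumption, the squared norm of a matrix $w$ is the sum of its squared entries:

$$\left\Vert w\right\Vert ^{2}=\sum_{i}\sum_{k}w_{ik}^{2}.$$

For a made-up $2\times 2$ weight matrix, this gives

$$w=\begin{pmatrix}1 & -2\\ 3 & 0\end{pmatrix},\qquad \left\Vert w\right\Vert ^{2}=1^{2}+(-2)^{2}+3^{2}+0^{2}=14,\qquad \left\Vert w\right\Vert =\sqrt{14}\approx 3.74.$$

Negative entries still contribute positively because they are squared, so the result can never drop below zero.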

Adding the term to the loss

Let's look at the full expression for L2 regularization. The regularized loss is

$$\text{loss} + \left( \sum_{j=1}^{n}\left\Vert w^{[j]}\right\Vert ^{2}\right)\frac{\lambda }{2m}.$$

The table below gives the definition for each variable in the expression above.

Variable	Definition
$n$	Number of layers
$w^{[j]}$	Weight matrix for the $j^{th}$ layer
$m$	Number of inputs
$\lambda$	Regularization parameter

The term $\lambda$ is called the regularization parameter. It is another hyperparameter that we'll have to choose, test, and tune in order to find a value that works well for our specific model.
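
The expression above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the names `squared_norm` and `regularized_loss` are hypothetical, the squared norm is taken to be the sum of squared entries (the Frobenius norm), and the weights and base loss value are made up.

```python
def squared_norm(matrix):
    """Sum of the squared entries of a matrix given as a list of rows.

    Assumption: the norm is the Frobenius norm.
    """
    return sum(value * value for row in matrix for value in row)


def regularized_loss(loss, weight_matrices, lam, m):
    """Return loss + (sum_j ||w[j]||^2) * lam / (2 * m)."""
    penalty = sum(squared_norm(w) for w in weight_matrices) * lam / (2 * m)
    return loss + penalty


# Made-up weights for a hypothetical two-layer network.
weights = [
    [[1.0, -2.0], [3.0, 0.0]],  # layer 1: squared norm is 14
    [[0.5], [-0.5]],            # layer 2: squared norm is 0.5
]
base_loss = 0.40  # made-up value of the unregularized loss

# The same weights are penalized more heavily as lambda grows.
losses = {lam: regularized_loss(base_loss, weights, lam, m=10)
          for lam in (0.0, 0.1, 1.0)}
for lam, value in losses.items():
    print(f"lambda={lam}: regularized loss = {value:.4f}")
```

With $\lambda = 0$ the loss is unchanged. As $\lambda$ increases, the same weights cost more, which is the knob we tune when searching for a good value.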

To summarize, we now know that regularization is just a technique that penalizes for relatively large weights in our model, and behind the scenes, the implementation of regularization is just the addition of a term to our existing loss function.

Impact of regularization

So why does regularization help?

Well, using L2 regularization as an example, if we were to set $\lambda$ to be large, then it would incentivize the model to set the weights close to zero because the objective of SGD is to minimize the loss function. Remember our original loss function is now being summed with the sum of the squared matrix norms,

$$\sum_{j=1}^{n}\left\Vert w^{[j]}\right\Vert ^{2},$$

which is multiplied by

$$\frac{\lambda }{2m}.$$

If $\lambda$ is large, then the factor $\frac{\lambda }{2m}$ is relatively large as well. When we multiply it by the sum of the squared norms, the product may be relatively large, depending on how large our weights are. This means that our model is incentivized to keep the weights small so that the value of the entire regularized loss stays relatively small.
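
To make this incentive concrete, here is a short derivation, offered as a sketch rather than as part of the original explanation. It assumes the squared norm is the sum of squared entries (the Frobenius norm) and plain gradient descent with a learning rate $\eta$, a symbol that is not used elsewhere in this post. Differentiating the added term with respect to the weight matrix of layer $j$ gives

$$\frac{\partial}{\partial w^{[j]}}\left(\frac{\lambda }{2m}\sum_{k=1}^{n}\left\Vert w^{[k]}\right\Vert ^{2}\right)=\frac{\lambda }{m}w^{[j]},$$

so a single update step becomes

$$w^{[j]}\leftarrow w^{[j]}-\eta \left(\frac{\partial\, \text{loss}}{\partial w^{[j]}}+\frac{\lambda }{m}w^{[j]}\right)=\left(1-\frac{\eta \lambda }{m}\right)w^{[j]}-\eta \frac{\partial\, \text{loss}}{\partial w^{[j]}}.$$

Under these assumptions, every step scales the weights by the factor $1-\frac{\eta \lambda }{m}$, which is slightly less than one when $\frac{\eta \lambda }{m}$ is small, in addition to applying the usual gradient step. A larger $\lambda$ means stronger shrinkage, which is why L2 regularization is often referred to as weight decay.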

Intuitively, we could think that this technique may push some weights so close to zero that it effectively zeroes out, or at least reduces the impact of, some of our layers. If that's the case, then it would conceptually simplify our model, making it less complex, which may in turn reduce variance and overfitting.
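
The shrinking effect can be seen in a toy experiment. The sketch below is not from the original post: it assumes a model with a single weight, a squared-error loss scaled by $\frac{1}{2m}$, and plain gradient descent rather than SGD. The function `train_weight` and the data are made up for illustration.

```python
def train_weight(xs, ys, lam, learning_rate=0.01, steps=500):
    """Fit y = w * x by gradient descent on a squared-error loss plus an L2 penalty.

    Objective (an assumption made for this sketch):
        (1 / (2m)) * sum((w * x - y)^2) + (lam / (2m)) * w^2
    """
    m = len(xs)
    w = 0.0
    for _ in range(steps):
        # Gradient of the data term with respect to w.
        data_grad = sum(x * (w * x - y) for x, y in zip(xs, ys)) / m
        # Gradient of the L2 penalty term with respect to w.
        penalty_grad = (lam / m) * w
        w -= learning_rate * (data_grad + penalty_grad)
    return w


# Made-up data generated by y = 2x, so the unregularized fit is w = 2.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

results = {lam: train_weight(xs, ys, lam) for lam in (0.0, 10.0, 100.0)}
for lam, w in results.items():
    print(f"lambda={lam}: learned w = {w:.4f}")
```

In this setup the learned weight is about 2.0 with no regularization, about 1.5 with $\lambda = 10$, and about 0.46 with $\lambda = 100$. The larger the regularization parameter, the more the weight is pulled toward zero, at the cost of fitting the training data less closely.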

Wrapping up

We should now have a good understanding about what regularization is, its impact, and how L2 regularization works. See ya next time!
