Data Augmentation with TensorFlow's Keras API

Performing data augmentation with TensorFlow's Keras API

In this episode, we'll demonstrate how to use data augmentation on images using TensorFlow's Keras API.

Data augmentation occurs when new data is created based on modifications of existing data. We'll touch on the concept of data augmentation a bit more before we jump into the code, but for a more thorough presentation of the concept, check out the data augmentation episode from the Deep Learning Fundamentals course.

In our case, the data we'll work with will be images. For image data specifically, data augmentation could consist of things like flipping the image horizontally or vertically, rotating the image, zooming in or out, cropping, or varying the color.
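As a rough illustration, most of these transformations are simple array operations. The following NumPy sketch applies a flip, a rotation, a crop, and a color change to a tiny synthetic 4x4 RGB image. Zooming is left out because it needs interpolation. It's only meant to show the idea, not how Keras implements these transforms.

```python
import numpy as np

# A tiny 4x4 RGB "image" with distinct values so each transform is easy to see.
image = np.arange(4 * 4 * 3, dtype=np.float32).reshape(4, 4, 3)

# Horizontal flip: reverse the column (width) axis.
flipped_h = image[:, ::-1, :]

# Vertical flip: reverse the row (height) axis.
flipped_v = image[::-1, :, :]

# Rotation by 90 degrees counter-clockwise in the height/width plane.
rotated = np.rot90(image, k=1, axes=(0, 1))

# Crop: keep the central 2x2 region.
cropped = image[1:3, 1:3, :]

# Color variation: scale the red channel by 10% and clip back into 0..255.
recolored = image.copy()
recolored[..., 0] = np.clip(recolored[..., 0] * 1.1, 0, 255)

for name, arr in [("h-flip", flipped_h), ("v-flip", flipped_v),
                  ("rot90", rotated), ("crop", cropped), ("recolor", recolored)]:
    print(name, arr.shape)
```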

Why do we need data augmentation?

For starters, it will help us obtain more data for training. Maybe we have a small training set, or maybe we just want to make our training set larger. We can do that by augmenting our existing data and then adding that data to the training set.
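Here's a minimal sketch of that idea with NumPy, assuming a small hypothetical training set of random 8x8 images held in memory. Every image gets a horizontally flipped copy, and the copies are appended to the training set. Reusing the labels assumes the transformation doesn't change an image's class, which holds for a flipped photo of a dog but not for every kind of data.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Hypothetical training set: 5 RGB images of size 8x8 with integer labels.
x_train = rng.integers(0, 256, size=(5, 8, 8, 3)).astype(np.uint8)
y_train = np.array([0, 1, 0, 1, 1])

# One horizontally flipped copy of every image (axis 2 is the width axis).
x_flipped = x_train[:, :, ::-1, :]

# Append the augmented copies; a flip keeps the class, so labels are reused.
x_augmented = np.concatenate([x_train, x_flipped], axis=0)
y_augmented = np.concatenate([y_train, y_train], axis=0)

print(x_train.shape, "->", x_augmented.shape)
```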

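Another reason to use data augmentation is to reduce overfitting. Because the model sees slightly different versions of each image, it's harder for it to memorize details specific to individual training examples, which pushes it toward features that generalize to new data.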

Performing data augmentation in code

Let's now see how we can perform data augmentation using Keras.

First, we import all the libraries we'll be using.

import matplotlib.pyplot as plt
import numpy as np
import os
import random
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.preprocessing.image import ImageDataGenerator
%matplotlib inline

Next, we'll use this plotImages() function obtained from TensorFlow's documentation to plot the processed images within our Jupyter notebook.

def plotImages(images_arr):
    # Lay the images out side by side in a single row of 10 subplots
    fig, axes = plt.subplots(1, 10, figsize=(20, 20))
    axes = axes.flatten()
    for img, ax in zip(images_arr, axes):
        ax.imshow(img)
        ax.axis('off')  # hide ticks and frame
    plt.tight_layout()
    plt.show()

Next, we define a variable called gen and set it to an ImageDataGenerator instance. Each parameter we pass specifies one way Keras should randomly augment the images.

gen = ImageDataGenerator(rotation_range=10, width_shift_range=0.1,
                         height_shift_range=0.1, shear_range=0.15,
                         zoom_range=0.1, channel_shift_range=10.,
                         horizontal_flip=True)

Check out the ImageDataGenerator documentation for the units each augmentation parameter uses. For example, rotation_range is measured in degrees, while width_shift_range, when less than 1, is a fraction of the image's width.
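To make those units concrete, here's a minimal NumPy sketch of how one random set of transform parameters could be drawn from the values we passed to gen, assuming a 224x224 image. The helper sample_transform() is hypothetical, not part of Keras. It draws each value uniformly from the range its parameter describes, uses a single zoom factor for both axes for simplicity, and leaves out shear.

```python
import numpy as np

rng = np.random.default_rng(seed=42)

# Hypothetical image size; the real generator uses the input's own size.
height, width = 224, 224

# Same values passed to ImageDataGenerator above.
rotation_range = 10          # degrees
width_shift_range = 0.1      # fraction of the width
height_shift_range = 0.1     # fraction of the height
zoom_range = 0.1             # a float z means zoom factors in [1 - z, 1 + z]
channel_shift_range = 10.0   # pixel-intensity units

def sample_transform():
    """Draw one random set of transform parameters (simplified sketch)."""
    return {
        "theta_deg": rng.uniform(-rotation_range, rotation_range),
        "shift_x_px": rng.uniform(-width_shift_range, width_shift_range) * width,
        "shift_y_px": rng.uniform(-height_shift_range, height_shift_range) * height,
        "zoom": rng.uniform(1 - zoom_range, 1 + zoom_range),
        "channel_shift": rng.uniform(-channel_shift_range, channel_shift_range),
        "flip_horizontal": bool(rng.random() < 0.5),
    }

params = sample_transform()
print(params)
```

With these settings, a horizontal shift of up to 0.1 of the width means at most about 22 pixels either way on a 224-pixel-wide image.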

Next, we choose a random image of a dog from disk.

chosen_image = random.choice(os.listdir('data/dogs-vs-cats/train/dog'))

We then create a variable called image_path and set that to the relative location on disk of the chosen image.

image_path = 'data/dogs-vs-cats/train/dog/' + chosen_image
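If the directory contains files other than images (a stray .DS_Store, for example), random.choice() on the raw listing can pick one of those. The sketch below is a hypothetical helper, pick_random_image(), that filters by extension and builds the path with os.path.join() instead of string concatenation. It uses a temporary directory as a stand-in for data/dogs-vs-cats/train/dog.

```python
import os
import random
import tempfile

IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}

def pick_random_image(directory, rng=random):
    """Return the full path of a randomly chosen image file in `directory`."""
    candidates = [
        name for name in os.listdir(directory)
        if os.path.splitext(name)[1].lower() in IMAGE_EXTENSIONS
    ]
    if not candidates:
        raise FileNotFoundError(f"no image files found in {directory!r}")
    # os.path.join inserts the separator, so no trailing slash is needed.
    return os.path.join(directory, rng.choice(candidates))

# Demo with a throwaway directory standing in for the dog training folder.
with tempfile.TemporaryDirectory() as tmp:
    for name in ["dog.1.jpg", "dog.2.jpg", "notes.txt"]:
        open(os.path.join(tmp, name), "wb").close()
    path = pick_random_image(tmp)
    print(path)
```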

Note, to follow along, you will need to point to a valid location and image file on your machine.

Next, we read the image from disk with plt.imread(), passing in image_path. We also expand the array's dimensions with np.expand_dims() to add a leading batch axis, because flow(), which we'll use shortly, expects a rank-4 array of shape (batch, height, width, channels).

image = np.expand_dims(plt.imread(image_path),0)
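To see what the extra dimension does, here's a small NumPy sketch using a zero-filled array as a stand-in for the image returned by plt.imread() (the 375x500 size is arbitrary). A single color image has shape (height, width, channels), and adding a leading axis of size 1 turns it into a batch of one.

```python
import numpy as np

# Stand-in for the array returned by plt.imread: height x width x channels.
single_image = np.zeros((375, 500, 3), dtype=np.uint8)

# Add a leading batch axis so the array is rank 4.
batch = np.expand_dims(single_image, 0)
print(single_image.shape, "->", batch.shape)  # (375, 500, 3) -> (1, 375, 500, 3)

# Indexing with np.newaxis is an equivalent way to add the axis.
assert np.array_equal(batch, single_image[np.newaxis, ...])

# batch[0] recovers the original image, which is why image[0] is plotted.
assert batch[0].shape == single_image.shape
```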

Now, we'll plot the image just to see what the original image looks like.

plt.imshow(image[0])

Next, we'll generate batches of augmented images from the original image.

aug_iter = gen.flow(image)

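The flow() function takes a NumPy array and returns an iterator that yields batches of augmented data indefinitely. Since our array holds a single image, each batch contains one augmented copy of it.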
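The sketch below is a hypothetical, heavily simplified stand-in for flow(), written with plain NumPy so the iterator behavior is easy to inspect. It shuffles the input, yields float32 batches indefinitely, and applies only a random horizontal flip. It is not Keras's implementation, just an illustration of the pattern.

```python
import numpy as np

def simple_flow(x, batch_size=32, seed=None):
    """Yield endless batches of randomly flipped copies of the images in x."""
    if x.ndim != 4:
        raise ValueError("expected an array of shape (n, height, width, channels)")
    rng = np.random.default_rng(seed)
    n = x.shape[0]
    while True:  # the iterator never runs out
        order = rng.permutation(n)
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            batch = x[idx].astype(np.float32)
            # Flip roughly half of the images along the width axis.
            flip = rng.random(len(idx)) < 0.5
            batch[flip] = batch[flip][:, :, ::-1, :]
            yield batch

# A single tiny image, so every batch holds exactly one augmented copy.
image = np.arange(2 * 3 * 3, dtype=np.uint8).reshape(1, 2, 3, 3)
it = simple_flow(image, seed=0)
first = next(it)
print(first.shape, first.dtype)  # (1, 2, 3, 3) float32
```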

Now we'll get ten samples of the augmented images.

aug_images = [next(aug_iter)[0].astype(np.uint8) for i in range(10)]
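The astype(np.uint8) call matters for plotting. The augmented batches are float arrays, and plt.imshow() expects float RGB data in [0, 1] and clips anything outside it, so floats in the 0 to 255 range would display as a mostly white image. Converting back to uint8 restores the 0 to 255 interpretation. The sketch below uses a made-up float pixel array and clips before casting as a cheap safeguard, because a bare astype() doesn't saturate out-of-range values. Also note that plt.imread() returns PNG files as floats in [0, 1], so with a PNG this cast would turn almost every pixel black; the dogs-vs-cats images are JPEGs, which plt.imread() returns as uint8.

```python
import numpy as np

# Made-up float pixel values, including two outside the 0..255 range.
augmented = np.array([[[-3.5, 120.25, 258.0]]], dtype=np.float32)

# Clip into the valid range first, then cast (the cast truncates fractions).
as_uint8 = np.clip(augmented, 0, 255).astype(np.uint8)
print(as_uint8)  # [[[  0 120 255]]]
```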

Now we'll plot the augmented images.

plotImages(aug_images)

These are ten images that have been augmented from the original image according to the parameters we passed to the ImageDataGenerator earlier.

We can see that some of the images have been flipped horizontally, some have slight color variation, some are tilted slightly to the left or right, and some are shifted down or up slightly.
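The flips and color shifts are the easiest of these effects to reproduce by hand. The hypothetical helper below randomly flips an 8-bit (height, width, channels) image and adds a random per-channel offset drawn from [-channel_shift_range, channel_shift_range], clipping the result to [0, 255]. Drawing a separate offset per channel is one simple way to get a color variation; it's an assumption of this sketch, not a description of how Keras implements channel_shift_range.

```python
import numpy as np

def flip_and_channel_shift(image, channel_shift_range=10.0, rng=None):
    """Randomly flip an 8-bit (h, w, c) image and shift each channel's values."""
    rng = np.random.default_rng() if rng is None else rng
    out = image.astype(np.float32)
    # Horizontal flip with probability 0.5: reverse the width axis.
    if rng.random() < 0.5:
        out = out[:, ::-1, :]
    # One random offset per channel, broadcast across all pixels.
    offsets = rng.uniform(-channel_shift_range, channel_shift_range, size=out.shape[-1])
    # Keep the result inside the valid 8-bit intensity range.
    return np.clip(out + offsets, 0, 255)

rng = np.random.default_rng(seed=7)
photo = rng.integers(0, 256, size=(4, 5, 3)).astype(np.uint8)  # synthetic stand-in
augmented = flip_and_channel_shift(photo, rng=rng)
print(augmented.shape, augmented.min() >= 0, augmented.max() <= 255)
```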

Save augmented data

Note, if you'd like to save these images so that you can add them to your training set, pass the save_to_dir parameter to gen.flow() and set it to a valid location on disk. Images are written as batches are drawn from the iterator, so nothing is saved until you call next() on it.

You can optionally use save_prefix to specify a prefix to prepend to the file names of the saved images, and save_format to specify the file type, such as 'png' or 'jpeg'. 'png' is the default.

aug_iter = gen.flow(image, save_to_dir='data/dogs-vs-cats/train/dog', save_prefix='aug-image-', save_format='jpeg')
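Because saving happens as batches are drawn, creating the iterator alone writes nothing to disk. The sketch below models that behavior with a hypothetical generator, saving_flow(), which writes each image of each batch it yields as a .npy file in a temporary directory. NumPy's format is used only to keep the example dependency-free; this is not the Keras implementation.

```python
import os
import tempfile
import numpy as np

def saving_flow(x, save_to_dir, save_prefix="aug", rng=None):
    """Hypothetical generator that writes every image it yields to disk."""
    rng = np.random.default_rng() if rng is None else rng
    count = 0
    while True:
        # Stand-in augmentation: random horizontal flip of the whole batch.
        batch = x[:, :, ::-1, :] if rng.random() < 0.5 else x.copy()
        for img in batch:
            np.save(os.path.join(save_to_dir, f"{save_prefix}{count}.npy"), img)
            count += 1
        yield batch

with tempfile.TemporaryDirectory() as out_dir:
    it = saving_flow(np.zeros((1, 4, 4, 3), dtype=np.uint8), out_dir,
                     save_prefix="aug-image-")
    files_before = len(os.listdir(out_dir))  # generator body hasn't run yet
    for _ in range(3):
        next(it)
    files_after = len(os.listdir(out_dir))   # one file per image drawn
    print(files_before, files_after)
```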

Note, you can also use ImageDataGenerator.flow_from_directory() instead of ImageDataGenerator.flow() if you want to generate batches of augmented data from images saved in an organized directory structure on disk. See the earlier episode where we introduced this function and showed the proper directory structure.
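When flow_from_directory() is called without an explicit classes list, it infers the classes from the subdirectory names, sorted alphanumerically, and maps each one to an integer label, exposed on the returned iterator as class_indices. The hypothetical helper below reproduces just that mapping step for a throwaway train/cat and train/dog layout.

```python
import os
import tempfile

def infer_class_indices(directory):
    """Map each subdirectory name to an integer label, in sorted order."""
    classes = sorted(
        name for name in os.listdir(directory)
        if os.path.isdir(os.path.join(directory, name))
    )
    return {name: index for index, name in enumerate(classes)}

with tempfile.TemporaryDirectory() as root:
    train_dir = os.path.join(root, "train")
    for cls in ["dog", "cat"]:
        os.makedirs(os.path.join(train_dir, cls))
    class_indices = infer_class_indices(train_dir)
    print(class_indices)  # {'cat': 0, 'dog': 1}
```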

Hopefully now you understand what data augmentation is, why you'd want to use it, and how you can make use of it in Keras.

