Reinforcement Learning - Introducing Goal Oriented Intelligence

with deeplizard.

Deep Q-Network Code Project Intro - Reinforcement Learning

June 25, 2019 by deeplizard


Deep Q-network code project introduction

Welcome back to this series on reinforcement learning! It’s finally time to apply everything we’ve learned about deep Q-learning to implement our own deep Q-network in code! In this episode, we’ll get introduced to our reinforcement learning task at hand and go over the prerequisites needed to set up our environments to be ready to code. Let’s get to it!

Project overview

Alright, let’s jump right into what we’re going to be doing in our upcoming project. We’re going to be building and training a deep Q-network to learn to balance a pole on a moving cart. This is widely known as the cart and pole problem.

We’ll be using OpenAI’s Gym toolkit to set up our cart and pole environment.

Remember, Gym is what we used in previous episodes to train an agent to play Frozen Lake with Q-learning. If you need an overview or refresher on Gym, be sure to check out our earlier episode where we first introduced it.

The cart and pole problem consists of a cart that can move left and right along a frictionless track. The cart has a pole attached to the top of it, which starts out in a vertical upright position; by design, however, the pole will fall to the left or right when not balanced. The goal is to prevent the pole from falling over. A reward of \(+1\) is given for each time step that the pole remains upright, and an episode is deemed over when the pole is more than \(15\) degrees from vertical or when the cart moves more than \(2.4\) units from the center of the screen.

So, essentially, the longer the pole remains upright without deviating too far from the center of the screen, the more reward our agent will get.

Just to get an idea of what would happen with no optimization for this cart and pole problem, and with no bounds for determining when an episode is over, below is a quick snippet of code that runs an instance of the cart and pole environment from Gym for \(1000\) time steps, taking a random action, either left or right, at each step. We'll render the environment at each step so we can see what this looks like.

import gym

# Create the cart and pole environment
env = gym.make('CartPole-v0')
env.reset()

# Take a random action at each of 1000 time steps, rendering as we go
for _ in range(1000):
    env.render()
    env.step(env.action_space.sample())

# Close the environment once, after the loop finishes
env.close()

If you watch this code run in the video, or run it yourself, you’ll see that our deep Q-network will definitely have some learning to do so that it can balance the pole better than this!
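To make the reward structure concrete, here's a minimal sketch that tallies how much reward random actions collect in a single episode. It relies on the values returned by env.step() in Gym's classic API: the next state, the reward, a done flag that becomes True when the episode-ending bounds are hit, and an info dictionary.

import gym

env = gym.make('CartPole-v0')
env.reset()

done = False
total_reward = 0

# Step with random actions until Gym ends the episode, which happens
# when the pole tips too far or the cart strays too far from center
while not done:
    _, reward, done, _ = env.step(env.action_space.sample())
    total_reward += reward  # +1 for every step the pole stayed up

env.close()
print('Random agent episode return:', total_reward)

Don't expect much here; a random agent typically lasts only a couple dozen steps before the pole tips past the limit.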

Environment setup

Let’s now jump into how to get our environments set up so we can get started with building our DQN.

PyTorch

First things first, we're going to be using PyTorch to build our deep Q-network. PyTorch is a deep learning framework for Python, and if you haven't worked with it yet, don't worry! You'll still be able to follow this project completely. We'll cover everything regarding PyTorch as it comes up in our code, but PyTorch is a great library, so I encourage you to check it out if you haven't yet. We have a full PyTorch video series with accompanying blogs on deeplizard.com that starts from the absolute basics and guides you through building and training your own networks from scratch.
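If you've never seen PyTorch code before, here's a minimal, hypothetical sketch of what defining a small fully connected network looks like. The names TinyNet, fc1, and out are made up purely for illustration; the actual DQN architecture is something we'll build step by step in later episodes.

import torch
import torch.nn as nn
import torch.nn.functional as F

# A toy two-layer network, just to show the basic PyTorch pattern
class TinyNet(nn.Module):
    def __init__(self, in_features, out_features):
        super().__init__()
        self.fc1 = nn.Linear(in_features, 24)
        self.out = nn.Linear(24, out_features)

    def forward(self, t):
        t = F.relu(self.fc1(t))
        return self.out(t)

net = TinyNet(in_features=4, out_features=2)
print(net(torch.rand(1, 4)))  # one output per action, like Q-values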

For our cart and pole problem, the code we'll be working with comes from the deep Q-network tutorial on PyTorch's website. There's a fair amount of code there, so our goal will be to go over all of it in detail and break down how it follows the steps we outlined in previous episodes on how deep Q-networks work.

Note, I have made some minor tweaks and modifications to the code from the original PyTorch tutorial, but we’ll be exploring all of this fully in the upcoming episodes.

Anaconda

For my environment, I’ll be using Anaconda with Python version 3.7.3, and so aside from PyTorch and Gym, everything else we’ll need comes included with Anaconda.

To follow along using Anaconda as well, you'll first need to install Anaconda. You can see exactly how to do that on Anaconda's website, where they provide the command or installer you'll need depending on which operating system you're using.
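If you'd like to keep this project isolated from your other Python work, one optional approach is to create a dedicated conda environment first. The environment name dqn below is just an example.

conda create -n dqn python=3.7
conda activate dqn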

OpenAI Gym

After installing Anaconda, you'll need to install Gym by running the command pip install gym.

Then, lastly, you'll need to install PyTorch. For this step, I recommend checking out our video or blog on how to quickly and easily install PyTorch. It will get PyTorch up and running for you in no time.

I'm going to be using a Windows environment without a GPU for this project, so the command I used to install PyTorch is conda install pytorch-cpu torchvision-cpu -c pytorch. I definitely recommend you check out the episode I just mentioned, though, since this command may differ depending on your environment.

Everything else we'll need, like Jupyter Notebook, matplotlib, numpy, and pillow, will already be installed, as these packages come bundled with Anaconda.
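Once everything is installed, a quick way to confirm the whole stack is ready is to run a short import check like the sketch below. The exact version numbers printed will, of course, vary depending on your setup.

# Confirm each dependency imports and report the installed versions
import gym
import torch
import torchvision
import numpy as np
import matplotlib
import PIL

print('gym:', gym.__version__)
print('torch:', torch.__version__)
print('torchvision:', torchvision.__version__)
print('numpy:', np.__version__)
print('matplotlib:', matplotlib.__version__)
print('pillow:', PIL.__version__)
print('CUDA available:', torch.cuda.is_available())  # False is expected for the CPU-only install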

Wrapping up

Alright, we’re now all set up to get started coding. We’ll jump right into that next time.

Let me know in the video comments if you were able to get everything installed and ready to go! By the way, did you know you can test your own understanding by taking quizzes after studying deeplizard content? Check it out on this page! See ya in the next one!
