Reinforcement Learning - Introducing Goal Oriented Intelligence

with deeplizard.

OpenAI Gym and Python for Q-learning - Reinforcement Learning Code Project

October 14, 2018


OpenAI Gym and Python set up for Q-learning

What’s up, guys? Over the next couple of posts, we’re going to be building and playing our very first game with reinforcement learning!

We’re going to use the knowledge we gained last time about Q-learning to teach an agent how to play a game called Frozen Lake. We’ll be using Python and OpenAI’s Gym toolkit to develop our algorithm. So let’s get to it!

OpenAI Gym

So, as mentioned, we’ll be using Python and OpenAI Gym to develop our reinforcement learning algorithm. The Gym library is a collection of environments that we can use with the reinforcement learning algorithms we develop.

Gym has a ton of environments ranging from simple text-based games to Atari games like Breakout and Space Invaders. The library is intuitive to use and simple to install. Just run pip install gym, and you’re good to go! The link to Gym’s installation instructions, requirements, and documentation is included in the description. Go ahead and get that installed now because we'll need it in just a moment.

We’ll be making use of Gym to provide us with an environment for a simple game called Frozen Lake. We’ll then train an agent to play the game using Q-learning, and we’ll get a playback of how the agent does after being trained.

So, let’s jump into the details for Frozen Lake!

Frozen Lake

I’ve grabbed the description of the game directly from Gym’s website. Let’s read through it together.

Winter is here. You and your friends were tossing around a frisbee at the park when you made a wild throw that left the frisbee out in the middle of the lake. The water is mostly frozen, but there are a few holes where the ice has melted. If you step into one of those holes, you'll fall into the freezing water. At this time, there's an international frisbee shortage, so it's absolutely imperative that you navigate across the lake and retrieve the disc. However, the ice is slippery, so you won't always move in the direction you intend. The surface is described using a grid like the following:

SFFF
FHFH
FFFH
HFFG

This grid is our environment where S is the agent’s starting point, and it’s safe. F represents the frozen surface and is also safe. H represents a hole, and if our agent steps in a hole in the middle of a frozen lake, well, that’s not good. Finally, G represents the goal, which is the space on the grid where the prized frisbee is located.

The agent can navigate left, right, up, and down, and the episode ends when the agent reaches the goal or falls in a hole. It receives a reward of one if it reaches the goal, and zero otherwise.

State   Description                     Reward
S       Agent’s starting point - safe   0
F       Frozen surface - safe           0
H       Hole - game over                0
G       Goal - game over                1

Alright, so you got it? Our agent has to navigate the grid by staying on the frozen surface without falling into any holes until it reaches the frisbee. If it reaches the frisbee, it wins with a reward of plus one. If it falls in a hole, it loses and receives no points for the entire episode.
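Just to make that mapping concrete, here’s a quick sketch (my own illustration, not code from Gym) of how the 4x4 grid lines up with the integer states Gym uses, numbered row by row from 0 to 15:

```python
# The 4x4 grid flattened row by row: S is state 0, G is state 15
grid = "SFFF" + "FHFH" + "FFFH" + "HFFG"

# Reward of 1 only on the goal tile, 0 everywhere else
rewards = {state: (1 if tile == "G" else 0) for state, tile in enumerate(grid)}

# Holes and the goal both end the episode
terminals = {state for state, tile in enumerate(grid) if tile in "HG"}

print(rewards[15])        # 1 -> the goal state
print(sorted(terminals))  # [5, 7, 11, 12, 15]
```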

Cool! Let’s jump into the code!

Setting up Frozen Lake in code

Libraries

First, we’re importing all the libraries we’ll be using. Not many, really... numpy, gym, random, time, and clear_output from IPython’s display.

import numpy as np
import gym
import random
import time
from IPython.display import clear_output

Creating the environment

Next, to create our environment, we just call gym.make() and pass a string of the name of the environment we want to set up. We'll be using the environment FrozenLake-v0. All the environments, along with their corresponding names, are available on Gym’s website.

env = gym.make("FrozenLake-v0")

With this env object, we’re able to query for information about the environment, sample states and actions, retrieve rewards, and have our agent navigate the frozen lake. That’s all made available to us conveniently with Gym.
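To get a feel for that interface before we use the real thing, here’s a hypothetical, deterministic stand-in for the environment with the same reset() and step() calls. The real Frozen Lake is slippery, and the left/down/right/up action encoding here is an assumption for illustration, so treat this as a sketch, not Gym’s actual implementation:

```python
class ToyFrozenLake:
    """A tiny deterministic stand-in for the env interface (the real
    FrozenLake-v0 is slippery; this sketch ignores that for clarity)."""
    GRID = "SFFF" "FHFH" "FFFH" "HFFG"   # same 4x4 layout as above

    def reset(self):
        self.state = 0                   # back to the starting tile S
        return self.state

    def step(self, action):
        row, col = divmod(self.state, 4)
        if action == 0 and col > 0: col -= 1   # left
        if action == 1 and row < 3: row += 1   # down
        if action == 2 and col < 3: col += 1   # right
        if action == 3 and row > 0: row -= 1   # up
        self.state = row * 4 + col
        tile = self.GRID[self.state]
        reward = 1 if tile == "G" else 0       # +1 only on the goal tile
        done = tile in "HG"                    # hole or goal ends the episode
        return self.state, reward, done, {}

env = ToyFrozenLake()
state = env.reset()
# Walk one safe path to the frisbee: down, down, right, down, right, right
for action in [1, 1, 2, 1, 2, 2]:
    state, reward, done, info = env.step(action)
print(state, reward, done)  # 15 1 True -> reached the goal
```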

Creating the Q-table

We’re now going to construct our Q-table, and initialize all the Q-values to zero for each state-action pair.

Remember, the number of rows in the table is equivalent to the size of the state space in the environment, and the number of columns is equivalent to the size of the action space. We can get this information using env.observation_space.n and env.action_space.n, as shown below. We can then use this information to build the Q-table and fill it with zeros.

action_space_size = env.action_space.n
state_space_size = env.observation_space.n

q_table = np.zeros((state_space_size, action_space_size))

If you're foggy about Q-tables at all, be sure to check out the earlier post where we covered all the details you need for Q-tables.

Alright, here’s our Q-table!

print(q_table)

[[0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]]
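Just to preview how we’ll use this table, here’s a sketch of reading and updating a single Q-value with the update rule from the previous post. The specific state, action, and reward numbers are made up for illustration:

```python
import numpy as np

q_table = np.zeros((16, 4))   # 16 states x 4 actions, as in Frozen Lake

# Reading one Q-value: the value of taking action 2 in state 14
print(q_table[14, 2])         # 0.0, since everything starts at zero

# One Q-learning update, as covered in the previous post:
# q(s,a) <- (1 - alpha) * q(s,a) + alpha * (reward + gamma * max_a' q(s',a'))
alpha, gamma = 0.1, 0.99
state, action, reward, new_state = 14, 2, 1, 15   # pretend we just reached the goal
q_table[state, action] = (1 - alpha) * q_table[state, action] + \
    alpha * (reward + gamma * np.max(q_table[new_state, :]))

print(q_table[14, 2])         # 0.1 -> the reward has started flowing into the table
```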

Initializing Q-learning parameters

Now, we’re going to create and initialize all the parameters needed to implement the Q-learning algorithm.

num_episodes = 10000
max_steps_per_episode = 100

learning_rate = 0.1
discount_rate = 0.99

exploration_rate = 1
max_exploration_rate = 1
min_exploration_rate = 0.01
exploration_decay_rate = 0.01

Let's step through each of these.

First, with num_episodes, we define the total number of episodes we want the agent to play during training. Then, with max_steps_per_episode, we define a maximum number of steps that our agent is allowed to take within a single episode. So, if by the one-hundredth step, the agent hasn’t reached the frisbee or fallen through a hole, then the episode will terminate with the agent receiving zero points.

Next, we set our learning_rate, which was shown mathematically with the symbol \(\alpha\) in the previous post. We also set our discount_rate, which was represented with the symbol \(\gamma\) previously.

Now, the last four parameters are all related to the exploration-exploitation trade-off we talked about last time in regards to the epsilon-greedy policy. We’re initializing our exploration_rate to 1, setting the max_exploration_rate to 1, and the min_exploration_rate to 0.01. The max and min are just bounds on how large or small our exploration rate can be. Remember, the exploration rate was represented with the symbol \(\epsilon\) when we discussed it previously.
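As a quick preview, the epsilon-greedy choice we covered last time can be sketched like this. The exact implementation comes in the next post, so take this as an illustration only:

```python
import random
import numpy as np

q_table = np.zeros((16, 4))
exploration_rate = 1
state = 0

# Epsilon-greedy: exploit the best known action with probability
# (1 - exploration_rate), otherwise explore with a random action
if random.uniform(0, 1) > exploration_rate:
    action = int(np.argmax(q_table[state, :]))   # exploit
else:
    action = random.randrange(4)                 # explore

print(action)  # with exploration_rate = 1, this is always a random action
```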

Lastly, we set the exploration_decay_rate to 0.01 to determine the rate at which the exploration_rate will decay.
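To build some intuition for what that decay does, here’s one common exponential schedule using the parameters above. The exact formula we’ll actually use comes in the next post, so this is just a sketch:

```python
import math

max_exploration_rate = 1
min_exploration_rate = 0.01
exploration_decay_rate = 0.01

def decayed_epsilon(episode):
    # Decays exponentially from max_exploration_rate toward min_exploration_rate
    return min_exploration_rate + \
        (max_exploration_rate - min_exploration_rate) * \
        math.exp(-exploration_decay_rate * episode)

print(decayed_epsilon(0))                # 1.0  -> pure exploration at the start
print(round(decayed_epsilon(10000), 4))  # 0.01 -> mostly exploitation by the end
```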

Now, all of these parameters can change. These are the parameters you’ll want to play with and tune yourself to see how they influence and change the performance of the algorithm.

Wrapping up

Speaking of which, in the next post, we’re going to jump right into the code that we'll write to implement the actual Q-learning algorithm for playing Frozen Lake.

For now, go ahead and make sure your environment is set up with Python and Gym and that you’ve got the initial code written that we went through so far.

Let me know in the comments if you were able to get everything up and running, and I’ll see ya in the next post where we’ll implement our first reinforcement learning algorithm!

Description

Welcome back to this series on reinforcement learning! Over the next couple of videos, we’re going to be building and playing our very first game with reinforcement learning in code! We’re going to use the knowledge we gained last time about Q-learning to teach a reinforcement learning agent how to play a game called Frozen Lake. We’ll be using Python and OpenAI’s Gym toolkit to develop our algorithm. OpenAI Gym: https://gym.openai.com/docs/