Reinforcement Learning - Introducing Goal Oriented Intelligence

with deeplizard.

OpenAI Gym and Python for Q-learning - Reinforcement Learning Code Project

October 14, 2018


OpenAI Gym and Python set up for Q-learning

What’s up, guys? Over the next couple of posts, we’re going to be building and playing our very first game with reinforcement learning!

We’re going to use the knowledge we gained last time about Q-learning to teach an agent how to play a game called Frozen Lake. We’ll be using Python and OpenAI’s Gym toolkit to develop our algorithm. So let’s get to it!

OpenAI Gym

So, as mentioned, we’ll be using Python and OpenAI Gym to develop our reinforcement learning algorithm. The Gym library is a collection of environments that we can use with the reinforcement learning algorithms we develop.

Gym has a ton of environments ranging from simple text-based games to Atari games like Breakout and Space Invaders. The library is intuitive to use and simple to install. Just run pip install gym, and you’re good to go! The link to Gym’s installation instructions, requirements, and documentation is included in the description. Go ahead and get that installed now because we'll need it in just a moment.

We’ll be making use of Gym to provide us with an environment for a simple game called Frozen Lake. We’ll then train an agent to play the game using Q-learning, and we’ll get a playback of how the agent does after being trained.

So, let’s jump into the details for Frozen Lake!

Frozen Lake

I’ve grabbed the description of the game directly from Gym’s website. Let’s read through it together.

Winter is here. You and your friends were tossing around a frisbee at the park when you made a wild throw that left the frisbee out in the middle of the lake. The water is mostly frozen, but there are a few holes where the ice has melted. If you step into one of those holes, you'll fall into the freezing water. At this time, there's an international frisbee shortage, so it's absolutely imperative that you navigate across the lake and retrieve the disc. However, the ice is slippery, so you won't always move in the direction you intend. The surface is described using a grid like the following:

SFFF
FHFH
FFFH
HFFG

This grid is our environment where S is the agent’s starting point, and it’s safe. F represents the frozen surface and is also safe. H represents a hole, and if our agent steps in a hole in the middle of a frozen lake, well, that’s not good. Finally, G represents the goal, which is the space on the grid where the prized frisbee is located.

The agent can navigate left, right, up, and down, and the episode ends when the agent reaches the goal or falls in a hole. It receives a reward of one if it reaches the goal, and zero otherwise.

State   Description                     Reward
S       Agent’s starting point - safe   0
F       Frozen surface - safe           0
H       Hole - game over                0
G       Goal - game over                1

Alright, so you got it? Our agent has to navigate the grid by staying on the frozen surface without falling into any holes until it reaches the frisbee. If it reaches the frisbee, it wins with a reward of plus one. If it falls in a hole, it loses and receives no points for the entire episode.
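To make the state numbering concrete, here’s a small sketch in plain Python. Note that state_index and tile_reward are names we’ve made up for illustration — they aren’t part of Gym — but Gym does number Frozen Lake’s 16 tiles the same way, row by row from 0 (the start) to 15 (the goal).

```python
# The 4x4 Frozen Lake grid, row by row.
grid = ["SFFF", "FHFH", "FFFH", "HFFG"]

# Reward for landing on each tile type: only the goal pays out.
tile_reward = {"S": 0.0, "F": 0.0, "H": 0.0, "G": 1.0}

def state_index(row, col, width=4):
    # The grid is flattened row-major: state 0 is S, state 15 is G.
    return row * width + col

print(state_index(0, 0))          # 0, the starting state
print(state_index(3, 3))          # 15, the goal state
print(tile_reward[grid[3][3]])    # 1.0, the reward for reaching G
```

This flattening is why the Q-table we build later has 16 rows: one per tile, regardless of the tile’s row and column on the grid.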

Cool! Let’s jump into the code!

Setting up Frozen Lake in code

Libraries

First, we’re importing all the libraries we’ll be using. Not many, really... NumPy, gym, random, time, and clear_output from IPython’s display.

import numpy as np
import gym
import random
import time
from IPython.display import clear_output

Creating the environment

Next, to create our environment, we just call gym.make() and pass a string of the name of the environment we want to set up. We'll be using the environment FrozenLake-v0. All the available environments and their corresponding names are listed on Gym’s website.

env = gym.make("FrozenLake-v0")

With this env object, we’re able to query for information about the environment, sample states and actions, retrieve rewards, and have our agent navigate the frozen lake. That’s all made available to us conveniently with Gym.
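To show what that interface looks like without depending on Gym itself, here’s a toy stand-in — our own hypothetical class, not Gym’s implementation — that mimics the reset() and step() methods a Gym environment exposes. For simplicity, the moves here are deterministic, whereas the real Frozen Lake is slippery.

```python
class ToyFrozenLake:
    """A hypothetical stand-in mimicking the interface of a Gym environment:
    reset() returns the initial state, and step(action) returns
    (new_state, reward, done, info)."""

    def __init__(self):
        self.grid = "SFFFFHFHFFFHHFFG"  # the 4x4 map, flattened row-major
        self.state = 0

    def reset(self):
        # Put the agent back at the start and return the initial state.
        self.state = 0
        return self.state

    def step(self, action):
        # Deterministic moves for simplicity (the real env is slippery).
        row, col = divmod(self.state, 4)
        if action == 0:   col = max(col - 1, 0)  # left
        elif action == 1: row = min(row + 1, 3)  # down
        elif action == 2: col = min(col + 1, 3)  # right
        elif action == 3: row = max(row - 1, 0)  # up
        self.state = row * 4 + col
        tile = self.grid[self.state]
        done = tile in "HG"                      # episode ends on hole or goal
        reward = 1.0 if tile == "G" else 0.0
        return self.state, reward, done, {}

env = ToyFrozenLake()
state = env.reset()                              # state 0, the S tile
state, reward, done, info = env.step(2)          # move right onto an F tile
```

The real env object from gym.make() follows this same reset/step pattern, which is what makes it so convenient to plug into a training loop.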

Creating the Q-table

We’re now going to construct our Q-table, and initialize all the Q-values to zero for each state-action pair.

Remember, the number of rows in the table is equal to the size of the state space in the environment, and the number of columns is equal to the size of the action space. We can get this information using env.observation_space.n and env.action_space.n, as shown below. We can then use this information to build the Q-table and fill it with zeros.

action_space_size = env.action_space.n
state_space_size = env.observation_space.n

q_table = np.zeros((state_space_size, action_space_size))

If you're foggy about Q-tables at all, be sure to check out the earlier post where we covered all the details you need for Q-tables.

Alright, here’s our Q-table!

print(q_table)

[[0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]]

Initializing Q-learning parameters

Now, we’re going to create and initialize all the parameters needed to implement the Q-learning algorithm.

num_episodes = 10000
max_steps_per_episode = 100

learning_rate = 0.1
discount_rate = 0.99

exploration_rate = 1
max_exploration_rate = 1
min_exploration_rate = 0.01
exploration_decay_rate = 0.01

Let's step through each of these.

First, with num_episodes, we define the total number of episodes we want the agent to play during training. Then, with max_steps_per_episode, we define a maximum number of steps that our agent is allowed to take within a single episode. So, if by the one-hundredth step, the agent hasn’t reached the frisbee or fallen through a hole, then the episode will terminate with the agent receiving zero points.
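As a small illustration of the step cap just described — this is a hypothetical sketch, not the actual training loop, which we’ll write in the next post — run_episode is a made-up helper showing how an episode ends either when the agent succeeds or when the cap is hit:

```python
max_steps_per_episode = 100

def run_episode(agent_reaches_goal_at=None):
    """Simulate one episode; returns (steps_taken, reward)."""
    for step in range(1, max_steps_per_episode + 1):
        # ...the agent would pick an action and step the environment here...
        if agent_reaches_goal_at is not None and step == agent_reaches_goal_at:
            return step, 1.0        # reached the frisbee before the cap
    return max_steps_per_episode, 0.0  # episode cut off at the cap, zero reward

print(run_episode(agent_reaches_goal_at=12))  # episode ends early with reward 1
print(run_episode())                          # episode terminates at step 100
```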

Next, we set our learning_rate, which was shown mathematically with the symbol \(\alpha\) in the previous post. We also set our discount_rate, which was represented with the symbol \(\gamma\) previously.

Now, the last four parameters all relate to the exploration-exploitation trade-off we talked about last time in regards to the epsilon-greedy policy. We’re initializing our exploration_rate to 1 and setting the max_exploration_rate to 1 and the min_exploration_rate to 0.01. The max and min are just bounds on how large or small our exploration rate can be. Remember, the exploration rate was represented with the symbol \(\epsilon\) when we discussed it previously.

Lastly, we set the exploration_decay_rate to 0.01 to determine the rate at which the exploration_rate will decay.
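The decay update itself shows up in the next post, but as a preview, one common form of the schedule — exponential decay from the max toward the min, which is the shape these parameters suggest — looks like this (exploration_rate here is a function we’ve written just for illustration):

```python
import math

min_exploration_rate = 0.01
max_exploration_rate = 1.0
exploration_decay_rate = 0.01

def exploration_rate(episode):
    # Epsilon starts at the max and decays exponentially toward the min
    # as the episode number grows.
    return min_exploration_rate + \
        (max_exploration_rate - min_exploration_rate) * \
        math.exp(-exploration_decay_rate * episode)

print(exploration_rate(0))     # starts at 1.0: pure exploration
print(exploration_rate(500))   # much smaller: mostly exploitation by now
```

Early on the agent explores almost every move; by late training it mostly exploits what it has learned, never dropping below the 0.01 floor.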

Now, all of these parameters can change. These are the parameters you’ll want to play with and tune yourself to see how they influence and change the performance of the algorithm.

Wrapping up

Speaking of which, in the next post, we’re going to jump right into the code that we'll write to implement the actual Q-learning algorithm for playing Frozen Lake.

For now, go ahead and make sure your environment is set up with Python and Gym and that you’ve got the initial code written that we went through so far.

Let me know in the comments if you were able to get everything up and running, and I’ll see ya in the next post where we’ll implement our first reinforcement learning algorithm!

Description

Resources for this post:
Blog: http://deeplizard.com/learn/video/QK_PP_2KgGE
OpenAI Gym: https://gym.openai.com/docs/
Code: https://www.patreon.com/posts/22063269