# Reinforcement Learning - Developing Intelligent Agents

Deep Learning Course 6 of 7 - Level: Advanced

## OpenAI Gym and Python for Q-learning - Reinforcement Learning Code Project

### OpenAI Gym and Python set up for Q-learning

What's up, guys? Over the next couple of posts, we're going to be building and playing our very first game with reinforcement learning!

We're going to use the knowledge we gained last time about Q-learning to teach an agent how to play a game called Frozen Lake. We'll be using Python and Gymnasium (previously known as OpenAI Gym) to develop our algorithm. So let's get to it!

### Gymnasium

As mentioned, we'll be using Python and Gymnasium to develop our reinforcement learning algorithm. The Gym library is a collection of environments that we can use with the reinforcement learning algorithms we develop.

Gym has a ton of environments ranging from simple text-based games to Atari games like Breakout and Space Invaders. The library is intuitive to use and simple to install. Just run `pip install gymnasium`, and you're good to go!

We'll be making use of Gym to provide us with an environment for a simple game called Frozen Lake. We'll then train an agent to play the game using Q-learning, and we'll get a playback of how the agent does after being trained.

So, let's jump into the details for Frozen Lake!

### Frozen Lake

I've grabbed the description of the game directly from Gym's website. Let's read through it together.

Winter is here. You and your friends were tossing around a frisbee at the park when you made a wild throw that left the frisbee out in the middle of the lake. The water is mostly frozen, but there are a few holes where the ice has melted. If you step into one of those holes, you'll fall into the freezing water. At this time, there's an international frisbee shortage, so it's absolutely imperative that you navigate across the lake and retrieve the disc. However, the ice is slippery, so you won't always move in the direction you intend. The surface is described using a grid like the following:

```
SFFF
FHFH
FFFH
HFFG
```

This grid is our environment where S is the agent's starting point, and it's safe. F represents the frozen surface and is also safe. H represents a hole, and if our agent steps in a hole in the middle of a frozen lake, well, that's not good. Finally, G represents the goal, which is the space on the grid where the prized frisbee is located.

The agent can navigate left, right, up, and down, and the episode ends when the agent reaches the goal or falls in a hole. It receives a reward of one if it reaches the goal, and zero otherwise.

| State | Description | Reward |
|-------|-------------|--------|
| S | Agent's starting point - safe | 0 |
| F | Frozen surface - safe | 0 |
| H | Hole - game over | 0 |
| G | Goal - game over | 1 |
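
To make the grid and the reward table concrete, here's a minimal pure-Python sketch. It assumes states are numbered left to right, top to bottom (so `state = row * 4 + col`), and the helper names `tile_at`, `reward_for`, and `is_terminal` are made up for illustration; they aren't part of Gym.

```python
# The 4x4 map from the description above, one string per row.
LAKE_MAP = ["SFFF", "FHFH", "FFFH", "HFFG"]
N_COLS = len(LAKE_MAP[0])

def tile_at(state):
    """Return the tile letter for a state index (hypothetical helper)."""
    row, col = divmod(state, N_COLS)
    return LAKE_MAP[row][col]

def reward_for(state):
    """Reward is 1 only on the goal tile, 0 everywhere else."""
    return 1 if tile_at(state) == "G" else 0

def is_terminal(state):
    """The episode ends on a hole (H) or on the goal (G)."""
    return tile_at(state) in "HG"

# List every state index that is a hole.
holes = [s for s in range(16) if tile_at(s) == "H"]
print(holes)  # [5, 7, 11, 12]
```

So with this numbering, the agent starts in state 0 and the frisbee sits in state 15.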

Alright, so you got it? Our agent has to navigate the grid by staying on the frozen surface without falling into any holes until it reaches the frisbee. If it reaches the frisbee, it wins with a reward of plus one. If it falls in a hole, it loses and receives no points for the entire episode.

Cool! Let's jump into the code!

### Setting up Frozen Lake in code

The code we'll be working with largely follows Thomas Simonini's Frozen Lake Q-learning implementation with some slight modifications.

### Libraries

First, we're importing all the libraries we'll be using. Not many, really... NumPy, gymnasium, random, time, and clear_output from IPython's display module.

```python
import numpy as np
import gymnasium as gym
import random
import time
from IPython.display import clear_output
```

### Creating the environment

Next, to create our environment, we just call gym.make() and pass a string of the name of the environment we want to set up. We'll be using the environment FrozenLake-v1.

```python
env = gym.make('FrozenLake-v1', render_mode='ansi')
```

With this env object, we're able to query for information about the environment, sample states and actions, retrieve rewards, and have our agent navigate the frozen lake. That's all made available to us conveniently with Gym.

### Creating the Q-table

We're now going to construct our Q-table, and initialize all the Q-values to zero for each state-action pair.

Remember, the number of rows in the table is equivalent to the size of the state space in the environment, and the number of columns is equivalent to the size of the action space. We can get this information using env.observation_space.n and env.action_space.n, as shown below. We can then use this information to build the Q-table and fill it with zeros.

```python
action_space_size = env.action_space.n
state_space_size = env.observation_space.n

q_table = np.zeros((state_space_size, action_space_size))
```

If you're foggy about Q-tables at all, be sure to check out the earlier post where we covered all the details you need for Q-tables.

Alright, here's our Q-table!

print(q_table)

[[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]]
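
Each row of the table holds the four Q-values for one state, so looking up the best known action for a state is just an argmax over that row. Below is a small sketch of that lookup. The sizes are hard-coded to the 4x4 Frozen Lake (16 states, 4 actions) so the snippet runs on its own, and the value written into the table is made up purely for illustration.

```python
import numpy as np

# Sizes hard-coded for the 4x4 Frozen Lake: 16 states, 4 actions.
q_table = np.zeros((16, 4))

# Pretend the agent has learned something about state 14 (made-up number).
q_table[14, 2] = 0.5

state = 14
print(q_table[state])             # the four Q-values for state 14
print(np.argmax(q_table[state]))  # 2
print(np.argmax(q_table[0]))      # 0
```

Notice that for a row that is still all zeros, `np.argmax` just returns the first index, which is one reason the agent needs to explore early on rather than trust the table.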


### Initializing Q-learning parameters

Now, we're going to create and initialize all the parameters needed to implement the Q-learning algorithm.

```python
num_episodes = 10000
max_steps_per_episode = 100

learning_rate = 0.1
discount_rate = 0.99

exploration_rate = 1
max_exploration_rate = 1
min_exploration_rate = 0.01
exploration_decay_rate = 0.01
```

Let's step through each of these.

First, with num_episodes, we define the total number of episodes we want the agent to play during training. Then, with max_steps_per_episode, we define a maximum number of steps that our agent is allowed to take within a single episode. So, if by the one-hundredth step, the agent hasn't reached the frisbee or fallen through a hole, then the episode will terminate with the agent receiving zero points.

Next, we set our learning_rate, which we wrote mathematically as $$\alpha$$ in the previous post. We also set our discount_rate, which was represented by the symbol $$\gamma$$.
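
To see where these two values end up, here's a sketch of a single Q-value update using the standard Q-learning rule: the new value is a weighted average of the old Q-value and the learned value, with $$\alpha$$ as the weight and $$\gamma$$ discounting the best Q-value of the next state. The transition below is hypothetical, and the full training loop is left for the next post.

```python
import numpy as np

learning_rate = 0.1   # alpha
discount_rate = 0.99  # gamma

q_table = np.zeros((16, 4))

# Hypothetical transition: from state 14, action 2 lands on the goal
# (state 15) and earns a reward of 1.
state, action, reward, new_state = 14, 2, 1, 15

# Weighted average of the old value and the newly learned value.
q_table[state, action] = (1 - learning_rate) * q_table[state, action] + \
    learning_rate * (reward + discount_rate * np.max(q_table[new_state]))

print(q_table[state, action])  # 0.1
```

With a learning rate of 0.1, the Q-value only moves a tenth of the way toward the new estimate on each update.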

Now, the last four parameters are all related to the exploration-exploitation trade-off we talked about last time with regard to the epsilon-greedy policy. We're initializing our exploration_rate to 1, setting the max_exploration_rate to 1, and setting the min_exploration_rate to 0.01. The max and min are just bounds on how large or small our exploration rate can be. Remember, the exploration rate was represented with the symbol $$\epsilon$$ when we discussed it previously.

Lastly, we set the exploration_decay_rate to 0.01 to determine the rate at which the exploration_rate will decay.
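
This post doesn't show yet how the decay gets applied, so treat the following as one possible sketch rather than the final implementation. It assumes an exponential decay per episode that starts at the max and settles toward the min, plus an epsilon-greedy action choice. The function names `decayed_rate` and `choose_action` are made up for illustration.

```python
import math
import random

max_exploration_rate = 1
min_exploration_rate = 0.01
exploration_decay_rate = 0.01

def decayed_rate(episode):
    """Assumed schedule: exponential decay from the max toward the min."""
    return min_exploration_rate + \
        (max_exploration_rate - min_exploration_rate) * \
        math.exp(-exploration_decay_rate * episode)

def choose_action(q_values, exploration_rate):
    """Epsilon-greedy: exploit the best known action, or explore at random."""
    if random.uniform(0, 1) > exploration_rate:
        # Exploit: pick the action with the highest Q-value.
        return max(range(len(q_values)), key=lambda a: q_values[a])
    # Explore: pick any action uniformly at random.
    return random.randrange(len(q_values))

# Print the exploration rate at a few points during training.
for episode in (0, 100, 1000):
    print(episode, round(decayed_rate(episode), 4))
```

Under this assumed schedule, a larger exploration_decay_rate makes the agent switch from exploring to exploiting sooner, which is exactly the kind of thing worth experimenting with.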

Now, all of these parameters can change. These are the parameters you'll want to play with and tune yourself to see how they influence and change the performance of the algorithm.

### Wrapping up

In the next post, we're going to jump right into the code that we'll write to implement the actual Q-learning algorithm for playing Frozen Lake.

For now, go ahead and make sure your environment is set up with Python and Gym and that you've got the initial code written that we went through so far.

Let me know in the comments if you were able to get everything up and running, and I'll see ya in the next post where we'll implement our first reinforcement learning algorithm!
