Watch Q-learning Agent Play Game with Python - Reinforcement Learning Code Project

Watch Q-learning agent play Frozen Lake

What's up, guys? In this post, we'll write the code to enable us to watch our trained Q-learning agent play Frozen Lake, so let's get to it!

Last time, we left off having just finished training our Q-learning agent to play Frozen Lake. We trained it for 10,000 episodes, and now it's time to see our agent on the ice in action!

The code to watch the agent play the game

This block of code is going to allow us to watch our trained agent play Frozen Lake using the knowledge it's gained from the training we completed.

# Watch our agent play Frozen Lake by playing the best action 
# from each state according to the Q-table

for episode in range(3):
    # initialize new episode params

    for step in range(max_steps_per_episode):        
        # Show current state of environment on screen
        # Choose action with highest Q-value for current state       
        # Take new action

        if done:
            if reward == 1:
                # Agent reached the goal and won episode
            else:
                # Agent stepped in a hole and lost episode            

        # Set new state

env.close()

We're going to watch the agent play three episodes. Let's look at the start of the outer loop first.

For each episode

for episode in range(3):
    state = env.reset()[0]
    done = False
    print("*****EPISODE ", episode+1, "*****\n\n\n\n")
    time.sleep(1)
    ...

For each of the three episodes, we first reset the state of our environment and set done to False. This variable is used for the same purpose as we saw in our training loop last time. It just keeps track of whether or not our last action ended the episode.

We then just print to the console what episode we're starting, and we sleep for one second so that we have time to actually read that printout before it disappears from the screen.

Now, we'll move on to the inner loop.

For each time-step

for step in range(max_steps_per_episode):        
    clear_output(wait=True)
    print(env.render())
    time.sleep(0.3)
    ...

For each time-step within the episode, we're calling the IPython display function clear_output(), which clears the output from the current cell in the Jupyter notebook. With wait=True, it waits to clear the output until there is another printout to overwrite it. This is all just done so that the notebook and the screen display remain smooth as we watch our agent play.

We then call render() on our env object and print the result, which displays the current state of the environment so that we can see the game grid and where our agent is on it. We then sleep again for 300 milliseconds to give us time to see the current state of the environment on screen before moving on to the next time step. Don't worry, this will all come together once we view the final product.
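The snippets in this post rely on a couple of imports that aren't shown here. As a minimal sketch, assuming you might also want to run the loop outside of a Jupyter notebook, the block below falls back to a stand-in clear_output() built on ANSI escape codes when IPython isn't installed. Both that fallback and the show_frame() helper are hypothetical names introduced for illustration, not part of the original code.

```python
import time

try:
    # Available when running inside Jupyter / IPython
    from IPython.display import clear_output
except ImportError:
    # Assumption: a plain-terminal stand-in, not part of IPython
    def clear_output(wait=False):
        # "\033[2J" clears the screen, "\033[H" moves the cursor to the top left
        print("\033[2J\033[H", end="", flush=True)


def show_frame(frame, delay=0.3):
    """Clear the previous output, print the new frame, then pause."""
    clear_output(wait=True)
    print(frame)
    time.sleep(delay)
```

With this in place, the three display lines at the top of the inner loop could be written as show_frame(env.render()).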

action = np.argmax(q_table[state,:])        
new_state, reward, done, truncated, info = env.step(action)

We then set our action to be the action that has the highest Q-value from our Q-table for our current state, and then we take that action with env.step(), just like we saw during training. This will update our new_state, the reward for our action, and whether or not the action completed the episode.
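To make the greedy selection concrete, here's a small sketch using a made-up Q-table (the values are hypothetical and only there for illustration, not taken from our trained agent). Each row is a state and each column is one of Frozen Lake's four actions.

```python
import numpy as np

# Hypothetical Q-table for illustration only: 3 states x 4 actions.
# Frozen Lake encodes actions as 0=Left, 1=Down, 2=Right, 3=Up.
q_table = np.array([
    [0.10, 0.50, 0.20, 0.00],
    [0.00, 0.00, 0.90, 0.30],
    [0.00, 0.00, 0.00, 0.00],
])
action_names = ["Left", "Down", "Right", "Up"]

greedy_actions = []
for state in range(q_table.shape[0]):
    # Pick the column with the highest Q-value in this state's row
    action = int(np.argmax(q_table[state, :]))
    greedy_actions.append(action)
    print(state, action_names[action])
```

This prints Down for state 0, Right for state 1, and Left for state 2. The last one is worth noticing: when every Q-value in a row is tied, np.argmax() returns the first index, so a state the agent never learned anything about will always choose Left.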

if done:
    clear_output(wait=True)
    print(env.render())
    if reward == 1:
        print("****You reached the goal!****")
        time.sleep(3)
    else:
        print("****You fell through a hole!****")
        time.sleep(3)
        clear_output(wait=True)
    break

If our action did end the episode, then we render the environment to see where the agent ended up after our last time-step. If the reward for that action was 1, then we know that the episode ended because the agent reached the frisbee and won the game, so we print that info to the console. If the reward wasn't 1, then it must have been 0, which means the agent fell through a hole.

After seeing how the episode ended, we then start a new episode.

Now, if the last action didn't complete the episode, then we skip over the conditional, transition to the new state, and move on to the next time step.

state = new_state

After all three episodes are done, we then close the environment, and that's it!

env.close()
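Here's a sketch of how the pieces above could be assembled into a single function. A few things are assumptions rather than part of the original code: the watch_agent() name, printing each frame without clearing the screen, and also ending an episode when truncated is True. So that the block runs on its own, it uses TinyCorridor, a hypothetical stand-in environment that only mimics the reset(), step(), render(), and close() interface we've been using. It is not Frozen Lake.

```python
import time

import numpy as np


class TinyCorridor:
    """Hypothetical stand-in environment (not Frozen Lake): states 0..3 in a row.

    Action 1 moves right, action 0 moves left, and state 3 is the goal.
    """

    def reset(self):
        self.state = 0
        return self.state, {}

    def step(self, action):
        move = 1 if action == 1 else -1
        self.state = max(0, min(3, self.state + move))
        terminated = self.state == 3
        reward = 1.0 if terminated else 0.0
        return self.state, reward, terminated, False, {}

    def render(self):
        return "".join("A" if i == self.state else "." for i in range(4))

    def close(self):
        pass


def watch_agent(env, q_table, episodes=3, max_steps=100, delay=0.3):
    """Play greedy episodes from a Q-table and return each episode's last reward."""
    results = []
    for episode in range(episodes):
        state, _ = env.reset()
        reward = 0.0
        print("*****EPISODE", episode + 1, "*****")
        for _ in range(max_steps):
            print(env.render())
            time.sleep(delay)
            # Best action for the current state according to the Q-table
            action = int(np.argmax(q_table[state, :]))
            state, reward, terminated, truncated, _ = env.step(action)
            if terminated or truncated:
                print(env.render())
                break
        results.append(reward)
    env.close()
    return results


# Q-table that always prefers action 1 (move right)
q_table = np.array([[0.0, 1.0]] * 4)
results = watch_agent(TinyCorridor(), q_table, episodes=2, delay=0)
print(results)
```

To use it for Frozen Lake, you'd pass in the env and q_table from training in place of the stand-ins. A last reward of 1 means the agent reached the goal in that episode.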

Watching the agent play

Alright, now in the video, we run this code and watch the agent play! Here's what we expect. We'll have our grid printed to the screen, the agent will start in the starting state in the top left corner of the grid, and we'll be able to see the actions chosen by the agent displayed above the grid at each time step. We'll also see the agent move around the grid, as indicated with a red marker.

Remember when we introduced Frozen Lake, part of the description noted that the agent won't always take the action that it chooses to take because, since the ice is slippery, even if we choose to go right, for example, we may slip and go up instead. So keep this in mind as you watch the agent play because you may see the chosen action show as right but then see the agent take a step up, for example. The slippery ice is the reason for this.
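To get a feel for how often this slipping happens, here's a small simulation sketch. It assumes Frozen Lake's default slippery dynamics, where the chosen direction and the two perpendicular directions are each taken a third of the time and the opposite direction is never taken. The slippery_move() function is a hypothetical helper written for this example, not part of the environment's API.

```python
import random
from collections import Counter

ACTIONS = ["Left", "Down", "Right", "Up"]  # Frozen Lake's action encoding 0-3


def slippery_move(chosen, rng):
    """Return the direction actually moved for a chosen action on slippery ice.

    Assumption: the chosen direction and its two perpendicular neighbors
    are equally likely, and the opposite direction never happens.
    """
    return rng.choice([(chosen - 1) % 4, chosen, (chosen + 1) % 4])


rng = random.Random(0)
right = ACTIONS.index("Right")
counts = Counter(ACTIONS[slippery_move(right, rng)] for _ in range(3000))
print(counts)
```

Choosing Right should come out as a roughly even split between Right, Down, and Up, and never Left, which is why the action shown above the grid so often disagrees with where the agent actually ends up.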

Now check out the agent play in the video!

Wrapping up

Alright, that's it! Pretty sweet for our first implementation of reinforcement learning in code, right? If you were able to follow along with the code for this entire implementation of Frozen Lake, then you should definitely feel good, and give yourself a pat on the back!

We'll continue to gain exposure to more difficult and sophisticated games as we progress deeper into reinforcement learning, so if you thought playing Frozen Lake was cool, you'll definitely want to stick around!

In the next post, we'll see why this Q-learning algorithm that uses value iteration, like what we used for Frozen Lake, may not be the best approach, especially when we're dealing with large state spaces. There, we'll see what we can do to make huge efficiency gains.

Hm… I wonder if neural networks might start to show up sometime soon. Are you able to see where they might fit into the scheme of what we've learned so far? Let me know any and all of your ideas in the comments. I'll see ya in the next one!

