Reinforcement Learning - Developing Intelligent Agents

Deep Learning Course - Level: Advanced

Replay Memory Explained - Experience for Deep Q-Network Training

Replay Memory explained

What's up, guys? In this post, we'll continue our discussion of deep Q-networks and focus in on an important technique called experience replay that is utilized during the training process of a DQN. So, let's get to it!

Last time, we covered each piece of the architecture that makes up a typical deep Q-network. Now, before we can move on to discussing exactly how a DQN is trained, we're first going to explain the concepts of experience replay and replay memory.

Experience Replay and Replay Memory

With deep Q-networks, we often utilize this technique called experience replay during training. With experience replay, we store the agent's experiences at each time step in a data set called the replay memory. We represent the agent's experience at time \(t\) as \(e_t\).

At time \(t\), the agent's experience \(e_t\) is defined as this tuple:

$$e_t=(s_t,a_t,r_{t+1},s_{t+1})$$

This tuple contains the state of the environment \(s_t\), the action \(a_t\) taken from state \(s_t\), the reward \(r_{t+1}\) given to the agent at time \(t+1\) as a result of the previous state-action pair \((s_t,a_t)\), and the next state of the environment \(s_{t+1}\). This tuple indeed gives us a summary of the agent's experience at time \(t\).
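
As a quick illustrative sketch (not tied to any particular library), this tuple could be represented in Python with a namedtuple. The field names and placeholder values below are just assumptions for illustration:

```python
from collections import namedtuple

# A single experience e_t = (s_t, a_t, r_{t+1}, s_{t+1})
Experience = namedtuple('Experience', ('state', 'action', 'reward', 'next_state'))

# Toy example with placeholder values standing in for actual environment states
e_t = Experience(state=[0.0, 1.0], action=1, reward=1.0, next_state=[0.5, 1.0])
print(e_t.reward)  # 1.0
```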

All of the agent's experiences at each time step over all episodes played by the agent are stored in the replay memory. Well actually, in practice, we'll usually see the replay memory set to some finite size limit, \(N\), and therefore, it will only store the last \(N\) experiences.

This replay memory data set is what we'll randomly sample from to train the network. The act of gaining experience and sampling from the replay memory that stores these experiences is called experience replay.
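
For illustration, here's a minimal sketch of what a replay memory with capacity \(N\) could look like in Python. The class and method names are just one possible choice, not a fixed API:

```python
import random
from collections import deque

class ReplayMemory:
    def __init__(self, capacity):
        # A deque with maxlen=N automatically drops the oldest experience
        # once the memory is full, keeping only the last N experiences
        self.memory = deque(maxlen=capacity)

    def push(self, experience):
        # Store one experience tuple e_t in replay memory
        self.memory.append(experience)

    def sample(self, batch_size):
        # Return a random batch of experiences to train the network on
        return random.sample(self.memory, batch_size)

    def can_provide_sample(self, batch_size):
        # Only sample once enough experiences have been collected
        return len(self.memory) >= batch_size
```

Using a deque with a maximum length matches the behavior described above: once the memory is full, pushing a new experience automatically drops the oldest one.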

Why use experience replay?

Why would we choose to train the network on random samples from replay memory, rather than just providing the network with the sequential experiences as they occur in the environment?

A key reason for using replay memory is to break the correlation between consecutive samples.

If the network learned only from consecutive samples of experience as they occurred sequentially in the environment, the samples would be highly correlated and would therefore lead to inefficient learning. Taking random samples from replay memory breaks this correlation.

Combining a deep Q-network with experience replay

Alright, we now have the idea of experience replay down. From last time, we should also have an understanding of a general deep Q-network architecture, the data that the network accepts, and the output from the network.

As a quick refresher, remember that the network is passed a state from the environment, and in turn, the network outputs the Q-value for each action that can be taken from that state.
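
Here's a minimal sketch of that idea in PyTorch, assuming a small fully connected network over a flattened state vector (rather than raw pixel frames) and arbitrary layer sizes:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DQN(nn.Module):
    def __init__(self, state_size, num_actions):
        super().__init__()
        self.fc1 = nn.Linear(state_size, 32)
        self.fc2 = nn.Linear(32, 32)
        self.out = nn.Linear(32, num_actions)  # one Q-value per action

    def forward(self, state):
        x = F.relu(self.fc1(state))
        x = F.relu(self.fc2(x))
        return self.out(x)  # Q(s, a) for every action a

# Example: a batch containing one 4-dimensional state in, 2 Q-values out
policy_net = DQN(state_size=4, num_actions=2)
q_values = policy_net(torch.rand(1, 4))  # shape: (1, 2)
```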

Let's now bring all of this information together with experience replay to see how everything fits together.

Setting up

Before training starts, we first initialize the replay memory data set \(D\) to capacity \(N\). So, the replay memory \(D\) will hold \(N\) total experiences.

Next, we initialize the network with random weights. We've covered weight initialization in the Deep Learning Fundamentals series, so if you need a refresher on this topic, check that out. The exact same concepts we covered there apply to deep Q-network weight initialization.

Next, for each episode, we initialize the starting state of the episode. In our previous discussion, we used the example of a state, including the starting state, being a frame of raw pixels from a game screen.

Gaining experience

Now, for each time step \(t\) within the episode, we either explore the environment and select a random action, or we exploit the environment and select the greedy action for the given state that gives the highest Q-value. Remember, this is the exploration-exploitation trade-off that we discussed in detail in a previous post.
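
Here's a minimal sketch of this choice, assuming an epsilon-greedy strategy where epsilon is the current exploration rate. The function name and signature are just for illustration:

```python
import random
import torch

def select_action(state, policy_net, num_actions, epsilon):
    if random.random() < epsilon:
        # Explore: pick a random action
        return random.randrange(num_actions)
    else:
        # Exploit: pick the action with the highest predicted Q-value
        with torch.no_grad():
            return policy_net(state).argmax(dim=1).item()
```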

We then execute the selected action \(a_t\) in an emulator. So, for example, if the selected action was to move right, then in the emulator running the actual game environment, the agent would actually move right. We then observe the reward \(r_{t+1}\) given for this action, and we also observe the next state of the environment, \(s_{t+1}\). We then store the entire experience tuple \(e_t=(s_t,a_t,r_{t+1},s_{t+1})\) in replay memory \(D\).

Wrapping up

Here's a summary of what we have so far (a code sketch of this loop follows the list):

  1. Initialize replay memory capacity.
  2. Initialize the network with random weights.
  3. For each episode:
    1. Initialize the starting state.
    2. For each time step:
      1. Select an action.
        • Via exploration or exploitation
      2. Execute selected action in an emulator.
      3. Observe reward and next state.
      4. Store experience in replay memory.
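
Putting the pieces above together, here's a rough sketch of this experience-gathering loop. It assumes a classic Gym-style emulator (where env.reset() returns the observation and env.step() returns a 4-tuple) and reuses the illustrative ReplayMemory, DQN, select_action, and Experience helpers sketched earlier:

```python
import gym    # assuming a classic Gym-style emulator API
import torch

env = gym.make('CartPole-v1')
memory = ReplayMemory(capacity=100000)           # 1. Initialize replay memory capacity
policy_net = DQN(state_size=4, num_actions=2)    # 2. Initialize the network with random weights
epsilon = 0.1                                    # fixed exploration rate, just for illustration

for episode in range(10):                        # 3. For each episode:
    state = env.reset()                          # 3.1 Initialize the starting state
    state = torch.tensor(state, dtype=torch.float32).unsqueeze(0)
    done = False
    while not done:                              # 3.2 For each time step:
        # 3.2.1 Select an action via exploration or exploitation
        action = select_action(state, policy_net, num_actions=2, epsilon=epsilon)
        # 3.2.2 / 3.2.3 Execute the action in the emulator, observe reward and next state
        next_state, reward, done, info = env.step(action)
        next_state = torch.tensor(next_state, dtype=torch.float32).unsqueeze(0)
        # 3.2.4 Store the experience tuple in replay memory
        memory.push(Experience(state, action, reward, next_state))
        state = next_state
```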

In the next post, we're going to discover how exactly we sample from replay memory during training, as well as all the other details we need to know about training a DQN. Thanks for contributing to collective intelligence, and I'll see ya in the next one!

