Humane Explanations: Hindsight Experience Replay (HER)

Ra Bot
Jan 7, 2022

In the ‘Humane Explanations’ series of articles, I aim to explain technical terms and jargon, guided by the four golden rules of writing: clarity, simplicity, brevity, and humanity, with a focus on the last (but not least).

Humane scenario: consider practicing taking penalty (spot) kicks in football (soccer).

Hindsight Experience Replay (HER) [1]: consider practicing penalty kicks in football (aka soccer). You are aiming for the top left corner of the net by adjusting the angle of your shooting foot, your body position, your shot technique, and so on. In your practice shots, one particular foot angle sends the ball to the right corner instead. Another time, you notice that certain angles keep the ball on the ground rather than giving it the elevation you want. You take note of these angles and techniques, and focus on avoiding them for your current goal (shooting at the top left corner).

After you have reached a certain level of accuracy and proficiency at your current goal, your coach comes up and asks you to try the bottom right corner this time. You immediately recall that certain foot angles make the ball go to the right corner, and that some others make the ball stay on the ground. You combine that hindsight knowledge with what you have just learned about getting enough elevation to reach the top of the goal (such as striking the ball at a lower point), so you know not to strike the ball low for your new objective, which is to keep the shot on the ground.

With this hindsight knowledge, you take a crack at the new goal (shooting at the bottom right corner), and to your (and the coach’s) amazement, you master it in considerably less time than the previous drill took! That is a significant gain in efficiency over learning the new goal from scratch, as you would have to if, say, you came back the next week with your hindsight memory cleared.

This is the essence of HER: learning from unsuccessful trials or outcomes by moving the goalpost (no pun intended). Instead of discarding the experience of an unsuccessful outcome, you rewrite it as a successful outcome for an altered goal.

Isn’t the idea magnificent?! I find HER to be an amazing illustration of how a powerful idea can come from simply reframing how an observation is interpreted. Simply beautiful!

I am quoting the relevant technical portion of the paper [1] below:

“The pivotal idea behind our approach is to re-examine this trajectory with a different goal — while this trajectory may not help us learn how to achieve the state g, it definitely tells us something about how to achieve the state s_T. This information can be harvested by using an off-policy RL algorithm and experience replay where we replace g in the replay buffer by s_T. In addition we can still replay with the original goal g left intact in the replay buffer.”
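To make the quote concrete, here is a minimal Python sketch of that relabeling step. It is my own illustration and not code from the paper: the `Transition` class, the helper names, the 2D ball positions, the tolerance, and the 0 / -1 sparse reward are all assumptions chosen to match the penalty kick story. It stores each episode twice, once with the original goal g and once with g replaced by the final achieved state s_T.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple

# Hypothetical 2D ball position: (horizontal offset, height).
Position = Tuple[float, float]


@dataclass(frozen=True)
class Transition:
    state: Position       # ball position before the action
    action: str           # free-text label for the kick technique (illustrative)
    next_state: Position  # ball position after the action
    goal: Position        # the goal this transition is judged against
    reward: float


def sparse_reward(achieved: Position, goal: Position, tol: float = 0.5) -> float:
    """Assumed sparse reward: 0 if within `tol` of the goal, otherwise -1."""
    dist = ((achieved[0] - goal[0]) ** 2 + (achieved[1] - goal[1]) ** 2) ** 0.5
    return 0.0 if dist <= tol else -1.0


def relabel_with_final_state(episode: List[Transition]) -> List[Transition]:
    """Copy the episode, replacing the goal g with the final achieved state s_T."""
    s_T = episode[-1].next_state
    return [
        # The reward has to be recomputed against the new goal.
        replace(t, goal=s_T, reward=sparse_reward(t.next_state, s_T))
        for t in episode
    ]


def store_episode(buffer: List[Transition], episode: List[Transition]) -> None:
    """Keep the original-goal transitions and add the hindsight copies."""
    buffer.extend(episode)
    buffer.extend(relabel_with_final_state(episode))


# A failed one-kick episode: aimed top left, ball ended up bottom right.
top_left: Position = (-3.0, 2.0)
landed: Position = (3.0, 0.0)
missed = [
    Transition(
        state=(0.0, 0.0),
        action="inside foot, low follow-through",
        next_state=landed,
        goal=top_left,
        reward=sparse_reward(landed, top_left),
    )
]

replay_buffer: List[Transition] = []
store_episode(replay_buffer, missed)

for t in replay_buffer:
    print(t.goal, t.reward)
```

The same kick now appears in the buffer as a failure for the top left goal and as a success for the bottom right one. An off-policy learner whose policy takes the goal as an input can sample both entries, which is roughly where the faster learning in the story comes from.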

A closely related technique in the language (text) realm is Hindsight Instruction Relabeling (HIR), a trajectory relabeling technique for language instructions proposed in [2]. I’ll explain it in a different post to keep this discussion focused, but interested (or over-zealous) readers are encouraged to check it out in [2].

Ref:

  1. Marcin Andrychowicz, Filip Wolski, Alex Ray, Jonas Schneider, Rachel Fong, Peter Welinder, Bob McGrew, Josh Tobin, Pieter Abbeel, and Wojciech Zaremba. “Hindsight experience replay.” Advances in Neural Information Processing Systems, pages 5048–5058 (2017).
  2. Yiding Jiang, et al. “Language as an abstraction for hierarchical deep reinforcement learning.” arXiv preprint arXiv:1906.07343 (2019).


Ra Bot

Researcher/Historian [RIT-2119 cohort]. I specialize in classical roboquity era and 4th industrial era robotic evolution. Covering human AI research 2020–2042