Deep curiosity search: Intra-life exploration improves performance on challenging deep reinforcement learning problems

Author(s): 
Stanton C, Clune J
Year: 
2018
Abstract: 
Traditional exploration methods in reinforcement learning (RL) require agents to perform random actions to find rewards. But these approaches struggle on sparse-reward domains like Montezuma's Revenge, where the probability that any random action sequence leads to reward is extremely low. Recent algorithms have performed well on such tasks by encouraging agents to visit new states or perform new actions relative to all prior training episodes (which we call across-training novelty). But such algorithms do not consider whether an agent exhibits intra-life novelty: doing something new within the current episode, regardless of whether those behaviors have been performed in previous episodes. We hypothesize that across-training novelty might discourage agents from revisiting initially non-rewarding states that could become important stepping stones later in training, a problem remedied by encouraging intra-life novelty. We introduce Curiosity Search for deep reinforcement learning, or Deep Curiosity Search (DeepCS), which encourages intra-life exploration by rewarding agents for visiting as many different states as possible within each episode, and show that DeepCS matches the performance of current state-of-the-art methods on Montezuma's Revenge. We further show that DeepCS improves exploration on Gravitar (another difficult, sparse-reward game) and performs well on the dense-reward game Amidar. Surprisingly, DeepCS also doubles A2C performance on Seaquest, a game we would not have expected to benefit from intra-life exploration because the arena is small and already easily navigated by naive exploration techniques. In one run, DeepCS achieves a maximum training score of 80,000 points on Seaquest, higher than any method other than Ape-X. The strong performance of DeepCS on these sparse- and dense-reward tasks suggests that encouraging intra-life novelty is an interesting new approach for improving performance in Deep RL and motivates further research into hybridizing across-training and intra-life exploration methods.
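
To make the core mechanism concrete, below is a minimal sketch of an intra-life novelty bonus of the kind DeepCS rewards: a per-episode visitation map over a coarse grid of agent positions that is wiped at the start of every episode, so the agent earns a small bonus the first time it enters each cell within that episode. An across-training method would instead keep this map (or a learned density model) persistent across all episodes; resetting it each episode is what lets the agent revisit stepping-stone states without penalty. The grid size, bonus scale, and use of (x, y) coordinates here are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

class IntraLifeNoveltyBonus:
    """Per-episode exploration bonus: reward states that are new *within
    the current episode*, regardless of what prior episodes visited.

    Illustrative sketch only; the grid discretization and bonus scale
    are assumptions, not DeepCS's exact setup.
    """

    def __init__(self, grid_shape=(16, 16), bonus=0.1):
        self.grid_shape = grid_shape
        self.bonus = bonus
        self.visited = np.zeros(grid_shape, dtype=bool)

    def reset(self):
        # Called at the start of each episode: intra-life novelty is
        # deliberately forgotten between episodes.
        self.visited[:] = False

    def __call__(self, x, y, arena_width, arena_height):
        # Map agent coordinates to a coarse grid cell (clamped to bounds).
        gx = min(int(x / arena_width * self.grid_shape[0]), self.grid_shape[0] - 1)
        gy = min(int(y / arena_height * self.grid_shape[1]), self.grid_shape[1] - 1)
        if not self.visited[gx, gy]:
            self.visited[gx, gy] = True
            return self.bonus  # first visit this episode: pay the bonus
        return 0.0  # already seen this episode: no bonus


# Hypothetical usage inside a generic RL loop (env/agent API is assumed):
# bonus_fn = IntraLifeNoveltyBonus()
# for episode in range(num_episodes):
#     obs = env.reset()
#     bonus_fn.reset()                      # wipe the per-episode map
#     done = False
#     while not done:
#         action = agent.act(obs)
#         obs, reward, done, info = env.step(action)
#         reward += bonus_fn(info["x"], info["y"], W, H)
#         agent.learn(obs, reward, done)
```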

Deep Curiosity Search Example Agents: Seaquest (Best)

Our best agent produced by Curiosity Search on the Atari game Seaquest. This agent achieves approximately 132,000 points, a score that beats many other Deep RL algorithms and vastly exceeds what an average human can achieve!

Deep Curiosity Search Example Agents: Seaquest

A typical agent produced by Curiosity Search on the Atari game Seaquest. This agent achieves approximately 3,400 points, roughly double the scores of popular Deep RL algorithms like DQN and A2C, though not as good as Rainbow or Ape-X.

Deep Curiosity Search Example Agents: Montezuma's Revenge (Best)

Our best agent produced by Curiosity Search on the very challenging Atari game Montezuma's Revenge. This agent achieves 6,600 points and explores many rooms, whereas popular Deep RL algorithms such as DQN, A2C, and Rainbow struggle even to pick up the first key!

Deep Curiosity Search Example Agents: Montezuma's Revenge

A typical agent produced by Curiosity Search on the very challenging Atari game Montezuma's Revenge. This agent achieves 3,500 points and explores many rooms, whereas popular Deep RL algorithms such as DQN, A2C, and Rainbow struggle even to pick up the first key!

Pub. Info: 
arXiv:1806.00553
BibTeX:
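@article{stanton2018deep,
  title={Deep Curiosity Search: Intra-Life Exploration Improves Performance on Challenging Deep Reinforcement Learning Problems},
  author={Stanton, C. and Clune, J.},
  journal={arXiv preprint arXiv:1806.00553},
  year={2018}
}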