AI trained by reinforcement learning with prediction-based rewards beats the human score in a notoriously deadly, high-difficulty game


Random Network Distillation (RND), developed by OpenAI, a nonprofit AI research organization, is a prediction-based method for training reinforcement learning agents to explore their environment out of curiosity. Using RND, OpenAI has trained an agent that exceeds the average human score in the game "Montezuma's Revenge".

Reinforcement Learning with Prediction-Based Rewards
https://blog.openai.com/reinforcement-learning-with-prediction-based-rewards/

RND encourages agents to visit unfamiliar situations by rewarding them wherever it is hard to predict the output of a fixed, randomly initialized neural network. In a situation the agent is not used to, that output is hard to predict, so the reward is large. RND can be applied to any reinforcement learning algorithm, is easy to implement, and scales efficiently.
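The idea described above can be illustrated with a minimal sketch in Python with NumPy. This is not OpenAI's implementation: the class name RNDBonus, the tiny network sizes, the learning rate, and the two hand-made observations are all assumptions chosen for illustration. A fixed random "target" network is never trained, a "predictor" network is trained to imitate it on the observations the agent visits, and the prediction error serves as the curiosity reward.

```python
import numpy as np


class RNDBonus:
    """Toy curiosity bonus in the spirit of RND (hypothetical, illustrative only)."""

    def __init__(self, obs_dim, feat_dim=16, lr=0.1, seed=0):
        rng = np.random.default_rng(seed)
        # Target network: randomly initialized and never trained.
        self.w_target = rng.normal(size=(obs_dim, feat_dim))
        # Predictor network: trained to imitate the target on visited observations.
        # A linear predictor is an assumption made to keep the sketch short.
        self.w_pred = 0.1 * rng.normal(size=(obs_dim, feat_dim))
        self.lr = lr

    def _error(self, obs):
        target = np.tanh(obs @ self.w_target)
        prediction = obs @ self.w_pred
        return prediction - target

    def reward(self, obs):
        # Intrinsic reward: mean squared prediction error for this observation.
        return float(np.mean(self._error(obs) ** 2))

    def update(self, obs):
        # One gradient-descent step on the mean squared error w.r.t. w_pred.
        err = self._error(obs)
        grad = 2.0 * np.outer(obs, err) / err.size
        self.w_pred -= self.lr * grad


def demo():
    # Two made-up observations: one visited many times, one never visited.
    familiar = np.array([1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0])
    novel = np.array([0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0])

    bonus = RNDBonus(obs_dim=8)
    before = bonus.reward(familiar)
    for _ in range(400):
        bonus.update(familiar)  # the agent keeps seeing the same situation
    after = bonus.reward(familiar)
    return before, after, bonus.reward(novel)


if __name__ == "__main__":
    before, after, novel = demo()
    print("familiar, before training:", before)
    print("familiar, after training: ", after)
    print("novel, never visited:     ", novel)
```

In this sketch the reward for the repeatedly visited observation shrinks toward zero while the unvisited one keeps a large reward, which is the pressure that pushes an agent toward unfamiliar situations. How this bonus is combined with the game score and fed to a particular reinforcement learning algorithm is left out here.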

Montezuma's Revenge was chosen to test RND because DQN, the game-learning AI developed by Google, could not exceed the average human score (4700) on it. Level 1 of Montezuma's Revenge has 24 rooms, but DQN explored only 15 of them.

What happened when Google's artificial intelligence "DQN" played "Montezuma's Revenge", a game in which the hero dies quickly - GIGAZINE



In contrast, AI agents trained with RND can explore all 24 rooms of level 1 and can exceed the average human score. The graph below shows the scores of various AI agents playing Montezuma's Revenge. The vertical axis shows the score, the horizontal axis shows when each AI agent was developed, and only RND exceeds the average human score (4700).



The following movie shows how an AI agent trained with RND plays Montezuma's Revenge.

Reinforcement Learning with Prediction-Based Rewards - YouTube


"Montezuma's Revenge", the game that Google's artificial intelligence "DQN" struggled with, is a game in which the hero dies when falling from a high place or touching a skull. The AI agent trained with RND showed a wide range of behavior.



Moving to the left and to the right ...




It picks up the key and advances through the stage.



It crosses narrow ledges without falling from a height and moves along ropes and ladders.



It collects not only the keys but also the jewels on the stage to raise its score.




It also skillfully slips through the gaps in the laser gates, which kill the hero on contact.



It can also pick up a torch.



It can clear various levels.



The hero does more than just clear the game: it even dances with the skulls that appear on the stage. Training with RND seems to make it possible to create an agent with a playfulness that one would not expect from an AI.


in Software,   Video,   Game, Posted by logu_ii