OpenAI announces “Safety Gym”, a set of tools for reinforcement learning in which AI takes risks and costs into account in advance



In traditional reinforcement learning, agents learn by trial and error, failing and colliding repeatedly along the way. Because learning rests on trial and error alone, the agent does not consider whether a behavior is acceptable while it learns. OpenAI, an organization that researches artificial intelligence (AI), pointed out that “in conventional reinforcement learning, AI may cause unpredictable errors due to dangerous movements,” and announced “Safety Gym”, a set of tools for performing reinforcement learning while respecting safety constraints.

Safety Gym
https://openai.com/blog/safety-gym/


OpenAI releases Safety Gym for reinforcement learning | VentureBeat
https://venturebeat.com/2019/11/21/openai-safety-gym/


Safety Gym is a set of environments and tools for reinforcement learning agents, that is, AI that is steered toward a goal through rewards and penalties. With Safety Gym, OpenAI adopts “constrained reinforcement learning”, in which the AI learns through simulation while also taking costs into account.

In constrained reinforcement learning, a cost limit is set at the start of learning, and the agent learns from costs as well as rewards: it has to maximize reward while keeping the cost of unsafe behavior within that limit. In other words, in constrained reinforcement learning, the AI is required to anticipate risk in advance.
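One common way to handle a reward together with a cost limit is a Lagrangian penalty, where a multiplier grows while the measured cost is above the limit and shrinks once it falls below. The following is a minimal sketch of that idea only; the function names, learning rate, and numbers are illustrative assumptions, not code taken from Safety Gym.

```python
def update_multiplier(lam, avg_cost, cost_limit, lr=0.05):
    """Dual-ascent step: raise the penalty weight while cost exceeds the limit.

    The multiplier is clipped at zero so that staying under the limit
    is never rewarded beyond the task reward itself.
    """
    return max(0.0, lam + lr * (avg_cost - cost_limit))


def penalized_objective(avg_return, avg_cost, cost_limit, lam):
    """Quantity a policy update would try to increase (hypothetical helper)."""
    return avg_return - lam * (avg_cost - cost_limit)


# Illustrative run: average episode cost drifts down toward a limit of 25.
lam = 0.0
for avg_cost in (40.0, 35.0, 30.0, 20.0):
    lam = update_multiplier(lam, avg_cost, cost_limit=25.0)
```

In this sketch the multiplier rises for the first three episodes, which are over the limit, and falls after the last one, which is under it.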

In Safety Gym, three robots, “Point”, “Car”, and “Doggo”, navigate a cluttered environment to accomplish a task. Three tasks are provided: “Goal”, in which the robot moves to a specified area; “Button”, in which it presses a series of goal buttons placed around the plane; and “Push”, in which it pushes an object to a specified location. Each task has two levels of difficulty, and whenever the agent performs an unsafe action, a red warning light flashes around it.
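The combinations of robot, task, and difficulty level can be enumerated as environment IDs. The sketch below assumes the `Safexp-{Robot}{Task}{Level}-v0` naming pattern used in the project's README, with levels 1 and 2 standing for the two difficulty levels; treat the pattern and the level numbers as assumptions to check against the installed package.

```python
from itertools import product

ROBOTS = ("Point", "Car", "Doggo")
TASKS = ("Goal", "Button", "Push")
LEVELS = (1, 2)  # assumed numbering for the two difficulty levels


def env_ids(robots=ROBOTS, tasks=TASKS, levels=LEVELS):
    """Build one ID per robot/task/level combination (assumed naming pattern)."""
    return [
        f"Safexp-{robot}{task}{level}-v0"
        for robot, task, level in product(robots, tasks, levels)
    ]


ids = env_ids()
print(len(ids))  # 18 combinations: 3 robots x 3 tasks x 2 levels
print(ids[0])    # Safexp-PointGoal1-v0
```

With the package installed, an ID of this form would typically be passed to `gym.make` after `import safety_gym`.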

“Point” is a robot that moves on a 2D plane, with one actuator for turning and another for moving forward and backward.


“Car” is a robot with two independently driven front wheels and one free-rolling rear wheel. Both turning and moving forward or backward require coordinating the two front wheels.
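Why both wheels have to be coordinated can be seen in the standard differential-drive kinematic model: forward speed depends on the average of the two wheel speeds, and turning rate on their difference. This is a textbook approximation offered as a sketch, not Safety Gym's physics simulation; the track width and time step are made-up values.

```python
import math


def diff_drive_step(x, y, heading, v_left, v_right, track_width=0.2, dt=0.1):
    """Advance a differential-drive robot by one time step (idealized model)."""
    v = (v_left + v_right) / 2.0               # forward speed: average of wheels
    omega = (v_right - v_left) / track_width   # turn rate: difference of wheels
    x += v * math.cos(heading) * dt
    y += v * math.sin(heading) * dt
    heading += omega * dt
    return x, y, heading


# Equal wheel speeds drive straight; opposite speeds turn on the spot.
straight = diff_drive_step(0.0, 0.0, 0.0, 1.0, 1.0)
spin = diff_drive_step(0.0, 0.0, 0.0, -1.0, 1.0)
```

Neither wheel alone sets direction or speed, so a policy has to learn the two commands jointly.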


“Doggo” is a four-legged robot with bilateral symmetry. Each leg has controls for azimuth and elevation relative to the torso, plus one joint that adjusts the angle of the leg, and these have to be operated so that the robot does not fall over.


OpenAI says that Safety Gym is still under development and that much work remains, including improving performance on the current environments, investigating safe transfer learning and distributional shift problems, and combining constrained reinforcement learning with human preferences, as well as with other problem settings and safety techniques.

OpenAI says it expects that a system like Safety Gym will make it easier for AI developers to collaborate on safety across the AI field through work on open, shared systems.

in Note, Software. Posted by log1i_yk