OpenAI announces “Safety Gym”, a suite of tools for training AI with reinforcement learning that accounts for risks and costs in advance
In conventional reinforcement learning, an agent learns by failing and colliding over and over. Because this rests purely on trial and error, the agent does not consider beforehand whether a given behavior is acceptable.
OpenAI releases Safety Gym for reinforcement learning | VentureBeat
Safety Gym is a suite of tools for reinforcement learning agents, that is, AI whose progress toward a goal is driven by rewards and penalties. With Safety Gym, OpenAI adopts “constrained reinforcement learning”, in which the AI learns through simulation while automatically taking costs into account.
In constrained reinforcement learning, a cost target is fixed at the start of training, and the agent learns from rewards and penalties while staying within it. In other words, the AI is required to anticipate risk in advance.
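The idea of learning against a fixed cost target can be sketched with a toy example. The article does not say which algorithm OpenAI uses, so the Lagrangian penalty below is only one common approach, chosen here as an assumption. The two actions, their reward and cost values, the temperature, and the step size are all hypothetical.

```python
import math

# Hypothetical two-action problem: action -> (reward, cost).
ACTIONS = {"risky": (1.0, 1.0), "safe": (0.4, 0.0)}
COST_LIMIT = 0.25   # cost target fixed before training starts (assumed value)
TEMPERATURE = 0.1   # softness of the policy (assumed value)
LAMBDA_LR = 0.05    # step size for the penalty weight (assumed value)


def risky_probability(lam: float) -> float:
    """Softmax policy over penalized rewards r - lam * c."""
    scores = {a: r - lam * c for a, (r, c) in ACTIONS.items()}
    gap = (scores["risky"] - scores["safe"]) / TEMPERATURE
    return 1.0 / (1.0 + math.exp(-gap))


def expected(index: int, p_risky: float) -> float:
    """Expected reward (index 0) or cost (index 1) under the policy."""
    return p_risky * ACTIONS["risky"][index] + (1 - p_risky) * ACTIONS["safe"][index]


def train(steps: int = 500):
    lam = 0.0  # penalty weight (Lagrange multiplier), kept non-negative
    for _ in range(steps):
        cost = expected(1, risky_probability(lam))
        # Raise the penalty while cost exceeds the limit, lower it otherwise.
        lam = max(0.0, lam + LAMBDA_LR * (cost - COST_LIMIT))
    p = risky_probability(lam)
    return lam, expected(1, p), expected(0, p)


if __name__ == "__main__":
    lam, cost, reward = train()
    print(f"lambda={lam:.3f} cost={cost:.3f} reward={reward:.3f}")
```

In this sketch the penalty weight grows until the policy's expected cost settles at the limit, so the agent ends up taking the high-reward, high-cost action only as often as the cost target allows.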
Safety Gym provides three agents, “Point”, “Car”, and “Doggo”, which must navigate cluttered environments to achieve a goal. There are also three tasks: “Goal”, in which the agent moves to a specified area; “Button”, in which it reaches a series of target buttons placed on the plane, one after another; and “Push”, in which it pushes an object to a specified location. Each task has two difficulty levels, and whenever the agent does something unsafe, a red warning light flashes around it.
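The combinations of agent, task, and difficulty level can be enumerated as follows. This is a minimal sketch that assumes the `Safexp-{Robot}{Task}{Level}-v0` naming pattern used in the Safety Gym repository, with levels 1 and 2 standing for the two difficulty levels mentioned here. It only builds the names and does not import or run Safety Gym itself.

```python
from itertools import product

ROBOTS = ["Point", "Car", "Doggo"]
TASKS = ["Goal", "Button", "Push"]
LEVELS = [1, 2]  # assumed numbering for the two difficulty levels


def env_ids():
    """Build one environment name per robot/task/level combination."""
    return [
        f"Safexp-{robot}{task}{level}-v0"
        for robot, task, level in product(ROBOTS, TASKS, LEVELS)
    ]


if __name__ == "__main__":
    ids = env_ids()
    print(len(ids))  # 3 robots x 3 tasks x 2 levels = 18
    print(ids[0])    # Safexp-PointGoal1-v0
```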
“Point” is a robot that moves on a 2D plane, with one actuator for turning and another for moving forward and backward.
“Car” is a robot with two independently driven front wheels and one free-rolling rear wheel. To turn or to move forward and backward, the Car robot has to operate both front wheels in coordination.
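Why both wheels have to be coordinated can be seen in a simplified differential-drive model. This is a textbook kinematic sketch, not Safety Gym's actual physics simulation; the function name `car_step`, the track width, and the time step are assumptions made for illustration.

```python
import math


def car_step(x, y, heading, left_speed, right_speed, track_width=0.2, dt=0.1):
    """Advance a two-wheel differential-drive robot by one time step.

    track_width (distance between the driven wheels) and dt are assumed values.
    """
    # Forward speed is the average of the two wheel speeds.
    forward = 0.5 * (left_speed + right_speed)
    # Turning rate comes from the difference between the two wheel speeds.
    turn_rate = (right_speed - left_speed) / track_width
    x += forward * math.cos(heading) * dt
    y += forward * math.sin(heading) * dt
    heading += turn_rate * dt
    return x, y, heading


if __name__ == "__main__":
    print(car_step(0.0, 0.0, 0.0, 1.0, 1.0))   # equal speeds: straight ahead
    print(car_step(0.0, 0.0, 0.0, -1.0, 1.0))  # opposite speeds: turn in place
```

In this model neither wheel alone controls direction or speed: equal wheel speeds move the robot straight, and any difference between them makes it turn.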
“Doggo” is a simulated four-legged robot with a symmetrical body. Each leg can be controlled in azimuth and elevation relative to the body, and each also has one joint whose angle is adjusted; these have to be operated so that the robot does not fall over.
OpenAI says that work on Safety Gym is still in progress and that much remains to be done to combine it with other problem settings and safety techniques, including “improving performance”, “investigating safe transfer learning and distribution shift problems”, and “combining constrained reinforcement learning with human preferences”.
OpenAI says it also expects systems like Safety Gym to make it easier for AI developers to work on open, shared systems, and so to collaborate on safety across the entire AI field.