Attempting to create complex strategies and countermeasures by letting AI repeatedly execute simple hide and seek

We are observing how

OpenAI gradually evolves to use complex tools by letting machine learning agents play simple hide and seek. AI played hide-and-seek repeatedly and developed six different strategies and countermeasures by self-study, saying, `` It is more complicated and intellectual It suggests that there is a possibility of producing a new movement. '

Emergent Tool Use from Multi-Agent Interaction

Clever hide-and-seek AIs learn to use tools and break the rules | TechCrunch

In hide-and-seek, he is in charge of both the role of hiding AI and the role of demon. The hidden role is shown in blue and the demon role is shown in red. The hidden role moves so that it hides from the vision of the demon role, and the demon role seems to move so as to capture the role hidden in the vision, but details about hide and seek are not input at all, and it is repeated many times While playing, each agent seems to have developed a strategy for hide and seek. In the area where hide and seek by AI is performed, there are some objects that can be used to hide, and the actual hide and seek environment is realistically reproduced.

Multi-Agent Hide and Seek-YouTube

The space where the agent plays hide and seek is like this. Walls and yellow objects that can be moved are placed, and the role (blue) and demon role (red) that hide inside move around. Hidden roles and demon roles can measure the distance from surrounding objects.

If you find a role that hides the demon role, the line of sight appears as a beam.

If you let the agent hide and hide without input, it will move completely randomly at first, but until the number of hide and seek reaches 2.69 million times, the hidden role will be hidden in the shadow of the object, the demon role will be hidden Will follow you.

And by 8.6 million times, you will learn how to move the object and fix the object at the entrance. In addition, only the hidden role can be fixed to the object, and the block once fixed is locked in place until the hidden role moves.

It is also possible to place objects at multiple entrances.

If you fix an object at the entrance, the demon role will not be able to find the hidden role. However, in the hide-and-seek from 8.26 million times to 14.5 million times, the demon moves a triangular prism-shaped object ...

Placed near the wall and used as a slope. It seems that it is now possible to discover hidden roles hidden in the room.

On the other hand, between 14.5 million and 43.4 million times, we learn to hide the slope in the room where they hide so that the hidden role is not used by the demon role.

In an environment where there are only a few walls, you will learn to chase the role where the demon role is hidden by 22 million times.

Learn how to skillfully use shelves from 22 million to 88 million times to form a shelter. On the other hand, the method that the demon uses the slope from 88 million times to 115 million times, the method to fix the slope, which is a further countermeasure, is between 115 million times and 388 million times Will be devised.

I thought that the slope was fixed and it was desperate, then the demon moves the cube ...

Move to the side of the slope.

Then get on the cube from the slope.

Move on the cube ...

Discover the hidden role in the shelter. It seems that this method of surfing on a cube was born between 388 million to 458 million times of hide and seek.

In addition, with the enforcement from 458 million times to 481 million times, the hidden role fixes all objects ...

Form a shelter. If a role that hides in this way creates a new method, a demon role creates a strategy for it and repeats this repeatedly to create a more advanced hide-and-seek.

For tasks such as navigating complex environments, it is very difficult for agents to create sophisticated movements even when humans design AI behavior because there are too many conditions to set. However, as shown in the experiment, it is clear that AI can create useful skills on its own by sharing different roles while learning AI models with

unsupervised learning.

OpenAI notes, “This result gives us confidence that in a more free and diverse environment, using multiple agents can lead to very complex and human-related behaviors.”

in Software,   Video, Posted by logu_ii