Microsoft develops a superhuman AI that achieves the maximum score in Ms. Pac-Man

Maluuba, a deep learning company that Microsoft acquired in January 2017, combined reinforcement learning with a divide-and-conquer method to create an AI that scores 999,990 points, the maximum possible score in Ms. Pac-Man.

The research team chose "Ms. Pac-Man," a game in the Pac-Man series released in the United States in 1981. Ms. Pac-Man was originally an arcade game, but the team reached the maximum score on the version ported to the Atari 2600 home console.

Attempts to clear retro games with AI existed before Microsoft's work, but because of the game's unpredictability, no game in the Pac-Man series had ever been fully mastered by AI. The maximum score is also extremely hard for humans to reach: the highest human score on the Atari 2600 version of Ms. Pac-Man is 266,330.

By contrast, the AI created by Maluuba reached an overwhelming 999,990 points. The AI is built on what the team calls a hybrid reward architecture. It contains more than 150 agents in total, each assigned a small, narrowly divided task. A top-level agent then combines the individual agents' suggestions and decides how Pac-Man actually moves.

The best results came when the individual agents acted egotistically, each focusing only on its own goal, while the top agent focused on what was best for the whole. The top agent does not simply count how many agents want to go in a particular direction. The best results appear to come from also weighing how important that direction is. Harm van Seijen, one of the developers, described this as "a nice interplay."
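The difference between counting votes and weighing importance can be illustrated with a small sketch. It assumes that each agent gives every direction a numeric score from the point of view of its own goal, and that the top agent adds those scores per direction and picks the largest total. The agent names and numbers are hypothetical. They were chosen only to show a case where a simple majority vote and an importance-weighted sum disagree.

```python
from collections import Counter

DIRECTIONS = ("up", "down", "left", "right")

# Hypothetical per-agent scores: each agent rates every direction
# only with respect to its own goal (its own pellet, its own monster).
AGENT_SCORES = {
    "pellet_a": {"up": 1.0, "down": 0.0, "left": 0.0, "right": 0.0},
    "pellet_b": {"up": 1.0, "down": 0.0, "left": 0.0, "right": 0.0},
    "pellet_c": {"up": 1.0, "down": 0.0, "left": 0.0, "right": 0.0},
    # A monster agent that strongly wants Pac-Man NOT to move up.
    "monster_a": {"up": -10.0, "down": 0.0, "left": 0.5, "right": 0.0},
}


def majority_vote(scores):
    """Pick the direction that the most agents rank first (ignores strength)."""
    votes = Counter(max(prefs, key=prefs.get) for prefs in scores.values())
    return votes.most_common(1)[0][0]


def weighted_sum(scores):
    """Pick the direction with the largest total score across all agents."""
    totals = {d: sum(prefs[d] for prefs in scores.values()) for d in DIRECTIONS}
    return max(totals, key=totals.get)


print(majority_vote(AGENT_SCORES))  # up   (3 of 4 agents prefer it)
print(weighted_sum(AGENT_SCORES))   # left ("up" totals 1 + 1 + 1 - 10 = -7)
```

Counting heads sends Pac-Man up toward three pellets. Summing the scores lets the one strong objection from the monster agent outweigh three mild preferences.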

You can watch the AI developed by Microsoft play Ms. Pac-Man in the following video.

The video opens with the moment the AI reaches 999,990, the maximum score.

Each individual agent corresponds to an element on the screen: 154 pellets, four monsters, four frightened (edible) monsters, and one fruit, for a total of 163 agents.

The actual play screen. The AI is set to earn small rewards for eating pellets and large rewards for eating fruit and frightened monsters. Pac-Man being eaten by a monster carries a large negative reward. Each agent on the screen independently judges which direction Pac-Man should move.

Each agent's judgment is shown as an arrow on the screen. The top agent combines these arrows and decides Pac-Man's final direction of movement, which is indicated by the large arrow at the bottom right of the screen.

The arrows on the game screen vary in size, and the size is said to indicate how strongly each agent prefers that direction.
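The reward setup described for the play screen can be sketched as a function that splits the events of one game step into per-agent rewards. Each agent then sees only the reward tied to its own on-screen element. This is a minimal sketch under assumptions: the event names, the `Event` type, and the numeric reward sizes are hypothetical placeholders. The article only describes rewards as "small," "large," and "large negative."

```python
from dataclasses import dataclass

# Hypothetical reward sizes; the article gives no actual numbers.
SMALL, LARGE = 1.0, 10.0


@dataclass(frozen=True)
class Event:
    kind: str    # "pellet", "fruit", "frightened_monster", or "eaten"
    target: str  # the on-screen element (and thus agent) involved, e.g. "pellet_17"


def decompose_reward(events):
    """Split one step's events into rewards keyed by the responsible agent.

    Each agent is identified by the screen element it watches, so it only
    ever receives the reward component tied to that element.
    """
    rewards = {}
    for e in events:
        if e.kind == "pellet":
            r = SMALL
        elif e.kind in ("fruit", "frightened_monster"):
            r = LARGE
        elif e.kind == "eaten":  # Pac-Man caught by the monster named in target
            r = -LARGE
        else:
            raise ValueError(f"unknown event kind: {e.kind}")
        rewards[e.target] = rewards.get(e.target, 0.0) + r
    return rewards


step = [Event("pellet", "pellet_17"), Event("frightened_monster", "monster_2")]
per_agent = decompose_reward(step)
print(per_agent)                # {'pellet_17': 1.0, 'monster_2': 10.0}
print(sum(per_agent.values()))  # 11.0
```

Splitting the reward this way keeps each agent's learning problem small. The sum of the per-agent rewards still equals the overall game reward that a single monolithic agent would have received.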

According to Maluuba, the hybrid reward architecture has substantial practical applications, such as helping companies make predictions for sales promotion and helping advance natural language processing.

