AlphaGo Zero, the new version of the world's strongest Go program, has reached the point where it can become stronger entirely on its own



DeepMind, a Google company, has announced "AlphaGo Zero", a new version of its Go-playing artificial intelligence (AI) "AlphaGo", and says it has become "the world's strongest" in both name and reality. AlphaGo Zero was never shown how humans play. It learned how to win purely by training in games against itself, and it is reported to have beaten the version of AlphaGo that defeated Lee Sedol 9-dan by 100 wins to 0 losses. Its strength appears to owe much to improvements in the algorithm over previous versions.

Mastering the game of Go without human knowledge | Nature
https://www.nature.com/articles/nature24270.epdf

AlphaGo Zero: Learning from scratch | DeepMind
https://deepmind.com/blog/alphago-zero-learning-scratch/

According to DeepMind's announcement, AlphaGo Zero was taught the rules of Go and given the ability to learn how to win, and it built up its skill step by step. The secret of its strength is that it was able to learn by itself, over and over, and that it used this ability to train across tens of millions of games.

Earlier versions of AlphaGo beat Lee Sedol 9-dan, who had been called "the world's strongest", by 4 wins to 1 loss in a five-game match, and also defeated China's Ke Jie 9-dan, who had said that even if AlphaGo could beat Lee Sedol, it could not beat him. Those earlier versions learned winning patterns by applying machine learning to large volumes of human game records, so human ways of thinking were embedded in their foundations to no small degree.

AlphaGo Zero, by contrast, started learning Go from a completely blank slate and borrowed nothing from humans. It learned the game entirely free of the influence of human "joseki" (established sequences of moves), and it has ended up stronger than any previous AI.

AlphaGo Zero was first taught only the basic rules of Go, and from there it essentially learned the game by playing against itself again and again. In the early stages it reportedly played nothing but random moves, but it gradually "grew" until it could play proper games.


By the third day of training, it had become stronger than "AlphaGo Lee", the version that beat Lee Sedol 9-dan. In other words, just 3 days after it started practicing, AlphaGo Zero had surpassed the program that defeated a player called "the strongest human", and its record against that version was 100 wins in 100 games.


AlphaGo Zero then became stronger than "AlphaGo Master", the version that beat top players online and defeated Ke Jie 9-dan. At that point there was no longer any player in the world stronger than AlphaGo Zero, and it had taken only 21 days to get there. Interestingly, though, from around the level of AlphaGo Lee onward, the growth curve of its Elo rating gradually flattens out.


After 40 days, AlphaGo Zero had surpassed every existing version of AlphaGo. Up to this point it had been given no human game records or "joseki" (established sequences) at all. The result, a shocking one for humans, is that it learned how to win entirely through self-learning.


This strength was acquired using a branch of machine learning called reinforcement learning. In this method, the AI gradually adapts to its environment through repeated trial and error, and by being given the "reward" of victory, its neural network learns the right way to play. AlphaGo Zero is said to act as its own "teacher": copies of the AI play against each other, accumulate know-how by sharing the winning methods they have learned, and then build on that to learn how to win even better.
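The idea of learning from nothing but the reward of victory can be illustrated on a much smaller game. The following is a minimal sketch, not DeepMind's method: it assumes a toy game of Nim (take 1 to 3 stones, whoever takes the last stone wins) and replaces the neural network and tree search with a simple lookup table. The function name `self_play_train` and all parameters are hypothetical.

```python
import random
from collections import defaultdict


def self_play_train(n_stones=10, episodes=20000, epsilon=0.2, lr=0.1, seed=0):
    """Learn toy Nim by self-play, using only win/loss as the reward signal."""
    rng = random.Random(seed)
    # q[(stones_left, take)] estimates the final outcome for the player moving.
    q = defaultdict(float)

    def legal(stones):
        return [t for t in (1, 2, 3) if t <= stones]

    def choose(stones, explore):
        moves = legal(stones)
        if explore and rng.random() < epsilon:
            return rng.choice(moves)                      # trial and error
        return max(moves, key=lambda t: q[(stones, t)])   # best known move

    for _ in range(episodes):
        stones, history = n_stones, []
        # Both sides are played by the same learner and share one table.
        while stones > 0:
            take = choose(stones, explore=True)
            history.append((stones, take))
            stones -= take
        # Whoever moved last took the final stone and won. Walk backwards,
        # nudging the winner's moves toward +1 and the loser's toward -1.
        reward = 1.0
        for state_action in reversed(history):
            q[state_action] += lr * (reward - q[state_action])
            reward = -reward

    return q, lambda stones: choose(stones, explore=False)


q_table, best_move = self_play_train()
```

No strategy is supplied in this sketch, only the rules and the final result, yet after training `best_move(10)` should pick the move that leaves the opponent a multiple of four stones, which is the known winning strategy for this variant of Nim.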

The advantage of this method is that it can go "beyond the limits of human knowledge". Go has a long history and a large body of accumulated know-how, but because nothing limits the answers that the learning algorithm can derive for itself, it can ultimately arrive at stronger ways of winning.

AlphaGo Zero differs from the previous generation in the following ways.

· Earlier versions included some hand-crafted features written by humans, whereas AlphaGo Zero is given only the positions of the black and white stones on the board as input.
· AlphaGo Zero uses one neural network, not two. Earlier versions used a "policy network" to select the next move and a "value network" to predict the winner from each position. AlphaGo Zero combines these into a single network so that it can be trained and evaluated more efficiently.
· AlphaGo Zero does not use the method called "rollouts" or "playouts", in which stones are played at random to judge how promising a board position is. Instead, it leaves the evaluation of positions to its highly capable neural network.
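The second point, one network with two outputs, can be sketched as follows. This is only an illustration under stated assumptions: the class name `TinyPolicyValueNet`, the 3x3 board, the layer sizes, and the random untrained weights are all hypothetical, and the real network is far deeper. The block shows only the shape of the idea: a shared body feeding a "policy" head (move probabilities) and a "value" head (expected result).

```python
import math
import random


class TinyPolicyValueNet:
    """One shared body with two output heads (toy sizes, untrained weights)."""

    def __init__(self, board_points=9, hidden=16, seed=0):
        rng = random.Random(seed)

        def matrix(rows, cols):
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)]
                    for _ in range(rows)]

        self.w_body = matrix(hidden, board_points)    # shared representation
        self.w_policy = matrix(board_points, hidden)  # head 1: one logit per point
        self.w_value = matrix(1, hidden)              # head 2: a single scalar

    @staticmethod
    def _matvec(m, v):
        return [sum(w * x for w, x in zip(row, v)) for row in m]

    def forward(self, board):
        """board: list of +1 (own stone), -1 (opponent stone), 0 (empty)."""
        hidden = [max(0.0, h) for h in self._matvec(self.w_body, board)]  # ReLU
        logits = self._matvec(self.w_policy, hidden)
        peak = max(logits)                            # for numerical stability
        exps = [math.exp(x - peak) for x in logits]
        total = sum(exps)
        policy = [e / total for e in exps]            # softmax: sums to 1
        value = math.tanh(self._matvec(self.w_value, hidden)[0])  # in [-1, 1]
        return policy, value


net = TinyPolicyValueNet()
policy, value = net.forward([1, -1, 0, 0, 1, 0, -1, 0, 0])
```

Because both heads read the same hidden layer, a single forward pass yields both a suggestion for the next move and an evaluation of the position, which is why no separate random playout is needed in this design.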

Thanks to these differences, AlphaGo Zero can learn far more efficiently than earlier versions of AlphaGo. As a result, it reportedly needs fewer processors (TPUs) and consumes drastically less electricity.


The versions can be compared by "Elo rating", an index of a player's strength. AlphaGo Zero exceeded a score of 5000, a mark that even AlphaGo Master, the version that beat Ke Jie 9-dan, had not passed.
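For readers unfamiliar with Elo ratings, the standard Elo formula converts a rating gap into an expected score. The sketch below uses that standard formula. The function name is hypothetical, and the ratings in the usage lines are illustrative values, not the ratings reported for any AlphaGo version.

```python
def elo_expected_score(rating_a, rating_b):
    """Expected score (roughly, win probability) of player A against player B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


# Illustrative ratings only: equal ratings give an even game,
# and a 400-point lead gives an expected score of 10/11.
even = elo_expected_score(3000, 3000)
lead_400 = elo_expected_score(3400, 3000)
```

Under this formula every additional 400 points multiplies the odds of winning by ten, so a gap of several hundred points already means the lower-rated player is expected to lose almost every game.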


Having gone beyond the human realm, AlphaGo Zero is said to play moves that people would never even think of. Three hours after it started learning, it played at the level of a beginner who simply tries to capture as many stones as possible...


After 19 hours, it had arrived at concepts such as the life and death of stones ("living stones" and "dead stones") and "territory".


After 70 hours, it had already reached a level beyond humans. Its play became disciplined, producing games in which several battles unfold on a single board at the same time. It is also said to have devised new "joseki" (established sequences) that humans never had.


DeepMind's mission is to apply AI technology that has surpassed human ability in a particular field to the problems facing humanity. In medicine it aims to use AI for the early detection of intractable diseases, and it also aims to apply AI to balancing electricity demand. "Data mining", which draws useful insights from enormous amounts of data, is a field where AI is expected to play an active role, and DeepMind's Demis Hassabis said that AI "advances human intelligence and could have a positive impact on all of humanity."

Go AI becomes the strongest through "self-study"; Google explores industrial applications: Nihon Keizai Shimbun
https://www.nikkei.com/article/DGXMZO22407340Y7A011C1TI1000/

in Software, Posted by darkhorse_log