Paper reveals the strategy used by "Libratus," the AI that defeated humans at poker, an imperfect-information game
Poker is called an "imperfect-information game" because not all information about the players' hands is revealed, and it has been considered difficult for artificial intelligence (AI) to beat humans at it. However, in January 2017, "Libratus," an AI developed at Carnegie Mellon University, cleared this challenge and won at poker against human players. The researchers who developed Libratus have now published a paper describing the strategy it used.
Superhuman AI for heads-up no-limit poker: Libratus beats top professionals | Science
Inner workings of victorious AI revealed by researchers: Libratus AI defeated top pros in 20 days of poker play - ScienceDaily
CMU team publishes paper on how their poker-playing AI beat the best humans | TribLIVE
Unlike Go and shogi, the card game poker is an "imperfect-information game" in which players must compete without being able to see the opponent's hand, so it is generally known as a game for which it is difficult to develop algorithms that find the best move. For that reason, poker has served as a yardstick for progress in AI, and the achievement of Libratus, the AI developed at Carnegie Mellon University, in beating human professional players has been praised as a historic feat in AI development.
Artificial intelligence wins a complete victory in a poker showdown against four pros - GIGAZINE
On December 15, 2017, about 11 months after the historic feat, Dr. Tuomas Sandholm of Carnegie Mellon University published a paper in the journal Science on Libratus's tactics, explaining how it beat humans at poker.
According to the paper, Libratus's strategy can be roughly divided into three parts. The first is a process called "abstraction of the game." Heads-up no-limit Texas Hold'em has an enormous number of decision points, about 10 to the 161st power (10^161). This is more than the number of atoms in the universe, and computing all of them is impossible even with the latest computers. So, to make the calculation tractable, Libratus's first module begins by abstracting the game. For example, there are several kinds of flush, such as a king-high flush and a queen-high flush, but treating them all as the same hand reduces the number of hands that have to be considered. Likewise, since there is little difference between a bet of 100 dollars and a bet of 101 dollars, such bets are also grouped together and simplified. The strategy computed on this abstraction in the first phase is called the "blueprint strategy," and it provides only a coarse strategy for the later rounds.
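The grouping idea described above can be illustrated with a small Python sketch. It is not Libratus's actual abstraction algorithm: the bucket values in `BET_BUCKETS`, the function names, and the sample states are all assumptions made up for this example.

```python
from collections import defaultdict

# Hypothetical bet-size buckets in dollars; these values are not from the paper.
BET_BUCKETS = [50, 100, 200, 400, 800]


def abstract_bet(amount):
    """Map a real bet to the nearest bucket, so $100 and $101 are treated alike."""
    return min(BET_BUCKETS, key=lambda bucket: abs(bucket - amount))


def abstract_hand(category, high_card):
    """Drop the high card, so every flush lands in the same group."""
    return category


def abstract_states(states):
    """Group (category, high_card, bet) states by their abstracted key."""
    groups = defaultdict(list)
    for category, high_card, bet in states:
        key = (abstract_hand(category, high_card), abstract_bet(bet))
        groups[key].append((category, high_card, bet))
    return dict(groups)


# Made-up sample states: four concrete situations collapse into three groups.
states = [
    ("flush", "K", 100),
    ("flush", "Q", 101),
    ("pair", "A", 100),
    ("flush", "K", 390),
]
groups = abstract_states(states)
for key, members in groups.items():
    print(key, len(members))
```

The king-high flush facing 100 dollars and the queen-high flush facing 101 dollars end up under the same key, which is the kind of merging the article describes. A real abstraction has to be far more careful about which hands are strategically similar.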
When the game enters its later rounds, Libratus's second module builds a finer-grained strategy for the smaller game (subgame) that remains, based on how the hand has been played up to that point. As the final stage approaches, Libratus refines this second-stage strategy according to how the game has developed. In poker, bluffs are used to unsettle the opponent, and whenever the human opponent makes a move that is not covered by the abstraction, the second module treats the situation as a subgame, solves it, and incorporates the result into the overall strategy.
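As a rough illustration of "solve a subgame when the opponent leaves the abstraction," here is a minimal Python sketch. It is not the solver from the paper: plain regret matching on a tiny zero-sum matrix game stands in for it, and the payoff matrix `TOY_PAYOFF`, the table of precomputed bets, and all function names are hypothetical.

```python
def normalize(regrets):
    """Turn positive regrets into a probability distribution (uniform if none)."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total > 0:
        return [p / total for p in positive]
    return [1.0 / len(regrets)] * len(regrets)


def solve_subgame(payoff, iterations=20000):
    """Approximate an equilibrium of a zero-sum matrix game by regret matching.

    payoff[i][j] is the row player's gain; the column player receives the negative.
    Returns the average strategies of the row and column players.
    """
    rows, cols = len(payoff), len(payoff[0])
    row_regret, col_regret = [0.0] * rows, [0.0] * cols
    row_sum, col_sum = [0.0] * rows, [0.0] * cols
    for _ in range(iterations):
        row_strat, col_strat = normalize(row_regret), normalize(col_regret)
        # Expected payoff of each pure action against the other side's current mix.
        row_util = [sum(payoff[i][j] * col_strat[j] for j in range(cols))
                    for i in range(rows)]
        col_util = [-sum(payoff[i][j] * row_strat[i] for i in range(rows))
                    for j in range(cols)]
        row_value = sum(s * u for s, u in zip(row_strat, row_util))
        col_value = sum(s * u for s, u in zip(col_strat, col_util))
        for i in range(rows):
            row_regret[i] += row_util[i] - row_value
            row_sum[i] += row_strat[i]
        for j in range(cols):
            col_regret[j] += col_util[j] - col_value
            col_sum[j] += col_strat[j]
    return ([s / iterations for s in row_sum],
            [s / iterations for s in col_sum])


# Made-up 2x2 payoff matrix standing in for a real subgame.
TOY_PAYOFF = [[2.0, -1.0], [-1.0, 1.0]]


def respond_to_bet(bet, strategy_table):
    """Reuse a stored strategy if the bet is known; otherwise solve and store it."""
    if bet not in strategy_table:
        our_mix, _ = solve_subgame(TOY_PAYOFF)
        strategy_table[bet] = our_mix
    return strategy_table[bet]


table = {100: [0.5, 0.5]}          # hypothetical precomputed entry
mix = respond_to_bet(137, table)   # 137 is not in the table, so it is solved
print([round(p, 2) for p in mix])
```

The point of the sketch is only the control flow: a bet that was never planned for triggers a fresh calculation, and the result is added to the strategy in use. In the real system the subgame depends on the cards and the betting so far, and the solver must account for hidden information.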
The third and final module is a "self-improvement" process that further strengthens the blueprint strategy. It fills in the "branches" that were left out of the blueprint strategy when the game was abstracted. However, there are far too many such branches to compute them all, so the work is narrowed down by using the actual behavior of the human opponents. "AI uses machine learning to find mistakes in the opponent's strategy and exploit them," says Dr. Sandholm in explaining the work of the third module. To detect potential holes in its own blueprint strategy, Libratus analyzes the opponents' bet sizes.
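One simple way to picture this prioritization is to count which bet sizes the opponents actually used that the blueprint does not cover, and fill in the most frequent ones first. The Python sketch below does only that. The frequency-only rule, the function name `pick_branches_to_add`, and the sample numbers are assumptions for illustration, not the selection rule from the paper.

```python
from collections import Counter


def pick_branches_to_add(observed_bets, blueprint_bets, budget):
    """Return up to `budget` bet sizes missing from the blueprint, most frequent first."""
    missing = Counter(bet for bet in observed_bets if bet not in blueprint_bets)
    return [bet for bet, _count in missing.most_common(budget)]


# Made-up data: bets seen during play and bet sizes the blueprint already covers.
observed = [150, 150, 300, 100, 150, 300, 725]
blueprint = {100, 200, 400}

to_add = pick_branches_to_add(observed, blueprint, budget=2)
print(to_add)  # [150, 300]
```

With a budget of two branches, the rarely seen bet of 725 is left out, which mirrors the idea of spending limited computation where the human players' behavior shows the gaps matter most.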
Although the content of the paper is too difficult for a layperson to follow, the results of Libratus's three-stage strategy are shown in the graph below. Libratus (red) stays ahead in chips from beginning to end, and from the middle of the match onward it takes chips so one-sidedly that the graph is almost a straight line, handing the four professional players a complete defeat.