What is "hill climbing," the pitfall that A/B tests are prone to?


By P&K's Mommy

A/B testing is one of the evaluation methods made possible by the Internet. By splitting users between two versions of a page with different content, it offers an effective way to quantitatively measure how users react. However, if you keep relying on A/B tests while focusing only on keeping the results precise, you can fall into a trap called "hill climbing," in which you become too fixated on pushing the numbers up.
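To make "quantitatively measure" concrete, here is a minimal sketch of how the result of a single A/B test is often evaluated. It assumes the metric is a conversion rate and uses a two-proportion z-test, which is a common choice but is not something the article specifies; the function name and the sample counts are hypothetical.

```python
import math


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int) -> tuple[float, float]:
    """Compare conversion rates of variant A and variant B.

    Returns (z, two_sided_p_value). Assumes independent users and sample
    sizes large enough for the normal approximation to hold.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that A and B convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    # Hypothetical counts: 200/10,000 conversions for A, 250/10,000 for B.
    z, p = two_proportion_z_test(200, 10_000, 250, 10_000)
    print(f"z = {z:.2f}, p = {p:.4f}")
```

Note that a test like this only answers "did B beat A during this window?", which is exactly why the pitfalls below matter: a precise answer to a short-term question can still point in the wrong long-term direction.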

If you get caught up in hill climbing, you become so attached to short-term success that you can't see the forest for the trees. Chris Said, who works on data analysis at Twitter, points out four things to watch for in order to avoid this kind of failure.

Four pitfalls of hill climbing · The File Drawer
http://chris-said.io/2016/02/28/four-pitfalls-of-hill-climbing/

◆ "Local maxima"
In an A/B test, where success and failure show up clearly, it is easy to get excited about a success that appears in the short term and sometimes miss the "big success" that could have been achieved. For example, picture two mountains: you should have climbed the big mountain on the right (= real success), but because you climbed the small mountain on the left, you mistakenly believed you had "succeeded."


Because "success" for a product comes in many sizes, it is important to judge success from a broad perspective rather than by short-term gains alone.
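The "small mountain vs. big mountain" picture is the classic failure mode of greedy hill climbing. The sketch below is an illustration under assumptions, not Said's model: the two-peak `metric` landscape is hypothetical, and each step of `hill_climb` stands in for one small A/B test that ships a change only if it beats the current version.

```python
import math


def metric(x: float) -> float:
    """Hypothetical product landscape: a small hill near x=2 and a taller one near x=8."""
    return 1.0 * math.exp(-((x - 2) ** 2)) + 3.0 * math.exp(-((x - 8) ** 2) / 2)


def hill_climb(x: float, step: float = 0.1, max_iters: int = 1000) -> float:
    """Greedy local search: move to a neighbouring variant only if it scores higher,
    mimicking a series of small A/B tests that each ship the winner."""
    for _ in range(max_iters):
        best = max((x - step, x, x + step), key=metric)
        if best == x:  # no neighbour wins the "test", so we stop here
            return x
        x = best
    return x


if __name__ == "__main__":
    local = hill_climb(0.0)   # starts at the foot of the small hill
    better = hill_climb(6.0)  # starts on the slope of the big hill
    print(f"from 0: x={local:.1f}, metric={metric(local):.2f}")
    print(f"from 6: x={better:.1f}, metric={metric(better):.2f}")
```

Every individual step from x=0 is a genuine "win," yet the process halts on the small hill because reaching the big one requires first accepting changes that look worse.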

◆ "Emergent maxima"
In an A/B test, users' reactions show up immediately, but that does not necessarily mean success or failure can be judged quickly. Some of the factors that decide whether a service succeeds, such as network externalities, which accelerate success as the number of users grows, take a certain amount of time before their effect becomes visible. For example, a change may at first seem to have "failed" because its curve rolls down the slope at the start, yet over time the mountain grows sharply and it turns into a "success." This can happen when "positive feedback" kicks in as the number of users grows, or when users who were initially confused by a change gradually get used to it and come to see how useful it is.


One example is the "Ribbon" shown at the top of the screen in Microsoft Office software. When it first appeared, many people found it "annoying" or said "I'll never use this," but today most users have grown accustomed to it.
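Here is a minimal sketch of why test duration matters for emergent maxima, assuming a hypothetical lift curve that starts negative (users confused by the change) and relaxes exponentially toward a positive long-run value (users adapted, or network effects kicking in). The functions and every parameter value are illustrative assumptions, not measured data.

```python
import math


def daily_lift(day: int, initial: float = -0.05, final: float = 0.08, tau: float = 10.0) -> float:
    """Hypothetical treatment-minus-control lift on a given day.

    Starts at `initial` and approaches `final` with time constant `tau` days.
    """
    return final + (initial - final) * math.exp(-day / tau)


def mean_lift(days: int) -> float:
    """Average lift an analyst would report if the test were stopped after `days` days."""
    return sum(daily_lift(d) for d in range(days)) / days


if __name__ == "__main__":
    print(f"7-day test:  {mean_lift(7):+.3f}")   # negative: looks like a failure
    print(f"90-day test: {mean_lift(90):+.3f}")  # positive: the change actually wins
```

The same change is judged a loser or a winner depending only on when the test is stopped.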

◆ "Novelty effects"
The two points above are about judging success correctly; the next two are about judging failure correctly. Some products and services get off to a smooth start, but as time goes on their popularity drops off sharply. In this case the change was new and eye-catching, so people tried it for a while, but they soon lost interest and only a negative impression remained. Here too, if you judge only by short-term changes, you will inevitably rate as a "success" something that should have been judged a "failure."
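The novelty effect is the mirror image of the previous case. This sketch assumes a hypothetical lift that starts with a strong positive bump and decays toward a slightly negative steady state; the function names and parameters are illustrative only.

```python
import math
from typing import Optional


def novelty_lift(day: int, initial: float = 0.10, steady: float = -0.02, tau: float = 7.0) -> float:
    """Hypothetical daily lift: an early bump from novelty that fades toward
    a slightly negative steady state once users lose interest."""
    return steady + (initial - steady) * math.exp(-day / tau)


def mean_lift(days: int) -> float:
    """Average lift reported if the test is stopped after `days` days."""
    return sum(novelty_lift(d) for d in range(days)) / days


def first_losing_day(max_days: int = 365) -> Optional[int]:
    """First day on which the treatment does worse than control, if any."""
    for d in range(max_days):
        if novelty_lift(d) < 0:
            return d
    return None


if __name__ == "__main__":
    print(f"7-day test:  {mean_lift(7):+.3f}")
    print(f"60-day test: {mean_lift(60):+.3f}")
    print("treatment starts losing on day", first_losing_day())
```

A one-week test would ship this change as a clear winner, even though it is losing to control from roughly the second week onward.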


◆ "Loss of differentiation"
This overlaps in part with the novelty effect: even if your service is not sufficiently distinct from your competitors', it can still enjoy short-term success. Said illustrates this with two graphs, our company on the left and a competitor on the right. At first, our company's service appears to be steadily succeeding, but eventually its popularity drops off sharply, while the competitor's curve rises a little. The service our company introduced was novel, but the competitor copied it, and in the end the two services were the same. This is another good example of mistaking a mere short-term "hill climb" for true success.
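A minimal sketch of loss of differentiation, under a deliberately simple assumption: users split between two products in proportion to each one's "appeal." The share model, `feature_boost`, and `copy_day` are hypothetical. Our new feature raises our share until the competitor copies it, after which both products are back to parity.

```python
from typing import List


def market_share(ours: float, theirs: float) -> float:
    """Hypothetical share model: users split in proportion to each product's appeal."""
    return ours / (ours + theirs)


def share_over_time(days: int, base: float = 1.0, feature_boost: float = 0.2,
                    copy_day: int = 30) -> List[float]:
    """Our daily share when we ship a feature on day 0 and the competitor copies it on `copy_day`."""
    shares = []
    for d in range(days):
        ours = base + feature_boost
        # The competitor gains the same boost once it has copied the feature.
        theirs = base + (feature_boost if d >= copy_day else 0.0)
        shares.append(market_share(ours, theirs))
    return shares


if __name__ == "__main__":
    s = share_over_time(60)
    print(f"day 0: {s[0]:.3f}, day 59: {s[-1]:.3f}")
```

An internal A/B test compares our two variants against each other, so it cannot see the competitor's copy at all; the "win" it reports can quietly evaporate in the market.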


While noting that his illustrative graphs are somewhat exaggerated, Said urges care on these points: "analyze successful cases even more carefully than failures, and keep running long-term tests," and "because success is multifaceted, even if something succeeds from one point of view, it is important to evaluate and test it more broadly." Said's original blog post shows these graphs as animations, which make the four points easy to grasp.

In Note. Posted by darkhorse_log