What the New York Times' headline A/B tests reveal about creating 'popular articles'



A/B testing, which shows different versions of text or images to different readers to see which performs better, has been adopted as a way to dramatically improve the profitability of websites. Stripe engineer Tom Cleveland independently investigated how the New York Times, one of the leading media outlets using A/B testing, runs its tests and how effective they are.

How the New York Times A/B tests their headlines - TJCX
https://blog.tjcx.me/p/new-york-times-ab-testing

The New York Times is open about the fact that it A/B tests headlines, but it has not revealed how the testing works. So Cleveland decided to use the official New York Times API to investigate the following three points:

◆ 1: How many articles and how many headlines are A/B tested?
◆ 2: How do the tested headlines differ?
◆ 3: Is A/B testing useful?

To carry out this research, Cleveland wrote a script that scrapes the New York Times, extracts the headline of each article, associates it with metadata from the official API, and stores it in a database. The script ran every 5 minutes, and the investigation covered 3 weeks starting February 13, 2021.
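The store-on-every-poll step can be sketched roughly as follows. This is a minimal sketch, not Cleveland's actual script: the table layout, the function names `open_db`, `record_headline` and `headlines_for`, and the idea of storing only changes are all assumptions made for illustration. The scraping itself (fetching the page every 5 minutes and looking up API metadata) is left out.

```python
import sqlite3


def open_db(path=":memory:"):
    """Open the database and create the (hypothetical) table if needed."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS headline_changes (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               article_url TEXT NOT NULL,
               headline TEXT NOT NULL,
               seen_at TEXT NOT NULL
           )"""
    )
    return conn


def record_headline(conn, article_url, headline, seen_at):
    """Store a headline only if it differs from the last one recorded
    for the same article. Returns True when a new row was written."""
    row = conn.execute(
        "SELECT headline FROM headline_changes "
        "WHERE article_url = ? ORDER BY id DESC LIMIT 1",
        (article_url,),
    ).fetchone()
    if row is not None and row[0] == headline:
        return False  # unchanged since the previous poll
    conn.execute(
        "INSERT INTO headline_changes (article_url, headline, seen_at) "
        "VALUES (?, ?, ?)",
        (article_url, headline, seen_at),
    )
    conn.commit()
    return True


def headlines_for(conn, article_url):
    """Return the sequence of recorded headline changes, oldest first."""
    rows = conn.execute(
        "SELECT headline FROM headline_changes "
        "WHERE article_url = ? ORDER BY id",
        (article_url,),
    ).fetchall()
    return [r[0] for r in rows]
```

Calling `record_headline` once per article on every poll would leave one row per observed change. A headline that is swapped out and later restored appears twice in the sequence, so the distinct headlines for an article would be `set(headlines_for(conn, url))`.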

◆ 1: How many articles and how many headlines are A/B tested?
The survey first showed that the New York Times A/B tested multiple headlines for 29% of all articles. Cleveland said the largest number of headlines he saw for a single article was eight. In his graph of the distribution, articles with a single headline are the most common, and the share falls as the number of headlines rises to two, three, and beyond.



◆ 2: How do the tested headlines differ?
Some headlines differed significantly while others barely differed at all, Cleveland said. For example, 'Don't Give In to Terror' and 'Don't Give in to Terror' differ only in whether 'in' is capitalized.

On the other hand, there are also articles in which the impression of the story changes as the headline changes.

The March 3, 2021 article reporting on SpaceX's rocket launch test was tested with the following seven headlines:

1. SpaceX to Test Launch Another Prototype of Rocket to Mars
2. SpaceX Halts Test Launch of Prototype for Rocket to Mars
3. SpaceX to Retry Test Launch of Prototype for Rocket to Mars
4. SpaceX Launches, Lands and Explodes Prototype of Its Rocket to Mars
5. SpaceX Mars Rocket Prototype Explodes, but This Time It Landed First
6. SpaceX Mars Rocket Prototype Explodes, but This Time It Lands First
7. SpaceX Mars Rocket Prototype Explodes, but It Lands First

Cleveland's timeline of this A/B test shows that the article was published at 10 a.m. and finally settled on the seventh headline in the afternoon. It also shows that the second headline was tested for only a very short time.



The headline changes above are a series of small revisions, but a headline can also change completely in a single step.

The following headlines, from an article about President Biden published on March 4, are an example.

1. Speak Softly, and Carry a Big Agenda
2. Biden Is the Anti-Trump, and It's Working

This headline change increased engagement with the article, and within a few hours of the change the article appeared on the list of 'most read articles'.



However, not every headline change seems to increase engagement. The following headline was changed from 1 to 2 and then quickly reverted to the original.

1. Have You Seen How Many Israelis Just Visited the UAE?
2. Jumping Jehoshaphat! Have You Seen How Many Israelis Just Visited the UAE?

Cleveland's graph of when each headline was displayed marks the second headline with blue dots between midnight and 6 a.m., the only period in which it was shown.



Overall, headlines tend to become more sensational over time, and Cleveland says this tendency was especially clear in the article about the sexual harassment allegations against New York Governor Cuomo.

1. Cuomo Attacked Over His Plan for Review of Sex Harassment Claims
2. Under Siege, Cuomo Revises Plan to Review Sex Harassment Claims
3. Under Siege Over Sex Harassment Claims, Cuomo Offers Apology

The first change put Governor Cuomo 'under siege', and the second had him 'under siege' and apologizing, making the headline progressively more emotional. These changes also increased engagement, and the article made the 'Most Read Articles' list.

Cleveland also says that 'more emotional headlines' were chosen for the article on the interview with Meghan, the wife of Britain's Prince Harry. The headline of that article changed as follows.

1. Saying her life was less a fairy tale, Meghan Markle described the cruel loss of her freedom and identity.
2. Saying her life was less a fairy tale, Meghan described the cruel loss of her freedom and identity.
3. Meghan Says Life With the UK Royals Almost Drove Her to Suicide.
4. 'I Just Didn't Want to Be Alive Anymore': Meghan Says Life as Royal Made Her Suicidal

Having watched the entire two-hour interview, Cleveland felt that headlines 1 and 2 were good summaries of its content. Meghan's account of contemplating suicide was told in the first five minutes of the interview, while the interview as a whole covered a much wider range of topics.

◆ 3: Is A/B testing useful?
Cleveland calculates that articles with A/B-tested headlines are 80% more likely to appear on the 'Most Popular Articles' list than articles whose headlines were not A/B tested. The data also showed that a larger number of tested headlines is associated with higher engagement. However, it is unclear whether engagement rises because more headlines are tested, or whether the New York Times simply tests more headlines on articles that already have high engagement.
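The '80% more likely' figure is a relative comparison of two rates. The arithmetic can be sketched as below. The counts in the usage example are made-up placeholders rather than Cleveland's data, and `relative_lift` is a hypothetical helper name chosen for illustration.

```python
def relative_lift(tested_popular, tested_total, untested_popular, untested_total):
    """Return how much more likely tested articles are to be popular than
    untested ones, as a fraction (0.8 means '80% more likely')."""
    if tested_total <= 0 or untested_total <= 0:
        raise ValueError("both groups must contain at least one article")
    tested_rate = tested_popular / tested_total
    untested_rate = untested_popular / untested_total
    if untested_rate == 0:
        raise ValueError("untested rate is zero; relative lift is undefined")
    return tested_rate / untested_rate - 1.0


# Placeholder counts (NOT real data): 18 of 100 tested articles and
# 10 of 100 untested articles reach the popular list.
lift = relative_lift(18, 100, 10, 100)
print(f"{lift:.0%} more likely")
```

With these placeholder counts the lift comes out at roughly 0.8, which is the kind of ratio the 80% figure expresses.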



Cleveland also points out that, relative to the total number of articles, few have their headlines A/B tested. He attributes this to the fact that 62% of the New York Times' revenue comes from subscriptions, while advertising accounts for a relatively small 27%. Cleveland believes that page views matter less than subscriptions, and that overly sensational wording could scare off potential subscribers.

in Note, Posted by darkhorse_log