The secret behind the rapid growth of "Toutiao," which uses AI and other technologies to fully automate news compilation, creation, and delivery



"Toutiao," a news aggregation app from China that uses machine learning to generate and deliver content, has grown rapidly in recent years. The venture capital firm Y Combinator has analyzed how Toutiao achieved this growth.

The Hidden Forces Behind Toutiao: China's Content King
http://blog.ycombinator.com/the-hidden-forces-behind-toutiao-chinas-content-king/

Toutiao, developed by the major Chinese IT company ByteDance, is something like a news feed, YouTube, and Techmeme rolled into one, and it is extremely popular, with more than 120 million people using it every day in China. What is interesting about Toutiao is not that you can "check videos, news, and so on in a single app," but how it delivers information. Toutiao uses machine learning algorithms to build a high-quality content feed for each user, without relying on information entered by the user, social graphs, or product purchase histories.

Toutiao's algorithms do not just deliver content; they can also create it. At the 2016 Rio Olympics, Toutiao's bot wrote news stories and published them faster than other media. Moreover, articles written by the bot are read by more users than content that humans produce at higher cost and with more time.

Users spend an average of 74 minutes a day in the Toutiao app, which is longer than the average for Facebook users and more than twice the average for Snapchat users. More than half of that time is spent watching short videos, which generate more than 10 billion video plays a day, so Toutiao is also used like a Chinese version of YouTube.


Toutiao launched in 2012 and uses machine learning and deep learning to serve each user content that matches their interests. It improves the content feed using signals such as which content the user taps and swipes, how long they stay on each item, what content they view at what time of day, on which days and at what times they leave comments, and where they are using the app. As a result, daily active users have exceeded 120 million. The venture capital firm Y Combinator has analyzed which measures produced this result.
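
As a rough illustration of how behavioral signals like these could be turned into a per-user interest profile, here is a minimal Python sketch. It is not Toutiao's actual method: the event format, the signal weights, and the function name `build_interest_profile` are all assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical weights: how strongly each action counts as interest.
SIGNAL_WEIGHTS = {"tap": 1.0, "comment": 3.0, "swipe_away": -0.5}
DWELL_WEIGHT_PER_SECOND = 0.05  # assumed value, for illustration only


def build_interest_profile(events):
    """Aggregate interaction events into normalized per-topic interest scores.

    Each event is a dict with 'topic', 'action', and optional 'dwell_seconds'.
    """
    raw = defaultdict(float)
    for event in events:
        score = SIGNAL_WEIGHTS.get(event["action"], 0.0)
        score += DWELL_WEIGHT_PER_SECOND * event.get("dwell_seconds", 0)
        raw[event["topic"]] += score
    # Keep only topics with positive evidence, then normalize to sum to 1.
    positive = {topic: s for topic, s in raw.items() if s > 0}
    total = sum(positive.values())
    return {topic: s / total for topic, s in positive.items()} if total else {}


# Made-up events for one user.
events = [
    {"topic": "sports", "action": "tap", "dwell_seconds": 40},
    {"topic": "sports", "action": "comment"},
    {"topic": "finance", "action": "swipe_away"},
    {"topic": "tech", "action": "tap", "dwell_seconds": 20},
]
profile = build_interest_profile(events)
print(profile)
```

In this toy data, the topic the user swiped away from drops out of the profile, while the topic with a long dwell time and a comment dominates it.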

◆ 1: Understand the gap and seize opportunities
Toutiao appeared just as smartphone use was taking off in China. From 2010 to 2014, mobile penetration climbed to roughly 65%. Yet because most major content providers had not built mobile apps or mobile-friendly sites, truly mobile-friendly content was scarce. As of mid-2012 there were only six news apps for Android in China. Four were adaptations of existing news portals with limited mobile optimization, and the other two were aggregators that relied on slow, non-personalized input from human editors to decide which content to show. On top of that, social networks built for China such as WeChat and Weibo were considered insufficient to meet Chinese users' demand for content.

Born in these circumstances, Toutiao slipped into this gap in the Chinese market by being an easy-to-use, personalized, informative, and addictive mobile app. From its first release Toutiao was easy to start using: there was no need to create an account, set a password, link a social network account, or enter any preferences, and its simple, intuitive design could be operated without a tutorial. It is generally hard to turn everyone who installs an app into a daily active user (DAU), but Toutiao grew its DAU thanks to this low barrier to getting started.

"Toutiao" means "headlines" in Chinese; the app's full name, Jinri Toutiao, means "Today's Headlines." The app's icon also catches users' attention, which helped increase the number of users. Being able to check all kinds of news in one place was also groundbreaking at the time. In addition, Toutiao analyzed user behavior from the start and became a news aggregator personalized for each user within one month of release. As a result, it grew its DAU to one million within four months of release. Since then, its features and algorithms have been updated and improved weekly.

The number of mobile apps available in China more than tripled in the three years from 2012 to 2015, but Toutiao appears to have used its head start to build a rock-solid foundation and pull ahead of competing apps.


◆ 2: Data network effects deliberately built into the whole system
You can have every algorithm in the world, but if you do not build an addictive product you cannot collect data, and without data you cannot improve the product. Put simply, the more people use a product, the more data it gathers, and the better it can be improved. If those improvements work, the product becomes more useful to users, which in turn yields even more data.

Toutiao was built as an addictive app and has succeeded in collecting a wide range of data from its users. That data is fed into Toutiao's algorithms, which raises the quality of the app. In other words, Toutiao has optimized every stage of the "content lifecycle," which consists of four stages: "creation," "collection," "recommendation," and "interaction."


· Creation
Content creation has long been the exclusive domain of humans. Toutiao is in the process of changing that, and the centerpiece is "Xiaomingbot." Xiaomingbot debuted at the 2016 Rio Olympics and succeeded in posting news earlier than traditional media. Its speed is striking: stories were published just two seconds after an event ended.

Several obstacles had to be overcome to achieve this. First, publishing articles on Olympic results requires data. Toutiao therefore arranged to receive real-time score updates from the Olympic organizers, secured image sources in order to find relevant visuals, and monitored live text commentary on the matches. It started with table tennis, tennis, badminton, and women's football, apparently because information about these events is easy to summarize with rules.

Next, Toutiao had to work out how to combine the data from these three sources into a coherent story, which is a far harder task than analyzing the data. Take image selection: the image must suit the content of the match, so natural language processing has to be integrated with context-aware image recognition. In the actual system, candidate images are reportedly analyzed with a convolutional neural network, and the system learns from historical data to pick the image most relevant to the story. In addition, a sequence-to-sequence deep learning model is used to summarize existing stories and propose better article titles.
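
The idea of merging three sources into one story can be sketched in a heavily simplified form. The Python example below is an assumption-laden stand-in, not Xiaomingbot's implementation: it fills a fixed headline template and picks an image by simple tag overlap, where the real system reportedly uses a convolutional neural network and a sequence-to-sequence model. All field names, function names, and sample data are hypothetical.

```python
def pick_image(candidates, keywords):
    """Return the candidate image whose tags overlap most with the story keywords."""
    return max(candidates, key=lambda img: len(set(img["tags"]) & keywords))


def write_report(result, commentary, images):
    """Merge one score record, live commentary lines, and candidate images into an article."""
    keywords = {result["sport"], result["winner"], result["loser"]}
    headline = (
        f"{result['winner']} beats {result['loser']} "
        f"{result['score']} in {result['sport']}"
    )
    # Use the most recent commentary line as the closing detail, if any.
    detail = commentary[-1] if commentary else ""
    body = f"{headline}. {detail}".strip()
    image = pick_image(images, keywords)
    return {"headline": headline, "body": body, "image": image["url"]}


# Hypothetical inputs standing in for the three data sources.
result = {"sport": "badminton", "winner": "Player A", "loser": "Player B", "score": "2-1"}
commentary = ["Player B takes the first game.", "Player A wins the deciding game 21-18."]
images = [
    {"url": "img/001.jpg", "tags": ["tennis", "Player C"]},
    {"url": "img/002.jpg", "tags": ["badminton", "Player A"]},
]
article = write_report(result, commentary, images)
print(article["headline"])
```

A rule-based template like this only works for events whose outcome fits a fixed pattern, which is consistent with the choice of sports whose results are easy to summarize with rules.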

During the Rio Olympics, Xiaomingbot wrote 450 articles. It can now write articles in genres other than sports and can generate more than 8,000 articles per day.


· Collection (curation)
In its early days, Toutiao's main engagement driver was "soft news" such as celebrity gossip, pop culture, and lifestyle articles. Unlike news from the well-known state-owned news organizations, soft news was published across many individual sites, so there was no central place to access it. Checking soft news meant visiting many different sites, with no assurance that you were seeing the most useful information. With the arrival of Toutiao, that changed to "if you want to check soft news, just use Toutiao."

Beyond delivering content to users, a content curation service has to choose which content to deliver. First it must visit websites, identify content, and collect the relevant metadata. It must also continuously update its main repository of stories and create as many personalized versions as possible. These are processing-intensive tasks, and algorithms are far better at both of them than humans are. Yet when Toutiao appeared, web portals were still having human editors do this kind of work.

Toutiao also uses algorithms to identify and filter out low-quality content, which lets it show users only content worth their attention. Fake news, spam, and the like are reportedly identified with text classification algorithms.
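
The article does not say which text classification algorithm Toutiao uses. As one common baseline, the sketch below implements a small multinomial naive Bayes classifier in plain Python; the class name, the whitespace tokenization, and the four training sentences are assumptions made for illustration.

```python
import math
from collections import Counter


class NaiveBayesTextClassifier:
    """Multinomial naive Bayes with add-one smoothing over whitespace tokens."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = {label: Counter() for label in self.class_counts}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        tokens = text.lower().split()
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus smoothed log likelihood of each token.
            score = math.log(doc_count / total_docs)
            score += sum(math.log((counts[t] + 1) / denom) for t in tokens)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Tiny made-up training set; a real filter would need far more data.
texts = [
    "win free prize click now",
    "click here free money now",
    "team wins final match",
    "city opens new subway line",
]
labels = ["spam", "spam", "ok", "ok"]
classifier = NaiveBayesTextClassifier().fit(texts, labels)
print(classifier.predict("free prize click here"))
```

A filter built this way would drop or down-rank items labeled as spam before they reach the recommendation stage.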

· Recommendation
Personalized content recommendation is the best-known of Toutiao's functions and accounts for most of its success and reputation. Deep learning is applied at this stage of the content lifecycle, which is a decisive difference between Toutiao and competing services.

The problem the recommendation engine tries to solve is to continually select, for each user, the 100 articles that user is most likely to be interested in. The problem is simple to state but hard to solve. Toutiao weighs three main elements: the "user profile," covering attributes such as the user's age, sex, and socioeconomic status; the "content of the articles"; and the "context," such as location-related data. It curates content users are likely to care about by combining these viewpoints.
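
To make the "pick the top articles per user" problem concrete, here is a minimal Python sketch that scores each article from the three elements above and keeps the highest-scoring ones. The scoring formula, the field names, and the 0.2 location boost are assumptions for illustration; the article does not describe Toutiao's actual model.

```python
import heapq


def score(article, user, context):
    """Combine user profile, article content, and context into one number."""
    interest = user["interests"].get(article["topic"], 0.0)   # user profile
    relevance = interest * article["quality"]                  # article content
    same_city = context.get("city") is not None and article.get("city") == context["city"]
    return relevance + (0.2 if same_city else 0.0)             # context


def recommend(articles, user, context, n=100):
    """Return the n highest-scoring articles for this user, best first."""
    return heapq.nlargest(n, articles, key=lambda a: score(a, user, context))


# Made-up user, context, and candidate articles.
user = {"interests": {"tech": 0.7, "sports": 0.3}}
context = {"city": "Beijing"}
articles = [
    {"id": "a1", "topic": "tech", "quality": 0.9},
    {"id": "a2", "topic": "sports", "quality": 0.8, "city": "Beijing"},
    {"id": "a3", "topic": "finance", "quality": 1.0},
    {"id": "a4", "topic": "tech", "quality": 0.5, "city": "Beijing"},
]
top = recommend(articles, user, context, n=2)
print([a["id"] for a in top])
```

Using `heapq.nlargest` avoids sorting the whole candidate pool when only a fixed number of results, such as 100, is needed.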

When a user launches the Toutiao app, the system consults the basic data in the profile. For someone who works in Silicon Valley, for example, it can infer that technology-related articles are more likely to be clicked, but the system still shows a variety of article types so that it can correctly gauge what the user does and does not care about. In this way it probes how the user responds to content whose appeal cannot be guessed from the profile alone. The longer the app is used, the more refined the choice of recommendations becomes, enabling better curation.
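
Showing varied articles to learn about interests the profile cannot reveal resembles the classic explore/exploit trade-off. The sketch below uses an epsilon-greedy strategy, a standard textbook technique chosen here purely for illustration; the article does not say that Toutiao uses it, and the function names and the `epsilon` value are assumptions.

```python
import random


def choose_topic(estimated_interest, all_topics, epsilon=0.1, rng=random):
    """Pick a random topic with probability epsilon (explore), else the best-known one (exploit)."""
    if rng.random() < epsilon:
        return rng.choice(all_topics)
    return max(all_topics, key=lambda t: estimated_interest.get(t, 0.0))


def update_interest(estimated_interest, counts, topic, clicked):
    """Update the running average click rate for a topic after one impression."""
    counts[topic] = counts.get(topic, 0) + 1
    old = estimated_interest.get(topic, 0.0)
    estimated_interest[topic] = old + (float(clicked) - old) / counts[topic]


# Start from a profile-based guess, then refine it with observed clicks.
topics = ["tech", "sports", "cooking"]
interest = {"tech": 0.6}
counts = {}
update_interest(interest, counts, "cooking", clicked=True)
update_interest(interest, counts, "cooking", clicked=False)
print(interest["cooking"])
```

With a small `epsilon`, most impressions go to topics already known to perform well, while the occasional random pick gathers evidence about topics the profile says nothing about.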

· Interaction
As Toutiao grows, user interaction on the platform will play an increasingly important role. Here too, Toutiao offers an algorithmic solution rather than relying on human labor. Toutiao's AI team has developed a matching engine that connects people asking questions with people who can answer them. The engine's results were published in a paper on conditional focused question answering over large-scale knowledge bases, which reports returning the correct answer with 75.7% accuracy on 108,000 questions.

◆ 3: From content aggregation to content destination
It is not uncommon for an app to move from aggregating content to distributing it. To make that shift, Toutiao did two things.

The two things were "offering revenue-sharing incentives" to content contributors and "hosting more than 80,000 Toutiao accounts." The incentive program started in 2014: contributors who hit targets on metrics such as number of articles and read rate are also guaranteed a minimum monthly payment. The more than 80,000 Toutiao accounts include not only news media but also bloggers and influential celebrities, which lets Toutiao carry articles across a wide range of genres. Toutiao has many genre categories, yet even the top 20 categories account for only 60% of all content, and no single category exceeds 10%, so content is spread widely across categories.


◆ 4: Not restricted by format
Toutiao is also flexible about format, and it responds quickly by extending the whole platform in whatever direction the data indicates. For example, in 2015, while many Chinese video services were focused on long-form playback, Toutiao added video support and began carrying short videos of about one to five minutes. It did so because it had observed that infrastructure improvements in China in 2014 had made video content popular. Toutiao has also set up an incentive program to promote video content.

◆ 5: Synergy between early monetization and the product
Toutiao is earning unprecedented revenue at unprecedented speed, five years after starting the business and three years after it began monetizing. It is one of the fastest-growing apps to date, and its revenue target for 2017 is set at more than $2.2 billion (about 250 billion yen).


Toutiao's strong content curation technology means it can accurately read what content a user wants. It uses this strength to generate revenue by showing relevant ads. Ads normally detract from the user experience, but by showing well-matched ads Toutiao has managed to keep the impact on the experience to a minimum.

According to a research firm, Toutiao's CTR is 200% better than that of other companies in the same industry.

in Mobile,   Software,   Web Service, Posted by logu_ii