Researcher says the claim that "China flooded Twitter with porn ads to hide large-scale protests" is an illusion caused by bias



In November 2022, 10 people died in an apartment building fire in the Xinjiang Uyghur Autonomous Region in China, prompting protests across the country against the strict "zero-COVID policy." Meanwhile, the observation that "searching Twitter for the name of a Chinese city shows porn ads instead of posts about the protests" became a hot topic, and some suggested that the surge in spam was a Chinese government campaign to hide the protests. However, David Thiel of the Stanford Internet Observatory, who investigated the matter, explains that the claimed spam "surge" is largely an artifact of bias.

Content Moderation Survivor Bias | FSI
https://cyber.fsi.stanford.edu/io/news/content-moderation-survivor-bias

On November 29, 2022, various media outlets reported that "there is a surge in spam ads for pornography using hashtags of Chinese cities such as Beijing and Shanghai on Twitter, making it difficult to find information about the protests." Reports noted that many of the accounts posting these spam ads were newly created or had long been dormant, and some claimed that this was an information manipulation campaign by the Chinese government aimed at audiences outside China.

China's pornographic ads exploded on Twitter, aiming to hide large-scale protest demonstrations from overseas eyes - GIGAZINE



However, Thiel argues that "much of this 'surge' in spam is an illusion, due to both data bias and cognitive bias," and that the media coverage reflected these biases. According to Thiel, spam did drown out content related to the protests, but there is no evidence that it was designed to do so, nor any evidence that it was a deliberate campaign by the Chinese government.

First, Thiel points out that social media analysis is heavily influenced by when the data was collected. On social networks such as Twitter, both users and the platform remove content. So if you collect the past month of data at some point and see a spike in problematic content during the most recent week, it may simply be that much of the older problematic content has already been removed, which makes the recent period look like a spike. Removal is not instantaneous: problematic content can stay up for some time after it is posted, and it is often removed in batches.
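The survivor-bias effect Thiel describes can be illustrated with a toy model. The sketch below assumes, purely for illustration, that spam is posted at a constant 1,000 posts per day and that moderation removes a further 20% of each day's posts for every day that passes. The removal rate and the function names `surviving_percent` and `snapshot` are hypothetical, not taken from Thiel's analysis. A snapshot taken on day 7 then looks like a steady surge, even though the true volume never changed.

```python
# Toy model of content moderation survivor bias (hypothetical numbers).
# True posting volume is constant; only the delayed removal differs by age.

def surviving_percent(age_days: int) -> int:
    """Assumed removal model: 0% removed on day 0, then 20 more percentage
    points of the original posts removed per day, floored at 0%."""
    return max(0, 100 - 20 * age_days)


def snapshot(posts_per_day: list[int], collected_on: int) -> list[int]:
    """Return how many posts from each day are still visible on day `collected_on`."""
    return [
        posted * surviving_percent(collected_on - day) // 100
        for day, posted in enumerate(posts_per_day)
    ]


posted = [1000] * 8  # days 0..7, identical true volume every day
print(snapshot(posted, collected_on=7))  # [0, 0, 0, 200, 400, 600, 800, 1000]
```

Real moderation is slower, uneven, and often done in batches, so the distortion in actual data will look messier, but the direction of the bias is the same: the most recent days are overrepresented in whatever survives.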

Thiel notes that the same effect appears across many kinds of social media datasets: the most recent data has not yet been moderated, so any analysis based on the content that "survived" can distort what actually happened.

Cognitive biases also come into play. In this case, for example, people may only recently have noticed something that had existed for a long time. Users who had never searched Twitter for Chinese city names before did so in the wake of the protests, ran into a flood of porn ads, felt "this is suspicious," and may have drawn conclusions from a small amount of data.



To test this, on November 29, when the large volume of spam ads containing Chinese city names was drawing attention, Thiel collected the past week of tweets returned by Simplified Chinese searches for the names of major Chinese cities. He searched for the following 39 cities:

Beijing, Shanghai, Tianjin, Chongqing, Harbin, Changchun, Shenyang, Hohhot, Shijiazhuang, Urumqi, Lanzhou, Xining, Xi'an, Yinchuan, Zhengzhou, Jinan, Taiyuan, Hefei, Changsha, Wuhan, Nanjing, Chengdu, Guiyang, Kunming, Nanning, Lhasa, Hangzhou, Nanchang, Guangzhou, Fuzhou, Haikou, Hong Kong, Macao, Dalian, Qingdao, Suzhou, Wuxi, Xiamen, Shenzhen

The data obtained as of November 29 contained 3,713,674 tweets posted between 21:00 on November 21 and 5:00 on November 29. Tweets containing city names rose sharply on the 27th and 28th, when the large-scale protests took place, and most of them were spam. However, a substantial number of such tweets already existed before the apartment fire on the 24th and the protests that followed.



Tweets containing "Suzhou" over the same period peaked not only on the 28th but also on the 22nd, before the fire.



Tweets containing "Lanzhou" rose from the 26th to the 28th and then dropped sharply on the 29th, showing that the pattern differs from city to city.
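Per-city patterns like the Suzhou and Lanzhou trends come from binning tweet timestamps into daily counts and looking for peaks. A minimal sketch, assuming each tweet has been reduced to an ISO-8601 timestamp string and that a "peak" means a day with more tweets than both neighbouring days. Both assumptions, the function names, and the toy timestamps are hypothetical and are not Thiel's data or method.

```python
from collections import Counter
from datetime import date, datetime, timedelta


def daily_series(timestamps, start, end):
    """Bin ISO-8601 timestamps into (day, count) pairs for every day from start to end."""
    per_day = Counter(datetime.fromisoformat(ts).date() for ts in timestamps)
    days = [start + timedelta(days=i) for i in range((end - start).days + 1)]
    # Counter returns 0 for days with no tweets, so quiet days stay in the series.
    return [(day, per_day[day]) for day in days]


def local_peaks(series):
    """Return the days whose count is strictly higher than on both neighbouring days."""
    peaks = []
    for i in range(1, len(series) - 1):
        day, count = series[i]
        if count > series[i - 1][1] and count > series[i + 1][1]:
            peaks.append(day)
    return peaks


# Made-up timestamps for one hypothetical city search term.
toy = (
    ["2022-11-21T09:00:00"]
    + ["2022-11-22T12:00:00"] * 3
    + ["2022-11-23T08:30:00"]
    + ["2022-11-28T20:15:00"] * 4
)
series = daily_series(toy, date(2022, 11, 21), date(2022, 11, 29))
print(local_peaks(series))  # [datetime.date(2022, 11, 22), datetime.date(2022, 11, 28)]
```

Which calendar day a tweet falls on depends on the time zone of its timestamp, so a real comparison should fix one zone (for example UTC or China Standard Time) before binning.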



Thiel also used characteristics of the tweets to identify clusters of similar spam campaigns and graphed the tweet frequency of the 10 most active clusters. These 10 clusters produced 3,326,311 tweets, a considerable number. Some clusters peaked after the 24th, but others were already active before the 24th, so the data does not necessarily show a "spike in spam to bury tweets about the protests." In addition, because the accounts used by these clusters are frequently suspended, it is natural that many of them were created within the past month.
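The article does not specify which tweet traits Thiel used for clustering. One simple, hypothetical approach is to reduce each tweet to a template by masking the parts spammers tend to vary (URLs, numbers, and the city name used as bait) and then count identical templates. The city list, regular expressions, and function names below are illustrative assumptions, and the sample texts are made up.

```python
import re
from collections import Counter

# Hypothetical subset of the Simplified Chinese search terms.
CITY_NAMES = ["北京", "上海", "苏州", "兰州"]

URL_RE = re.compile(r"https?://\S+")
DIGITS_RE = re.compile(r"\d+")
CITY_RE = re.compile("|".join(map(re.escape, CITY_NAMES)))


def template(text: str) -> str:
    """Mask the variable parts of a tweet so near-identical spam collapses to one key."""
    text = URL_RE.sub("<URL>", text)    # links usually differ per post
    text = CITY_RE.sub("<CITY>", text)  # the bait keyword rotates across cities
    text = DIGITS_RE.sub("0", text)     # phone numbers, prices, counters
    return " ".join(text.split())       # normalise whitespace


def top_clusters(texts, n=10):
    """Return the n most frequent templates with their tweet counts."""
    return Counter(template(t) for t in texts).most_common(n)


samples = [
    "北京 new offer 123 https://example.com/a",
    "上海 new offer 456 https://example.com/b",
    "苏州 new offer 7 https://example.org/x",
    "hello 兰州",
]
print(top_clusters(samples))
```

Real spam operations rotate wording as well, so a practical pipeline would likely add fuzzier signals (shared link domains, account creation dates, posting clients), but exact-template counting is an easy first pass.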



Thiel hypothesized that the spam was commercial in nature and unrelated to the fire and the protests, and he ran a broader search. First, on November 30, he searched for tweets from November 15 to 30. The resulting graph shows that spam containing city names was common even before the fire in the Xinjiang Uyghur Autonomous Region, with peaks comparable to those on the 27th and 28th occurring repeatedly before then. Moreover, although the protests continued after the 28th, the volume of spam dropped sharply.



When Thiel searched the past week of tweets on December 4, he found even more spam containing city names in December, after the protests had largely subsided. Thiel said this may reflect increased spam activity and failures of Twitter's anti-spam systems, but that at the very least it does not fit the scenario that "the Chinese government backed an organized campaign to hide the protests."



Later, on December 8, Thiel again searched for tweets from November 21 to 29. This time the total was 3,375,069, more than 300,000 fewer than the 3,713,674 found in the November 29 search. The graph shows that the peak on the 28th in particular had shrunk.
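The size of the gap between the two snapshots can be checked with simple arithmetic on the two totals. The helper name `shrinkage` is hypothetical; the numbers are the ones Thiel reported for November 29 and December 8.

```python
def shrinkage(first: int, later: int) -> tuple[int, float]:
    """Return how many items vanished between two snapshots and the share of the first lost."""
    removed = first - later
    return removed, removed / first


removed, share = shrinkage(3_713_674, 3_375_069)
print(f"{removed:,} tweets gone ({share:.1%})")  # 338,605 tweets gone (9.1%)
```

In other words, roughly one in eleven of the tweets visible on November 29 was gone by December 8, which is consistent with the survivor-bias effect Thiel describes.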



Thiel also argues that if the Chinese government had wanted to hide the protests from the outside world, it would have been more effective to spam protest-related terms such as "white paper revolution," "white paper action," "white paper protest," "A4 paper revolution," "#A4Revolution," "#WhitePaperRevolution," "#ChinaProtest2022," and "#ChinaUprising," but he found no evidence of that.

Taken together, Thiel's analyses showed no pattern consistent with a systematic Chinese government campaign to bury the protests under a surge of spam ads containing city names. "Social media data is complex and often bizarre," Thiel said, arguing that while researchers and the media are prone to jumping to sensational conclusions, analyses of recent events should take these biases into account.

in Web Service, Posted by log1h_ik