Messaging app with over 1 billion monthly active users found to filter images automatically in real time


by

JESHOOTS-com

A study has shown that WeChat, China's popular messaging app, automatically filters in real time the images that users send and receive in chats.

(Can't) Picture This 2: An Analysis of WeChat's Realtime Image Filtering in Chats-The Citizen Lab
https://citizenlab.ca/20019/07/cant-picture-this-2-an-analysis-of-wechats-realtime-image-filtering-in-chats/



It is well known that China enforces strict Internet censorship. Under rules that require real-name registration to post on online bulletin boards and that block access to Wikipedia, companies operating in China must make their services censorable to the level the Chinese government requires. Tencent, one of the largest Internet companies in China, is no exception.

Citizen Lab, a multidisciplinary research institute at the University of Toronto, has now shown that WeChat, Tencent's most popular messaging app in China with over 1 billion monthly active users, automatically filters in real time the images that users exchange.


by

Sinchen.Lin

According to Citizen Lab's findings, WeChat has implemented an automatic, real-time filtering feature that checks the text embedded in an image and the visual similarity of the image against entries registered in a blacklist, and blocks images that match.

The image below is a screenshot of user A (left) sending user B (right) a caricature of the "709 incident" drawn by RebelPepper (Wang Liming), a Chinese satirical cartoonist. The image sent by user A was blocked by real-time filtering and never reached user B.



WeChat's image filtering feature uses two methods: optical character recognition (OCR) to recognize text contained in the image, and comparison of the image's visual similarity.

When OCR is used to recognize the text in an image and decide whether to block it, the filter is likely to recognize and block the critical message in an image that contains both a message of "love the party and love the country" and a message criticizing the Chinese government by calling for the destruction of the Communist Party.



In addition, the images below are also blocked by the real-time filtering feature because they are visually similar to those blacklisted by WeChat.



In addition, WeChat keeps real-time filtering manageable by indexing the MD5 hash, a type of hash function, of images sent by users. Citizen Lab points out that WeChat indexes hashes, which are relatively cheap to compute, because analyzing the images themselves is too computationally expensive to do in real time.

If the MD5 of an image sent in a chat is not in the index, the image is delivered unfiltered but queued for automatic analysis. If the analysis determines that the image is sensitive, its MD5 is added to the index, so the same image can be blocked the next time it is sent.
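The flow described above can be sketched roughly as follows. This is a minimal toy model, not WeChat's implementation: the class name `HashIndexFilter`, its methods, and the `is_sensitive` callback (a stand-in for the slower OCR and visual-similarity analysis) are all hypothetical names introduced for illustration.

```python
import hashlib
from collections import deque


class HashIndexFilter:
    """Toy model of a hash-index filter; every name here is hypothetical."""

    def __init__(self, is_sensitive):
        self.blocked_hashes = set()       # MD5 digests of images judged sensitive
        self.analysis_queue = deque()     # images waiting for the slower analysis
        self.is_sensitive = is_sensitive  # stand-in for OCR / visual-similarity checks

    def send(self, image_bytes):
        """Return True if the image is delivered, False if it is blocked."""
        digest = hashlib.md5(image_bytes).hexdigest()
        if digest in self.blocked_hashes:
            return False
        # Unknown hash: deliver immediately, analyze later.
        self.analysis_queue.append((digest, image_bytes))
        return True

    def run_analysis(self):
        """Drain the queue and index the hashes of images judged sensitive."""
        while self.analysis_queue:
            digest, image_bytes = self.analysis_queue.popleft()
            if self.is_sensitive(image_bytes):
                self.blocked_hashes.add(digest)


# Assumed toy rule: any payload containing b"sensitive" counts as sensitive.
f = HashIndexFilter(is_sensitive=lambda data: b"sensitive" in data)
print(f.send(b"sensitive image bytes"))  # True: first send is delivered
f.run_analysis()
print(f.send(b"sensitive image bytes"))  # False: same bytes are now blocked
```

The hash lookup is a cheap set membership test, which is why it can run in real time while the expensive analysis happens in the background.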

Citizen Lab also found that WeChat uses separate filtering indexes for group chats, one-to-one chats, and Moments. This means that an image censored in group chats may not be filtered in one-to-one chats or Moments. However, images that are visually similar to those registered in an index are also subject to filtering, so each index can catch more than exact copies of the images it lists.

The figure below shows, for 111 images subject to censorship, which of group chats, one-to-one chats, and Moments blocked each one. Two images were blocked only in Moments (red), 2 only in group chats (green), 71 in both Moments and group chats (orange), and 36 in Moments, group chats, and one-to-one chats alike (purple). Filtering is clearly loosest in one-to-one chats.
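As a quick check of these figures, the four regions can be tallied per feature. The counts come from the paragraph above; the dictionary layout and the feature names `moments`, `group`, and `one_to_one` are our own labels.

```python
# Each key is the set of features that blocked the images in that region.
regions = {
    frozenset({"moments"}): 2,
    frozenset({"group"}): 2,
    frozenset({"moments", "group"}): 71,
    frozenset({"moments", "group", "one_to_one"}): 36,
}

total = sum(regions.values())
per_feature = {
    feature: sum(n for where, n in regions.items() if feature in where)
    for feature in ("moments", "group", "one_to_one")
}

print(total)        # 111
print(per_feature)  # {'moments': 109, 'group': 109, 'one_to_one': 36}
```

Moments and group chats each blocked 109 of the 111 images, while one-to-one chats blocked only 36.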



Even for the same image, the WeChat client sometimes re-encodes the file depending on its format and other factors. The hash of an image therefore varies with the file format and with whether the WeChat client re-encoded it, but any variant judged visually similar to a blacklisted image stored in the index becomes a target of filtering. Images with the same content but different resolutions normally have different hashes, but if the WeChat client re-encodes them, files with the same content reportedly end up with the same hash even when their resolutions differ.
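The gap between exact hashes and visual similarity can be illustrated with a small sketch. The article does not say which similarity algorithm WeChat uses, so the "average hash" below is an assumption swapped in purely for illustration, and the images are synthetic grayscale grids rather than real files. All function names are hypothetical.

```python
import hashlib


def md5_of(pixels):
    """MD5 over raw pixel bytes (a stand-in for hashing the image file)."""
    return hashlib.md5(bytes(v for row in pixels for v in row)).hexdigest()


def average_hash(pixels, size=4):
    """Shrink to size x size by block averaging, then threshold at the mean."""
    h, w = len(pixels), len(pixels[0])
    bh, bw = h // size, w // size
    cells = []
    for r in range(size):
        for c in range(size):
            block = [pixels[y][x]
                     for y in range(r * bh, (r + 1) * bh)
                     for x in range(c * bw, (c + 1) * bw)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    return [1 if v > mean else 0 for v in cells]


def hamming(a, b):
    """Number of positions where two bit lists differ."""
    return sum(x != y for x, y in zip(a, b))


def gradient(n):
    """n x n test image whose brightness increases from left to right."""
    return [[(x * 255) // (n - 1) for x in range(n)] for _ in range(n)]


# Same picture at two resolutions: 8x8 and 16x16.
small, large = gradient(8), gradient(16)
print(md5_of(small) == md5_of(large))                     # False
print(hamming(average_hash(small), average_hash(large)))  # 0
```

The MD5 digests differ because the bytes differ, while the coarse visual fingerprint is identical, which is the kind of property a similarity check needs in order to catch resized or re-encoded copies.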



In addition, WeChat's automatically generated index of images to filter mainly targets content critical of the Chinese government. Categorizing the 220 filtered images gives the following breakdown, which shows that government-related content is overwhelmingly what gets blocked.



It is unclear what criteria WeChat uses to select images to block, but Citizen Lab's earlier research has shown that WeChat censors even neutral keywords that refer to official policies and ideology around highly sensitive events. Accordingly, it is not only images critical of government and party leaders that get blocked.

The events that WeChat specifically targeted for blocking are listed below. The number in parentheses is the number of related images that were blocked.

Cultural Revolution (4)
June Fourth Tiananmen Square incident (1)
Fan Bingbing tax evasion scandal (2)
Bus plunge accident in Chongqing in 2018 (2)
2018 US midterm elections (3)
Scandal over lost trial records at China's Supreme People's Court (24)
Arrest of Huawei's CFO (10)
China-US trade war (8)
Scandal over twin babies born in China after genome editing (2)
Rotten food served at an elementary school in Sichuan Province (2)
Wildfires in Sichuan in 2019 (2)

Citizen Lab states that 'Censorship conducted on social media in China is often responsive to the news cycle, as Chinese companies tend to strictly control information on scandals and other issues.'

Many other images also appear to have been blocked, including images promoting products that cannot legally be sold, non-political memes, content containing nudity, and content opposing the Chinese government. For example, firearms may not be owned or sold in China, so flyers advertising firearms for sale are blocked.



Other types of images blocked include those related to terrorism and religious extremism.

However, for some images it is unclear why they were added to the filtering index. For example, a photo of the famous primatologist Jane Goodall with a baby chimpanzee appears to be blocked on WeChat.



Citizen Lab also notes that WeChat's database is populated with images that users, including both ordinary users and researchers, send through the platform. For researchers, this means that measuring which images are filtered can itself change future measurements: even with an automated filtering system, past measurements may alter the platform's future filtering behavior.

in Mobile,   Software, Posted by logu_ii