A quarter of all online content that existed between 2013 and 2023 has already disappeared



A survey by the Pew Research Center, which studies a wide range of topics from politics and religion to the Internet, science, and data science, found that a quarter of the content that existed on the Internet between 2013 and 2023 has already disappeared and is no longer accessible.

Link Rot and Digital Decay on Government, News and Other Webpages | Pew Research Center
https://www.pewresearch.org/data-labs/2024/05/17/when-online-content-disappears/



Pew Research Center randomly sampled 999,899 URLs of content that existed between 2013 and 2023 from crawl data collected by the non-profit organization Common Crawl, and checked whether each piece of content was still available.

Overall, they found that a quarter of content was inaccessible.
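The sampling-and-checking procedure described above can be sketched roughly as follows. This is a minimal illustration, not Pew Research Center's actual method: the article does not say which responses counted as "inaccessible", so the rule used here (any 4xx/5xx status, or no response at all) is an assumption, and the function names `fetch_status`, `is_inaccessible_status`, and `inaccessible_share` are hypothetical.

```python
import http.client
import random
import urllib.error
import urllib.request


def is_inaccessible_status(status):
    """Assumed rule: no response, or any 4xx/5xx status, counts as inaccessible."""
    return status is None or status >= 400


def fetch_status(url, timeout=10.0):
    """Return the HTTP status code for url, or None if no response arrived."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 404 or 500.
        return err.code
    except (OSError, ValueError, http.client.HTTPException):
        # DNS failure, refused connection, timeout, or a malformed URL.
        return None


def inaccessible_share(urls, sample_size, fetch=fetch_status, seed=0):
    """Randomly sample URLs and return the fraction that are inaccessible."""
    urls = list(urls)
    rng = random.Random(seed)
    sample = rng.sample(urls, min(sample_size, len(urls)))
    if not sample:
        return 0.0
    broken = sum(1 for url in sample if is_inaccessible_status(fetch(url)))
    return broken / len(sample)
```

Passing the fetch function as a parameter lets the sampling logic be tried out on a small table of made-up statuses before any real requests are sent. Some servers reject HEAD requests, so a more careful check might fall back to GET.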

Broken down by year, 38% of the content from 2013, the oldest year in the survey, was inaccessible, and even 8% of the content from 2023, just one year ago, was already inaccessible.



Among news websites, 23% of pages contained at least one broken link; among government websites, the figure was 21%. On news sites, the presence of broken links did not correlate with the size of the site, but among government websites, broken links were more prevalent on the pages of local governments.

A survey of 50,000 English Wikipedia articles likewise found that 82% of articles linked to external websites in their 'References' section, and 53% contained at least one broken link.

Pew Research Center also collected posts on X (formerly Twitter) in real time starting in spring 2023 and followed them for three months to see whether they remained available.

The researchers found that 18% of posts had disappeared within a few months. In 60% of those cases, the post became unavailable because the account that posted it had been made private, suspended, or deleted; in the remaining 40%, the individual post itself was deleted.

Posts in Turkish and Arabic were especially likely to disappear, with over 40% of such posts gone within three months.

The researchers also found that posts from accounts whose profile settings were left in their default state were more likely to become inaccessible.

However, 6% of the posts that disappeared later became accessible again, for example because the account was restored or its status was changed from private back to public. Of the posts that reappeared, 90% remained available until the end of the study period.
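Because several of these percentages are shares of other shares, it can help to convert them into fractions of all collected posts. The short calculation below is only a worked illustration using the rounded figures quoted in this article (18%, 60%, 40%, and 6%), so its results are approximate; the variable names are made up for the example.

```python
from fractions import Fraction

# Rounded figures quoted in the article.
disappeared = Fraction(18, 100)            # share of all posts that disappeared
account_level_share = Fraction(60, 100)    # of those: account private, suspended, or deleted
post_deleted_share = Fraction(40, 100)     # of those: the individual post was deleted
reappeared_share = Fraction(6, 100)        # of those: later accessible again

# Convert each "share of disappeared posts" into a share of all posts.
account_level = disappeared * account_level_share
post_deleted = disappeared * post_deleted_share
reappeared = disappeared * reappeared_share


def pct(value):
    """Format a fraction of all posts as a percentage with two decimals."""
    return f"{float(value * 100):.2f}%"


print("Gone due to account status:", pct(account_level))   # 10.80%
print("Gone due to post deletion: ", pct(post_deleted))    # 7.20%
print("Disappeared, then returned:", pct(reappeared))      # 1.08%
```

In other words, under these rounded figures roughly one post in nine became unavailable because of a change to the account rather than to the post itself.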

in Note, Posted by logc_nt