The Internet Archive's efforts and challenges in fighting fake news


By aqrstudio

Brewster Kahle, founder of the Internet Archive, which preserves a wide range of digital information, discusses the organization's efforts to combat fake news by archiving past information, and the problem of keeping up with rapidly growing Internet content.

How the Internet Archive is waging war on misinformation | Financial Times
https://www.ft.com/content/5be1f2ee-d60b-11e9-a0bd-ab8ec6435630

The Internet Archive, established in 1996, is a non-profit organization that collects web pages, software, video and audio recordings, and other material, and provides it free of charge. Its free web page repository, the Wayback Machine, lets users view the contents of a URL as they were when archived, regardless of later changes to or deletion of that URL.
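As a rough illustration of how a reader might look up such a snapshot, the sketch below builds Wayback Machine addresses in Python. It assumes the commonly used public URL pattern `https://web.archive.org/web/<timestamp>/<url>` and the availability endpoint `https://archive.org/wayback/available`; neither is described in the FT article, and the function names are hypothetical. The code only constructs addresses and makes no network requests.

```python
from datetime import date
from urllib.parse import urlencode

# Assumed public endpoints (not taken from the article itself).
WAYBACK_PREFIX = "https://web.archive.org/web"
AVAILABILITY_API = "https://archive.org/wayback/available"


def wayback_url(page_url: str, day: date) -> str:
    """Build an address asking the Wayback Machine for the capture nearest to `day`."""
    timestamp = day.strftime("%Y%m%d")  # e.g. 20080915
    return f"{WAYBACK_PREFIX}/{timestamp}/{page_url}"


def availability_query(page_url: str, day: date) -> str:
    """Build a query address asking whether a capture exists near `day`."""
    params = urlencode({"url": page_url, "timestamp": day.strftime("%Y%m%d")})
    return f"{AVAILABILITY_API}?{params}"


if __name__ == "__main__":
    # FT.com on the day Lehman Brothers filed for bankruptcy.
    print(wayback_url("https://www.ft.com/", date(2008, 9, 15)))
    print(availability_query("ft.com", date(2008, 9, 15)))
```

Opening the first printed address in a browser should lead to the capture closest to the requested date, if one exists; the service decides which snapshot is actually returned.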

Since the 2016 US election, concerns about fake news have grown, and the Internet Archive has strengthened its measures against it. In an era when social media is constantly updated and fake, highly partisan content spreads easily, it is important to preserve who said what as immutable data.

The following image is a screenshot of FT.com from 15 September 2008, the day Lehman Brothers filed for bankruptcy and the so-called Lehman shock began, retrieved from the Internet Archive's Wayback Machine.


Kahle currently employs more than 100 staff at the Internet Archive. It costs about $18 million per year to run, funded by donations, grants, and payments from third parties that commission specific digitization services.

To date, the Internet Archive has archived 33 billion web pages, 20 million books and texts, 8.5 million audio and video recordings, 3 million images, and 200,000 software programs. Some material can be accessed free of charge, some can be borrowed where copyright law applies, and some is available only to researchers.

Kahle lamented how difficult fake news has made it for ordinary people to obtain reliable information, pointing to a generation that is growing up on computers without reliable information accessible through them.

After Donald Trump was elected president, the Internet Archive launched new projects in response to the false information that had unsettled voters. One of them is the “Trump Archive”, a collection of more than 6,000 of Trump's TV appearances, including those from before he took office. The Archive also collects Trump's posts on Twitter as part of documenting the president's often contradictory statements.

The following is a tweet by Trump captured by the Wayback Machine.


Social media is “a very important communication platform,” says Mark Graham, director of the Wayback Machine, and news feeds and chat apps such as Facebook's are a “good way” for people to get information.

The Internet Archive wants to help identify false information and verify the facts behind suspicious content. Its video library can help experts or algorithms discover videos that have been tampered with or taken out of context. However, deciding how to deal with fake news is difficult, and the Internet Archive has no obligation to deal with it. Graham argues that “simply deleting incorrect information or offensive content is not necessarily the correct answer”, because for some researchers and politicians the incorrect information is itself worth investigating.

Given the explosive growth of the Internet over the past 20 years, archiving it has become increasingly difficult. Kahle hopes the Internet Archive can at least keep up by archiving popular websites, but Graham calls Kahle “optimistic” and says the Archive cannot save all the information it is expected to. Taking YouTube as an example, only a small portion of the videos published each week is archived.

The Internet Archive uses about 3,000 different algorithms called “crawlers” to take regular snapshots of web pages, which are stored in the Wayback Machine. The snapshots cover a wide range of sites, including local political websites.

The archive is stored on servers located in what was originally the nave of a church. In addition, full backup copies are kept elsewhere, and partial copies are held in Canada, the Netherlands, and Alexandria, Egypt, as a precaution against data loss.

Kahle says the San Francisco-based Internet Archive has nothing in common with the Silicon Valley companies nearby. In Silicon Valley, small management teams operate platforms used by billions of people, and the gap between rich and poor is a problem. Rather than a few people profiting, Kahle wants many people to benefit from the “heritage of all technologies through the Internet archive”.

in Web Service, Posted by log1m_mn