Internet Archive, which records and preserves information on the web, goes down for about 2 hours due to unexpected mass access


by

drosen7900

The "Wayback Machine," operated by the Internet Archive, a non-profit organization based in San Francisco, California, is a service that lets you view sites that have been deleted or made private and can no longer be seen. On May 28, 2023, local time, the Internet Archive announced that the Wayback Machine had been down for about two hours due to a flood of tens of thousands of requests per second.

Let us serve you, but don't bring us down | Internet Archive Blogs
https://blog.archive.org/2023/05/29/let-us-serve-you-but-dont-bring-us-down/



Brewster Kahle, founder of the Internet Archive, explained on May 28, 2023, local time, what had happened to the Wayback Machine that day: "Tens of thousands of requests per second for public domain optical character recognition (OCR) files published on the Wayback Machine were sent from 64 virtual hosts on Amazon's AWS service."

According to Mr. Kahle, even by web standards, tens of thousands of requests per second is an excessive load that cannot be handled.

The heavy traffic brought down all Internet Archive services for about an hour. Engineers were urgently called in on a Sunday afternoon, normally a day off, and the Internet Archive expressed its gratitude to those who worked to restore the service. In response to the mass access, the Internet Archive blocked the specific IP addresses involved in order to bring the service back up and recover from the outage.

A few hours later, however, another 64 IP addresses sent a similar flood of requests. As a result, the Wayback Machine went down again and the service was temporarily unavailable. During that time, visitors were shown a "502 error" screen, which indicates that the server handling the request could not get a valid response from the server behind it. A 502 error often appears when a server is under heavy load for a short period, causing temporary communication failures.



As for who sent the large number of requests, the Internet Archive speculates, "I believe it is an AI development company that is trying to collect the text of the Internet Archive at an abnormal speed and use it for training."




Approximately one hour after the second system failure, the Internet Archive reported that the Wayback Machine was restored.




Meanwhile, commenters on Hacker News speculated, "There is a rate limit when archiving a website on the Wayback Machine, but the Internet Archive must have forgotten to set a limit on downloading OCR files."
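To illustrate the kind of rate limit the Hacker News comment refers to, here is a minimal Python sketch of a per-client token bucket. It is not the Internet Archive's implementation; the class name `TokenBucketLimiter` and the `rate` and `burst` parameters are hypothetical and chosen only for this example.

```python
import time


class TokenBucketLimiter:
    """Hypothetical per-client token bucket.

    Each client may burst up to `burst` requests, and tokens are
    refilled at `rate` requests per second.
    """

    def __init__(self, rate, burst, clock=time.monotonic):
        self.rate = rate
        self.burst = burst
        self.clock = clock  # injectable so the limiter can be tested
        self._state = {}    # client id -> (tokens, time of last update)

    def allow(self, client_id):
        """Return True if this client's request may proceed right now."""
        now = self.clock()
        tokens, last = self._state.get(client_id, (float(self.burst), now))
        # Refill according to elapsed time, capped at the burst size.
        tokens = min(self.burst, tokens + (now - last) * self.rate)
        if tokens >= 1.0:
            self._state[client_id] = (tokens - 1.0, now)
            return True
        self._state[client_id] = (tokens, now)
        return False
```

With a limiter like this in front of a download endpoint, a request for which `allow()` returns False would typically be rejected or delayed instead of being passed on to the storage servers.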

In response to the outage, Mr. Kahle said, "If you want to use a large amount of material from the Wayback Machine at once, please download slowly at a reasonable pace. If you are starting such a project, please contact us with any questions and we can help."
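As a rough picture of what "downloading slowly at a reasonable pace" could look like on the client side, the following Python sketch fetches URLs one at a time, pauses between requests, and backs off when the server answers with a 502 or 503. The function `fetch_politely`, its parameters, and the caller-supplied `fetch` function are all assumptions made for this example, not an official Internet Archive client or recommendation.

```python
import time


def fetch_politely(urls, fetch, delay=1.0, max_retries=3, sleep=time.sleep):
    """Fetch URLs sequentially, pausing between requests.

    `fetch(url)` is a caller-supplied function returning (status, body).
    On HTTP 502 or 503 the pause is doubled before each retry.
    """
    results = {}
    for url in urls:
        wait = delay
        for _ in range(max_retries + 1):
            status, body = fetch(url)
            if status not in (502, 503):
                results[url] = (status, body)
                break
            # The server looks overloaded: wait longer before retrying.
            wait *= 2
            sleep(wait)
        else:
            # Every attempt failed; record the last status and move on.
            results[url] = (status, None)
        sleep(delay)  # fixed pause between consecutive URLs
    return results
```

The key design choice is that the client never sends requests in parallel and slows down further as soon as the server shows signs of strain.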

"When using the Internet Archive and the Wayback Machine, please refrain from extreme usage that will bring down the service," said Mr. Kahle.

in Web Service, Posted by log1r_ut