Googlebot crawls only the first 15MB of each web page: what measures are needed to keep your search ranking from dropping?



To gather the website information shown in its search results, Google uses a crawler called 'Googlebot' to collect information from the countless web pages published on the internet. The official Googlebot documentation (English version) states that 'when crawling files larger than 15MB, only the first 15MB is crawled', and Google has now published a new explanation of the details of this 15MB limit.

Googlebot and the 15 MB thing | Google Search Central Blog | Google Developers
https://developers.google.com/search/blog/2022/06/googlebot-15mb

Googlebot crawls an enormous number of web pages on the internet, and that number grows every day. Google publishes documentation summarizing Googlebot's specifications for website administrators, and it reportedly received many inquiries after the statement that 'Googlebot crawls up to 15MB per file' was added to that documentation. In response, Google has released a new explanation of the details of Googlebot's 15MB limit.

◆ What part of the web page does '15MB' indicate?
Googlebot crawls only the source of the page, not all the content loaded on the page. The 15MB limit applies to the file that is read first when the URL of the target page is accessed, such as the HTML file. Even if the total size of the images, videos, and other content displayed on the page exceeds 15MB, the page is not subject to the limit as long as the HTML file itself does not exceed 15MB.

◆ What impact does the 15MB limit have on website administrators?
According to Google, the median size of HTML files on the internet is 30KB. For this reason, most website administrators do not need to worry about the 15MB limit. For administrators whose HTML files are larger than 15MB, Google recommends moving scripts out of the HTML into external files.



◆ How are files larger than 15MB handled?
Googlebot crawls the first 15MB of the file, counted from the beginning, and does not crawl anything after that point.
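
The cutoff described above can be pictured with a minimal sketch. This is only an illustration, not how Googlebot is implemented: the function name `split_at_crawl_limit` and the sample page are hypothetical, and the limit is assumed here to mean 15 × 1024 × 1024 bytes.

```python
from typing import Tuple

# Limit taken from the article; treating 1 MB as 1024 * 1024 bytes is an assumption.
CRAWL_LIMIT_BYTES = 15 * 1024 * 1024


def split_at_crawl_limit(raw: bytes, limit: int = CRAWL_LIMIT_BYTES) -> Tuple[bytes, bytes]:
    """Return (bytes within the limit, bytes beyond the limit)."""
    return raw[:limit], raw[limit:]


# Hypothetical oversized page: 16 MB of padding followed by a late paragraph.
page = b"<html>" + b" " * (16 * 1024 * 1024) + b"<p>late content</p></html>"
crawled, skipped = split_at_crawl_limit(page)

print(len(crawled))                 # 15728640
print(b"late content" in crawled)   # False
print(b"late content" in skipped)   # True
```

In this sketch, anything that appears after the first 15MB of the file, such as the late paragraph, falls into the part that is not crawled.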

◆ Does the 15MB limit mean that Googlebot doesn't collect images or movies?
As mentioned above, Googlebot does not collect the actual image and video files as part of the page. What it collects is the reference written in the HTML, such as '<img src="https://example.com/images/image file.jpg">'.
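
To illustrate that an HTML file carries only a reference to an image rather than the image data, here is a small sketch using Python's standard `html.parser` module. The class name `ImageRefCollector` and the sample markup are hypothetical, and this does not claim to mirror Googlebot's own parsing.

```python
from html.parser import HTMLParser


class ImageRefCollector(HTMLParser):
    """Collect the src attribute of every <img> tag in an HTML string."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            for name, value in attrs:
                if name == "src" and value:
                    self.sources.append(value)


# Hypothetical page fragment: the image is referenced, not embedded.
html = '<p>Photo:</p><img src="https://example.com/images/photo.jpg">'
collector = ImageRefCollector()
collector.feed(html)

print(collector.sources)           # ['https://example.com/images/photo.jpg']
print(len(html.encode("utf-8")))   # size of the HTML alone, independent of the image file
```

However large the referenced image file is, the HTML in this example stays only a few dozen bytes long.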

◆ Is the data URL included in the file size?
A mechanism called a data URL makes it possible to convert image files and similar data into character strings and embed them directly in an HTML file. Data URLs are part of what Googlebot crawls, so they count toward the 15MB limit.
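
A rough sketch of why data URLs matter for file size: embedding a file as base64 places all of its data inside the HTML, and base64 encoding itself makes the data about one third larger. The helper `to_data_url` and the placeholder bytes below are hypothetical and used only for illustration.

```python
import base64


def to_data_url(payload: bytes, mime_type: str = "image/png") -> str:
    """Encode raw bytes as a base64 data URL."""
    encoded = base64.b64encode(payload).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


# Placeholder bytes standing in for an image file (not a real PNG).
fake_image = bytes(300_000)

html = f'<img src="{to_data_url(fake_image)}">'
html_bytes = len(html.encode("utf-8"))

print(len(fake_image))   # size of the original data
print(html_bytes)        # size of the HTML once the data is embedded
```

Here the HTML ends up larger than the embedded data, whereas a plain `src` reference to an external file would add only the length of the URL.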

◆ How to check the size of a web page?
Google explains how to find the size of a web page using either the web browser's developer tools or the command line tool 'cURL'. For example, in Google Chrome you can check the file size by opening the developer tools with 'Ctrl + Shift + I', switching to the 'Network' tab, and reloading the web page.
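
As an alternative to the tools Google mentions, the same check can be sketched in Python with the standard library. This is not a method from Google's post: the function names `html_size_bytes` and `describe_size` are hypothetical, and the limit is again assumed to be 15 × 1024 × 1024 bytes.

```python
import urllib.request

# Limit taken from the article; treating 1 MB as 1024 * 1024 bytes is an assumption.
CRAWL_LIMIT_BYTES = 15 * 1024 * 1024


def html_size_bytes(url: str, timeout: float = 10.0) -> int:
    """Fetch only the resource at `url` (not its images or scripts) and return its size."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return len(response.read())


def describe_size(size: int, limit: int = CRAWL_LIMIT_BYTES) -> str:
    """Summarize a size in bytes relative to the assumed crawl limit."""
    status = "over" if size > limit else "within"
    return f"{size} bytes ({size / limit:.2%} of the limit, {status})"


# The 30KB median HTML size quoted in the article, for comparison.
print(describe_size(30 * 1024))   # 30720 bytes (0.20% of the limit, within)
```

To measure a live page, one could call `describe_size(html_size_bytes("https://example.com/"))`, which requires network access and downloads only the HTML document itself.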



in Web Service, Posted by log1o_hf