How to build a system that detects old links hijacked by dangerous sites



Websites have surprisingly short lifespans, and readers often tell us that a site linked from one of our articles has become a completely different site or that the link is broken. That is tolerable if the new destination is harmless, but very dangerous if it has turned into a phishing or malware site. So we built a mechanism that regularly checks every link on the site using 'Web Risk', a Google API that judges from a URL whether a site is dangerous.

Web Risk | Google Cloud

https://cloud.google.com/web-risk

At the time of writing, Web Risk offers two APIs, the 'Lookup API' and the 'Update API'. The Lookup API is the easy one to understand: you pass it a URL and it tells you directly whether the URL is safe or dangerous. With the Update API, you first download a database holding the first few bytes of each hash, compare it locally against the beginning of the hash of the URL you want to check, and only when they match do you fetch the full hashes and test for an exact match. If possible, we wanted to solve the problem by simply running every URL through the Lookup API.

According to the price list, the Lookup API costs $0.50 (about 55 yen) per 1,000 calls once you go beyond 100,000 calls. GIGAZINE's articles contain about 1 million links in total, so processing all of them with the Lookup API would cost about 50,000 yen per survey. With the Update API, on the other hand, downloading the hash prefix database is free, and the charge is $50 (about 5,500 yen) per 1,000 full-hash lookups. To compare the prices, we first needed to find out how often the beginning of a hash matches.
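As a rough illustration of this comparison, the quoted prices can be turned into two small functions. This is only a sketch under assumptions: the rate of 110 yen per dollar is inferred from the conversions in the text, calls up to 100,000 are taken to be unbilled, and the function and constant names are ours.

```python
# Prices as quoted in the article; the yen rate is inferred from "$0.5 is about 55 yen".
YEN_PER_USD = 110
LOOKUP_USD_PER_1000 = 0.5      # Lookup API, per 1,000 calls beyond the threshold
LOOKUP_FREE_CALLS = 100_000    # assumption: calls up to this count are not billed
FULL_HASH_USD_PER_1000 = 50    # Update API, per 1,000 full-hash lookups


def lookup_cost_yen(calls: int) -> float:
    """Cost of checking `calls` URLs one by one with the Lookup API."""
    billable = max(0, calls - LOOKUP_FREE_CALLS)
    return billable / 1000 * LOOKUP_USD_PER_1000 * YEN_PER_USD


def update_cost_yen(full_hash_lookups: int) -> float:
    """Cost of the Update API, where only full-hash lookups are billed."""
    return full_hash_lookups / 1000 * FULL_HASH_USD_PER_1000 * YEN_PER_USD


if __name__ == "__main__":
    print(lookup_cost_yen(1_000_000))  # roughly 50,000 yen for 1 million links
    print(update_cost_yen(1_000))      # 1,000 full-hash lookups
```

The Update API only wins if full-hash lookups are rare, which is why the match frequency has to be measured first.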



To use the API, create a service account from the GCP console and download your credentials.



Set the path to the downloaded credentials file in the 'GOOGLE_APPLICATION_CREDENTIALS' environment variable and you're ready to go.
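Before calling the API, it can help to confirm that the variable really points at a service account key. The following is a minimal sketch using only the standard library; the function name is ours, and the check relies on the `type` and `client_email` fields that service account key files contain.

```python
import json
import os


def load_service_account_key(environ=os.environ) -> dict:
    """Read the key file named by GOOGLE_APPLICATION_CREDENTIALS and sanity-check it."""
    path = environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        raise RuntimeError("GOOGLE_APPLICATION_CREDENTIALS is not set")
    with open(path, encoding="utf-8") as f:
        key = json.load(f)
    # Service account keys identify themselves with this "type" value.
    if key.get("type") != "service_account":
        raise ValueError(f"{path} is not a service account key")
    return key
```

Calling `load_service_account_key()["client_email"]` at startup shows which account the checker will run as.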



According to the sample in the Update API guide, the hash database is returned in the following format.
[code]
{
  "recommendedNextDiff": "2020-01-08T19:41:45.436722194Z",
  "responseType": "RESET",
  "additions": {
    "rawHashes": [
      {
        "prefixSize": 4,
        "rawHashes": "AArQMQAMoUgAPn8lAE..."
      }
    ]
  },
  "newVersionToken": "ChAIARAGGAEiAzAwMSiAEDABEPDyBhoCGAlTcIVL",
  "checksum": {
    "sha256": "wy6jh0+MAg/V/+VdErFhZIpOW+L8ulrVwhlV61XkROI="
  }
}
[/code]



All the hash prefixes are concatenated and delivered as a single base64 string, so you have to decode it locally and split it into pieces of the number of bytes given by 'prefixSize'. With later use in mind, we saved the prefixes to a JSON file, separated as follows. The database is updated from time to time, so in production you need to fetch a fresh copy every time you run a check.
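The splitting step might look like the following minimal sketch. It assumes a response shaped like the guide's sample; the function names and the choice to store prefixes as hex strings are ours, not something the guide prescribes.

```python
import base64
import json


def split_prefixes(raw_hashes_b64: str, prefix_size: int) -> list[str]:
    """Decode the concatenated prefixes and cut them into prefix_size-byte pieces (hex)."""
    data = base64.b64decode(raw_hashes_b64)
    if len(data) % prefix_size != 0:
        raise ValueError("decoded length is not a multiple of prefixSize")
    return [data[i:i + prefix_size].hex() for i in range(0, len(data), prefix_size)]


def extract_prefixes(diff_response: dict) -> dict[int, list[str]]:
    """Map each prefixSize to its list of hex prefixes from a RESET-style response."""
    result: dict[int, list[str]] = {}
    for entry in diff_response.get("additions", {}).get("rawHashes", []):
        size = entry["prefixSize"]
        result.setdefault(size, []).extend(split_prefixes(entry["rawHashes"], size))
    return result


def save_prefixes(prefixes: dict[int, list[str]], path: str) -> None:
    """Store the separated prefixes as JSON (integer keys are written as strings)."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(prefixes, f, indent=2)
```

Loading the saved file into a `set` afterwards makes each later membership test a constant-time operation.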



This database contained a total of 4,997 4-byte hash prefixes, so the probability that a single hash of a safe URL matches one of them by chance is 4997 / 2^32, or roughly 1 in 860,000. Allowing for the several hashes generated from each URL, we estimated the per-URL rate at about 1 in 200,000. At that rate, checking 1 million links costs about 25 yen, which is overwhelmingly cheaper than the Lookup API, so we decided to adopt the Update API.
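The estimate can be checked with a little arithmetic. The sketch below assumes hash prefixes are uniformly distributed and independent; `hashes_per_url` is a hypothetical parameter, and the value 4 used in the demonstration is our assumption, not a figure from the measurement.

```python
PREFIX_COUNT = 4997   # 4-byte prefixes found in the database (from the article)
PREFIX_BITS = 32      # a 4-byte prefix has 2**32 possible values


def chance_match_rate(prefix_count: int, prefix_bits: int, hashes_per_url: int = 1) -> float:
    """Probability that at least one hash of a safe URL hits some prefix by chance."""
    per_hash = prefix_count / 2 ** prefix_bits
    return 1 - (1 - per_hash) ** hashes_per_url


def expected_full_hash_lookups(links: int, rate: float) -> float:
    """Expected number of billed full-hash lookups when checking `links` safe URLs."""
    return links * rate


if __name__ == "__main__":
    for n in (1, 4):  # 4 hashes per URL is an assumed value
        rate = chance_match_rate(PREFIX_COUNT, PREFIX_BITS, n)
        print(f"{n} hash(es) per URL: about 1 in {round(1 / rate):,}")
    print(expected_full_hash_lookups(1_000_000, 1 / 200_000))
```

For a single hash the rate is about 1 in 860,000; with four hashes per URL it comes to roughly 1 in 215,000, close to the 1 in 200,000 figure. At 1 in 200,000, a million links lead to about five full-hash lookups.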

To use the Update API, you first need to generate hashes from the URL. The detailed procedure is described in the guide, so it is enough to convert the URL as the guide says. Test cases are listed for each step, which is a great help when implementing. The only URL canonicalization test case we could not work out was 'http://\x01\x80.com/' → 'http://%01%80.com/', but since no link of that kind appears on GIGAZINE, we ignored such URLs. If you are familiar with this conversion, we would appreciate it if you could contact us using this form.
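As we understand the guide, an already canonicalized URL is expanded into host-suffix and path-prefix combinations, and each combination is hashed with SHA-256. The following is a simplified sketch of that second half only: it skips canonicalization entirely (the input is assumed to be canonical), and the function names are ours.

```python
import hashlib
import ipaddress
from urllib.parse import urlsplit


def host_suffixes(host: str) -> list[str]:
    """The exact host plus up to four suffixes taken from its last five components."""
    try:
        ipaddress.ip_address(host)
        return [host]  # IP addresses get no extra variants
    except ValueError:
        pass
    parts = host.split(".")
    variants = [host]
    # Strip leading components, but never go down to the bare top-level domain.
    for start in range(max(1, len(parts) - 5), len(parts) - 1):
        variants.append(".".join(parts[start:]))
    return variants


def path_prefixes(path: str, query: str) -> list[str]:
    """Exact path with and without the query, then up to four leading directories."""
    variants = [f"{path}?{query}"] if query else []
    variants.append(path)
    current = "/"
    directories = [current]
    for piece in path.split("/")[1:-1][:3]:  # directory components only
        current += piece + "/"
        directories.append(current)
    for directory in directories:
        if directory not in variants:
            variants.append(directory)
    return variants


def url_expressions(canonical_url: str) -> list[str]:
    """All host-suffix/path-prefix combinations for a canonical URL."""
    parts = urlsplit(canonical_url)
    host = parts.hostname or ""
    path = parts.path or "/"
    return [h + p for h in host_suffixes(host) for p in path_prefixes(path, parts.query)]


def hash_prefixes(canonical_url: str, prefix_size: int = 4) -> dict[str, str]:
    """Map each expression to the hex of the first prefix_size bytes of its SHA-256."""
    return {
        expr: hashlib.sha256(expr.encode("utf-8")).digest()[:prefix_size].hex()
        for expr in url_expressions(canonical_url)
    }
```

A production version would run the guide's canonicalization test cases first, since any difference in the canonical form changes every hash.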



Several hashes are generated from a single URL, so each one is checked against the database, and if none of them match, the URL can be judged safe. If any one of them matches, we use the Update API's search with the hash prefix to obtain the list of full hashes and verify whether one of them matches exactly. We wrote a malware detection test as shown in the figure below and proceeded with the implementation.
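The two-stage decision can be sketched as follows. Here `fetch_full_hashes` is a hypothetical callable standing in for the Update API's search request; passing it in as a parameter is our design choice so that the logic can be tested without network access.

```python
import hashlib
from typing import Callable, Iterable


def sha256(expression: str) -> bytes:
    """Full SHA-256 digest of one URL expression."""
    return hashlib.sha256(expression.encode("utf-8")).digest()


def is_dangerous(
    expressions: Iterable[str],
    local_prefixes: set[bytes],
    fetch_full_hashes: Callable[[bytes], Iterable[bytes]],
    prefix_size: int = 4,
) -> bool:
    """Cheap local prefix match first, billed full-hash lookup only on a hit."""
    for expression in expressions:
        full_hash = sha256(expression)
        prefix = full_hash[:prefix_size]
        if prefix not in local_prefixes:
            continue  # no prefix match: this expression is not on the list
        # Prefix matched: ask the service for the full hashes behind it.
        if full_hash in set(fetch_full_hashes(prefix)):
            return True  # exact match: the URL is on the threat list
    return False
```

A test can then supply a fake `fetch_full_hashes` that returns a known hash, covering the three cases of no prefix match, a prefix match that turns out to be a false alarm, and a confirmed match.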



In this way, we completed a mechanism that checks the links contained in the site every day. If an old site is abandoned and turns into a malicious one, we can notice immediately and fix the link. Fortunately, at the time of writing, no URLs other than the test cases have been judged to be malware.



In addition, GIGAZINE is currently recruiting people who are interested in this kind of work. We look forward to your application.

GIGAZINE Employment Information. – There are things that GIGAZINE can do.
https://gigazine.co.jp/



in Review,   Software,   Security, Posted by log1d_ts