Even Google denies it access: what is "libwww-perl", the name that has become a source of spam?



I had noticed that some spam countermeasures were in place. It turns out that the remote hosts (IP addresses and so on) that send spam trackbacks identify themselves in the user agent field, the part that normally names the browser, with strings such as "libwww-perl/5.805". As a result, Google and other sites appear to refuse access to their search results pages when the user agent name contains "libwww".

Searching online, quite a few people seem to believe that "libwww-perl is a spam-sending bot," so I looked into what it really is. In fact, it was not made for sending spam. It is an example of how anything can end in tragedy when it is misused.

Details are as follows.
First, if you search for "libwww-perl" in connection with spam, you will find a mountain of results, both in Japan and abroad, such as these:

libwww-perl has become a dedicated tool for unauthorized access ('A`) Get lost | Linux - P - SOC

Indigo blue | Too much trackback spam ...

Google Analytics count leak and trackback SPAM signature

So what is "libwww-perl" actually? Here is the answer:

LWP - Wikipedia

LWP is the abbreviation of libwww-perl and also the name of its module. It is used to implement HTTP user agents (web browsers, crawlers, etc.) in Perl.

In other words, it is a library that lets you do all sorts of useful things with Perl.

Gisle Aas / libwww-perl-5.805 - search.cpan.org

However, when a bot that sends trackback spam is written in Perl, this "libwww-perl" library makes it easy to fetch a page (GET) and send a trackback (POST). As a result, "libwww-perl" or "libwww" often ends up at the start of the user agent name, and the access is then denied as coming from a spam bot.
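To see why the library name ends up in the user agent field, here is a minimal sketch that uses Python's standard urllib instead of Perl (the choice of Python is an assumption of this example; the post itself shows no code). A default opener announces itself as "Python-urllib/<version>", much as LWP announces "libwww-perl/<version>" unless a script sets its own name.

```python
import urllib.request

# A default opener sends a User-Agent header that names the library itself,
# in the same way LWP sends "libwww-perl/<version>" unless told otherwise.
opener = urllib.request.build_opener()
default_agent = dict(opener.addheaders)["User-agent"]

# e.g. "Python-urllib/3.12"; the exact version depends on the interpreter.
print(default_agent)
```

Any script that never changes this header is therefore identifiable by library name alone, whether it is a legitimate tool or a spam bot.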

The most prominent example is Google: if the user agent name contains "libwww", every search results page returns a "403 Forbidden" error.
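The behaviour described above can be sketched as a tiny WSGI application. This is a hypothetical illustration of the rule "User-Agent contains libwww, so return 403", not Google's actual implementation; the names `app`, `check` and `BLOCKED_TOKEN` are made up for this example.

```python
# Hypothetical minimal WSGI app mimicking the rule described above:
# any User-Agent containing "libwww" (in any letter case) gets 403 Forbidden.

BLOCKED_TOKEN = "libwww"


def app(environ, start_response):
    user_agent = environ.get("HTTP_USER_AGENT", "")
    if BLOCKED_TOKEN in user_agent.lower():
        start_response("403 Forbidden", [("Content-Type", "text/plain")])
        return [b"Forbidden\n"]
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Search results would go here\n"]


def check(user_agent):
    """Call the app directly with a fake request and return its status line."""
    captured = []
    app({"HTTP_USER_AGENT": user_agent},
        lambda status, headers: captured.append(status))
    return captured[0]


print(check("libwww-perl/5.805"))
print(check("Mozilla/5.0 (Windows; U; Windows NT 5.1; ja) Firefox/2.0"))
```

To serve it for real, the standard library's `wsgiref.simple_server.make_server("127.0.0.1", 8000, app).serve_forever()` is enough for a local experiment.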

Looking into it further, it appears that spammers use "libwww-perl" to run Google searches for arbitrary phrases, much like a dictionary attack, collect and save the addresses of the pages that appear in the results, build lists of trackback targets from those pages, and then send spam trackbacks to all of them at once. Spam comments seem to be sent in the same way. Since spammers all over the world use this method, the rough picture is that Google finally decided to restrict access.

In Firefox, the "User Agent Switcher" extension lets you switch user agents and easily try this out for yourself.

To make your own server deny user agents whose names begin with "libwww" in the same way, add the following to httpd.conf or .htaccess (this uses Apache 2.2-style access control):
SetEnvIf User-Agent "^libwww" deny_ua
Order Allow,Deny
Allow from all
Deny from env=deny_ua

That is all there is to it, and you can now refuse access just as Google does. These days the IP address that sends the spam and the IP address of the bot that collects the target pages are often different, so this simple approach of filtering by user agent name seems to work in most cases. That said, it will obviously become obsolete within a few months.
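The SetEnvIf line above uses the anchored pattern "^libwww", which only catches names that start with "libwww", and SetEnvIf matches case-sensitively (SetEnvIfNoCase is the case-insensitive variant). The sketch below compares that rule with a looser "contains libwww, any case" rule like the one Google seems to apply. The sample "MyCrawler/1.0 libwww-perl/5.805" is a hypothetical name; LWP produces strings of this shape when the agent name you set ends with a space, because it then appends its default "libwww-perl/<version>".

```python
import re

# Same pattern as the SetEnvIf line. SetEnvIf is case-sensitive;
# SetEnvIfNoCase would correspond to adding re.IGNORECASE here.
ANCHORED = re.compile(r"^libwww")


def blocked_by_anchored_rule(user_agent):
    """Mirror of SetEnvIf User-Agent "^libwww": only names that START with libwww."""
    return ANCHORED.search(user_agent) is not None


def blocked_by_substring_rule(user_agent):
    """Looser rule: block if "libwww" appears anywhere, ignoring case."""
    return "libwww" in user_agent.lower()


samples = [
    "libwww-perl/5.805",                # LWP's default agent string
    "MyCrawler/1.0 libwww-perl/5.805",  # hypothetical custom name + default appended
    "Libwww-perl/5.805",                # different capitalisation
    "Mozilla/5.0 (Windows; U; Windows NT 5.1) Firefox/2.0",
]

for ua in samples:
    print(f"{ua!r}: anchored={blocked_by_anchored_rule(ua)}, "
          f"substring={blocked_by_substring_rule(ua)}")
```

The anchored rule is less likely to hit legitimate clients but lets the second and third samples through; which trade-off suits your site is a judgment call.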

However, a rule like this also blocks people who run "libwww-perl" for its original, legitimate purposes rather than as a bot. So if you use "libwww-perl" properly rather than misusing it, it may be better to set a different user agent name of your own. Here is how to set the user agent name in "libwww-perl":

LWP::UserAgent - Get web data

When you do, identifying yourself (the administrator) the way Googlebot does makes it less likely that your client will be mistaken for a spam bot.
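In LWP itself the name is set with the agent method of LWP::UserAgent. As a language-neutral illustration, here is a minimal Python sketch of the same idea with the standard urllib: send your own descriptive User-Agent that includes contact details, similar in spirit to Googlebot's "+http://www.google.com/bot.html". The bot name, URL and mail address below are hypothetical placeholders, and the sketch only builds the request without sending it.

```python
import urllib.request

# Hypothetical bot name and contact details; replace them with your own.
USER_AGENT = "ExampleBot/1.0 (+http://www.example.com/bot.html; admin@example.com)"


def build_request(url):
    """Build a request that identifies its operator instead of the library."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})


req = build_request("http://www.example.com/")
# urllib stores header names capitalized, so the lookup key is "User-agent".
print(req.get_header("User-agent"))
# urllib.request.urlopen(req) would then fetch the page with this name.
```

Because the header no longer contains "libwww" and tells site owners who to contact, filters like the ones above will not lump the client in with spam bots.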

in Note, Software, Web Service, Column, Posted by darkhorse