How do spammers collect e-mail addresses, and what defenses work against them?


By Florian F. (Flowtography)

On a global scale, spam does real damage: major IT companies such as Google and Yahoo! are said to use 30 billion watts of electricity per year, equivalent to the power consumed by 3 million homes. Without spam, many companies could save energy and money. On a personal scale, even a powerful junk e-mail filter sometimes misclassifies legitimate e-mail, so you end up spending time checking the junk folder anyway. How do the spammers who send these unsolicited e-mails obtain e-mail addresses? A research project has clarified the acquisition routes and the defenses against them.

How do spammers harvest your e-mail address? · Karan Goel
http://karan.github.io/email-spam/


Karan Goel, a student at the University of Washington, asserts from his experience as an Internet marketer and blogger that "spam is not going away, and everyone with an e-mail address has received spam e-mails." Spam, which sometimes causes monetary losses, is becoming harder to distinguish from regular e-mail as spammers' tactics grow more sophisticated.

That is to be expected. According to a paper published in 2008, spammers need to send as many as 350 million junk e-mails to sell 28 items priced at 100 dollars (about 10,000 yen) each, so they keep devising new tricks to raise profitability.
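
To put those figures in perspective, the arithmetic can be worked through in a few lines of Python. This is only a back-of-the-envelope sketch using the three numbers quoted above. The variable names are illustrative, and the sending cost, which the article does not mention, is left out.

```python
# Figures quoted in the article from the 2008 paper.
emails_sent = 350_000_000   # junk e-mails sent
items_sold = 28             # resulting sales
price_usd = 100             # price per item in dollars

conversion_rate = items_sold / emails_sent   # sales per e-mail sent
revenue_usd = items_sold * price_usd         # total revenue in dollars
emails_per_sale = emails_sent / items_sold   # e-mails needed for one sale

print(f"conversion rate: {conversion_rate:.1e}")
print(f"revenue: {revenue_usd} dollars")
print(f"e-mails per sale: {emails_per_sale:,.0f}")
```

With a conversion rate this low, every additional address a spammer can harvest matters, which is why address collection is worth studying.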

In 2012, Goel had the opportunity to take part in research investigating how spammers collect e-mail addresses and how to defend against them. A year after he joined, the research succeeded in raising funds on "Experiment", and he was able to learn how spammers operate from raw data.

To investigate how spammers acquire e-mail addresses, the research team posted e-mail addresses, once each, on every kind of platform. The list below shows, for each platform, the number of e-mail addresses displayed in a human-readable state. The more sites an address appears on, the more exposed it is. On blog sites such as WordPress, the e-mail address is displayed in a form that can be read directly.


· App store review column (Apple, Chrome, Firefox, etc.): 4
· Blog comment: 119
· Blog site (WordPress, Blogger, etc.): 142
· Craigslist discussion board: 6
· Files hosted on Dropbox: 12
· Ecommerce site (such as Amazon): 5
· Facebook profile, wall, page: 5
· Files on the server: 21
· Forum's profile: 234
· GitHub: 1
· Google Documents: 7
· Google drawing: 2
· Greeting card generation site: 18
· Guest book: 12
· Loyalty program: 10
· Mailing list: 85
· Tag on a website: 4
· Other SNS: 5
· PDF on server: 8
· Paste site (Pastebin, etc.): 21
· Reddit: 9
· Scribd: 10
· Slideshare: 5
· Spam mailing list: 51
· Twitter: 2
· UW Directory: 1
· Usenet: 98
· Video site (YouTube description field / title): 53
· Whois: 5
· Wiki site: 84
· Yahoo Answers: 25


The graph below shows the results of testing which e-mail address obfuscation techniques are effective. The taller the bar, the less effective the technique. Among them, "Invalid" had the most e-mail addresses extracted, and JavaScript obfuscation also had no effect.


The list below shows how many addresses were posted with each obfuscation method. Obfuscation that makes the displayed text differ from the link target is frequent, and many of the methods rewrite the character string itself, as in "gigazine [at] irchiver [dot] com".


· Split: 6
· ASCII: 17
· Comment: 3
· Different Hyperlink: 114
· HTML Unicode: 45
· Imaging: 21
· Invalidation: 260
· Invisibility: 59
· JavaScript: 18
· No obfuscation: 484
· ROT-13: 2
· Email (at) irchiver (dot) com: 1
· Email @ irchiver. Com: 23
· Email @ irchiver.com: 18
· Email AT irchiver DOT com: 5
· Email AT irchiver.com: 1
· Email [@] irchiver.com: 1
· Email [at] irchiver [dot] com: 15
· Email [at] irchiver.com: 8
· Email at irchiver dot com: 2
· Email - @ - irchiver -. - com: 13
· Email - @ - irchiver.com: 2
· Email - AT - irchiver -. - com: 1
· Email-AT-irchiver.com: 11
· Email-at-irchiver-dot-com: 4
· Email-at-irchiver.com: 2
· Email [@] irchiver.com: 1
· Email [@] irchiver [.] Com: 12
· Email [at] irchiver.com: 2
· Email [at] irchiver [.] Com: 2
· Email [at] irchiver [dot] com: 17
· Email [at] irciver.com: 2
· Iframe: 2
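
To illustrate why the "No obfuscation" entries are the most exposed, here is a minimal Python sketch of a naive harvester. The article does not describe how real harvesters work, so the regular expression and the `harvest` function are assumptions made purely for illustration. They only show that a plain address matches a simple pattern while the string-rewriting variants from the list do not.

```python
import re

# A deliberately simple pattern for plain addresses. Real harvesters are not
# described in the article; this pattern is an illustrative assumption.
PLAIN_EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

def harvest(text: str) -> list[str]:
    """Return every substring of text that looks like a plain e-mail address."""
    return PLAIN_EMAIL.findall(text)

samples = [
    "Contact: email@irchiver.com",             # no obfuscation
    "Contact: email [at] irchiver [dot] com",  # string rewriting
    "Contact: email-AT-irchiver.com",          # string rewriting
]

for sample in samples:
    print(sample, "->", harvest(sample))
```

Only the first sample yields a match. A more determined harvester could of course add rules for "[at]" and "[dot]", so this sketch says nothing about how well each variant held up in the study.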


The following shows how easily an address spreads depending on how it is displayed. "Yes" means the e-mail address is hidden behind link text such as "here", while "No" means the address itself is shown as the link. There is not much difference between the two. The most effective was "Invalid", which embeds the e-mail address in an image or the like.


Below are the results by clickability. "Yes" means the address is a clickable link such as "here", and "No" means the address is displayed as plain text. Addresses displayed in a clickable state turned out to spread more easily.


After posting about 1,000 e-mail addresses in various places, the research team received a total of 18,000 spam e-mails. The number of spam e-mails peaked about 20 weeks after posting. This is presumed to be the time it takes for the posted page to be indexed and ranked by search engines.


Plotting the spam received from March 20, 2012 to June 11, 2013 as a line graph gives the following. Volume peaks in the holiday season in the second half of October, which suggests that spammers target times when spam is more likely to be opened.


Looking at the data week by week, however, the volume rises regularly at the beginning of the week and drops over the weekend, perhaps because spammers also want to take Saturdays and Sundays off. Because few people check e-mail frequently on weekdays, the researchers say that spammers "are wasting bandwidth".


The survey posted e-mail addresses on a variety of platforms, and the figure below shows which of them led to the most unsolicited e-mail. Spam mailing lists stand out: such sites are full of suspicious offers like a free "credit score / insurance quote / iPad", yet many people reportedly keep registering their e-mail addresses. Surprisingly, a lot of unsolicited e-mail also arrived via Whois, which displays e-mail addresses as images. Obfuscation by image was ineffective there, and Whois appears to be a gold mine for spammers.


Meanwhile, some platforms did not produce a single spam e-mail. Besides files stored safely on one's own server, they include, as expected, leading IT companies such as Apple, Amazon, Facebook, Google, and Twitter.


· App Store (when posting reviews)
· Ecommerce (Amazon, etc., when signing up)
· Facebook (Facebook page in public status · profile)
· Files on the server (text files on your server)
· Google Docs
· Google drawing
· Twitter (Tweeted with email)


Finally, the research team actually used every type of e-mail address obfuscation technique to verify how well each one works as a defense. Obfuscation by ROT-13 and ASCII gave the best results, but because they require specialized software, the drawback is that they are not defenses anyone can use.


Here are the defense measures that worked best, with no spam received at all.


· Split by tag (insert HTML tags inside the e-mail address)
· ASCII (encoding)
· HTML Unicode (encoding)
· Image (display the e-mail address as an image)
· ROT-13 (substitution algorithm)
· Email (at) irchiver (dot) com
· Email @ irchiver. Com
· Email [at] irchiver [dot] com
· Email-@- irchiver-.- com
· Email-AT-irchiver.com
· IFrame (Write e-mail address on one page and embed it on another page with iframe)
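
Two of the encodings in this list can be sketched in a few lines of Python. The article does not specify exactly how the study produced its "ROT-13" and "HTML Unicode" variants, so the helper names below are hypothetical and the numeric-entity form is an assumption about what "HTML Unicode" means. `codecs.encode(..., "rot_13")` and `html.unescape` are standard-library calls.

```python
import codecs
import html

def rot13(address: str) -> str:
    """Rotate each ASCII letter by 13 places; applying it twice restores the input."""
    return codecs.encode(address, "rot_13")

def html_entities(address: str) -> str:
    """Encode every character as a decimal HTML numeric character reference."""
    return "".join(f"&#{ord(ch)};" for ch in address)

address = "email@irchiver.com"

encoded_rot13 = rot13(address)
encoded_html = html_entities(address)

print(encoded_rot13)  # needs a script on the page to decode it for readers
print(encoded_html)   # a browser renders this as the original address

# Both transformations are reversible, so nothing is lost for human readers.
assert rot13(encoded_rot13) == address
assert html.unescape(encoded_html) == address
```

Neither transformation is encryption. Both only keep the literal address string out of the page source.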


The defense measures proposed by the research team can be summarized as follows.
· If you publish an e-mail address on a website whose source code you can edit, encoding it with "ROT-13", "ASCII", or "HTML Unicode" is effective. At first glance, the e-mail address then looks like nothing more than ordinary text.
· Avoid websites whose offers sound too good to be true. A "free iPad" will never be given in exchange for your e-mail address. Because e-mail addresses are bought and sold among spammers, unsolicited e-mail will keep arriving endlessly once you register.
· Writing the address in the source code in a form like "karan - AT - goel - DOT - im" wherever possible makes it less convenient for users, but it can prevent spammers from simply copying and pasting it.
· Pay attention to forms that ask you to fill in your e-mail address by default. According to the survey results, addresses entered in such forms not only receive marketing e-mail but are sometimes sold to advertisers.
· Be careful with forums and bulletin boards that ask you to write your e-mail address. These are spammers' largest source of e-mail addresses. When you absolutely must provide one, use a disposable e-mail address.
· If you register a domain name, it is effective to use "WhoisGuard", which substitutes other information for yours in Whois and can prevent spam. It also comes with an option to forward mail to your real e-mail address.

In addition, Gmail takes its own measures against spam and phishing e-mails. According to Google's findings, about 90% of the non-spam e-mail arriving in Gmail is sent using DKIM and SPF, domain authentication technologies that prevent spoofing. With these technologies, Google filters billions of spoofed e-mails annually.
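
Receiving mail servers commonly record the outcome of SPF and DKIM checks in an `Authentication-Results` header. As a rough sketch of what that looks like from the recipient's side, the Python below reads the verdicts from a sample message. The message, the domain names, and the `auth_results` helper are all made up for illustration, and this is not a description of how Gmail itself filters mail.

```python
from email import message_from_string

# Hypothetical message; every header value here is made up for illustration.
RAW_MESSAGE = """\
Authentication-Results: mx.example.com;
 spf=pass smtp.mailfrom=sender.example;
 dkim=pass header.d=sender.example
From: someone@sender.example
Subject: hello

body
"""

def auth_results(raw: str) -> dict[str, str]:
    """Map each authentication method in the header to its verdict."""
    message = message_from_string(raw)
    header = message.get("Authentication-Results", "")
    results = {}
    # The first ';'-separated part names the server that did the checking.
    for part in header.split(";")[1:]:
        tokens = part.split()
        if tokens and "=" in tokens[0]:
            method, _, verdict = tokens[0].partition("=")
            results[method] = verdict
    return results

print(auth_results(RAW_MESSAGE))
```

A header like this should only be trusted when it was added by your own mail provider, since a sender can put anything in the headers it writes itself.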

Google Online Security Blog: Internet-wide efforts to fight email phishing are working
http://googleonlinesecurity.blogspot.jp/2013/12/internet-wide-efforts-to-fight-email.html

in Note,   Software,   Web Service, Posted by darkhorse_log