How to prevent harvesting programs from getting your email address
Preventing Spiders from getting your Email Address
Email spammers are constantly looking for new sources of email addresses. They are basically playing a numbers game: the more spam they can send out, the more money they can generate. This is because the response rate from spammed advertising is typically very low. However, if you send out 25,000,000 emails and only 1% respond, that is still 250,000 customers. If the clients the spammer is working for make any kind of profit, this can result in a great deal of money. And that is why email spamming is everywhere: it's just that profitable.
Spammers may buy email addresses from other spammers, or buy email lists generated from websites or mailing lists. They have many ways to find what they need.
One of the most blatant and annoying ways in which they get their email lists is to "scrape" or "harvest" websites. They create programs that follow the links people put on their websites, looking for anything that looks like a possible email address. They then log this address and continue scraping.
What we want to do is make it more difficult for them to find our email address on our pages, and if possible feed them inaccurate information.
For the first part, a good way to do this is to use JavaScript so that a valid email address never appears in the source of your page. We do this by putting the first part of the address into one variable and the domain part into another, then using JavaScript to put the two together.
While a person clicking on the resulting link gets a perfectly valid email address, the page source doesn't match the criteria the spiders look for, and so the address goes unnoticed.
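To see why splitting the address helps, here is a minimal sketch of the kind of pattern matching a harvester might do on raw page source. The regular expression, the `findAddresses` helper, and the `example.com` address are assumptions made for illustration; real harvesters vary, and this only models one that reads the source without running scripts.

```javascript
// A deliberately simple address pattern (an assumption; real harvesters differ).
const EMAIL_PATTERN = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;

// Hypothetical helper: return every address-like string found in raw page source.
function findAddresses(pageSource) {
  return pageSource.match(EMAIL_PATTERN) || [];
}

// A plain mailto link exposes the whole address in the page source.
const plainPage = '<a href="mailto:joe_bloggs@example.com">Email us</a>';

// The split-variable version never contains "user@domain" as one string.
const scriptedPage = [
  'var user = "joe_bloggs";',
  'var domain = "example.com";',
  "document.write('<a href=\"mailto:' + user + '@' + domain + '\">');",
].join('\n');

console.log(findAddresses(plainPage));    // one address found
console.log(findAddresses(scriptedPage)); // nothing found
```

In the scripted version the "@" sits between quote characters, so there is nothing address-like on either side of it for the pattern to pick up.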
Here is the code you can use to fool website-scraping spiders:
<script language="JavaScript" type="text/javascript">
<!--
// hide from old browsers
//variables
var user = "myname"; //for example - "joe_bloggs"
var domain = "mydomain.com"; //for example - "hotmail.com"
var subject = "I can put anything I want in here, or leave it blank"; //for example = "website feedback"
//output
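// encodeURIComponent keeps spaces and punctuation in the subject from breaking the link
document.write('<a href="mailto:' + user + '@' + domain + '?subject=' + encodeURIComponent(subject) + '">');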
document.write('Click here to send us an email</a>');
// -->
</script>
Here is what this looks like:
As you can see, it works fine for us humans and will help your email address stay out of the spammers' database!
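If you would rather not use document.write, the same idea can be sketched as a small helper that builds the link target from its parts. The `buildMailtoHref` function and the `contact-link` element id are hypothetical names introduced for this example; it assumes your page already contains an empty link element with that id.

```javascript
// Hypothetical helper: build the mailto URL from separate parts,
// URL-encoding the subject so spaces and punctuation survive.
function buildMailtoHref(user, domain, subject) {
  let href = 'mailto:' + user + '@' + domain;
  if (subject) {
    href += '?subject=' + encodeURIComponent(subject);
  }
  return href;
}

// Only touch the page when running in a browser. "contact-link" is an
// assumed id for an empty <a> element already present in the page.
if (typeof document !== 'undefined') {
  const link = document.getElementById('contact-link');
  if (link) {
    link.href = buildMailtoHref('myname', 'mydomain.com', 'website feedback');
    link.textContent = 'Click here to send us an email';
  }
}
```

As with the original script, the full address only exists once the script has run, so it never appears as a single string in the page source.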
Causing the Email Harvesting Software to Choke
Now for the second part: making the programs the spammers use as useless as possible. To do this, we create something called a poisoned page.
This is usually just a webpage filled with worthless, randomly generated email addresses that no one would ever use, but that match exactly what the spiders are looking for.
This is an example of a poisoned page. Don't expect anything too exciting, though I do hope any email harvesting spiders going through this site choke on it.
To make it extra fun, I'm using PHP scripting to have the email addresses change each time. With any luck, each time a spider visits this site it will find plenty of new email addresses to put into the spammers' database. This lowers the usefulness of the addresses the spider has harvested, and hopefully forces the spammer either to spend extra time cleaning their lists or to send out worthless email.
In either case, this lowers the profitability of the operation, and if enough people do it, it may discourage spammers from using such programs at all.
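The generation step can be sketched roughly as follows. This is not the PHP script used on this site; it is a JavaScript illustration (runnable in Node.js, for instance) with made-up helper names, and it assumes the fake addresses are placed under example.com, a domain reserved for documentation, so that no real mailbox ends up receiving the spam.

```javascript
// Characters used for the made-up names; an arbitrary choice for this sketch.
const LETTERS = 'abcdefghijklmnopqrstuvwxyz';

// Return a random lowercase string of the given length.
function randomWord(length) {
  let word = '';
  for (let i = 0; i < length; i++) {
    word += LETTERS[Math.floor(Math.random() * LETTERS.length)];
  }
  return word;
}

// Build one fake address under example.com (assumed here because the
// domain is reserved and will never belong to a real user).
function fakeAddress() {
  const user = randomWord(5 + Math.floor(Math.random() * 6)); // 5 to 10 letters
  const host = randomWord(4 + Math.floor(Math.random() * 5)); // 4 to 8 letters
  return user + '@' + host + '.example.com';
}

// Produce the body of a poisoned page: one mailto link per line.
function poisonedLinks(count) {
  const lines = [];
  for (let i = 0; i < count; i++) {
    const address = fakeAddress();
    lines.push('<a href="mailto:' + address + '">' + address + '</a><br>');
  }
  return lines.join('\n');
}

console.log(poisonedLinks(5));
```

Because the addresses come from Math.random(), every run (and so every visit, if the page is generated on request) yields a different list.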
Please note: if you install any kind of spider trap, you MUST block that page from legitimate spiders, such as the ones search engines like Google use. Depending on how your trap works, you could trap Googlebot just as easily as a spammer's harvester. We only want to hurt the spammer, so be sure to include the following in your robots.txt file. If you don't have one, you really should; they are not difficult to set up, just put the file in the root folder of your website. Here is a link to the robots.txt file I use. Note the first entry: this is the one that prevents friendly spiders from hitting my spider trap:
User-agent: *
Disallow: /users.php
Disallow: /users.php/
You can also specify in the page itself that it should not be spidered, using a robots meta tag in the head of the webpage, for example <meta name="robots" content="noindex, nofollow">.
Personally, I do both.