Obfuscating message content. Gibberish Spam. Obfuscating Email    
Obfuscating message content

Obfuscating message content - Spammer may intentionally misspell commonly-filtered words

Many spam-filtering techniques work by searching for patterns in the headers or bodies of messages. For instance, a user may decide that all e-mail they receive with the word "Viagra" in the subject line is spam, and instruct their mail program to automatically delete all such messages. To defeat such filters, the spammer may intentionally misspell commonly-filtered words or insert other characters, as in the following examples:

Viagra
V1agra
Via'gra
V I A G R A
Vaigra
\ /iagra
Vi@graa


The principle of this method is to leave the word readable to humans (who can easily recognize the intended word for such misspellings), but not likely to be recognized by a literal computer program. This is only somewhat effective, because modern filter patterns have been designed to recognize blacklisted terms in the various iterations of misspelling. Other filters target the actual obfuscation methods; such as the non-standard use of punctuation or numerals into unusual places, for example: within in a word.

(Note: Using most common variations, it is possible to spell "Viagra" in over 1.3 * 1021 ways.)

HTML-based e-mail gives the spammer more tools to obfuscate text. Inserting HTML comments between letters can foil some filters, as can including text made invisible by setting the font color to white on a white background, or shrinking the font size to the smallest fine print.

Another common ploy involves presenting the text as an image, which is either sent along or loaded from a remote server. This can be foiled by not permitting an e-mail-program to load images.

As Bayesian filtering has become popular as a spam-filtering technique, spammers have started using methods to weaken it. To a rough approximation, Bayesian filters rely on word probabilities. If a message contains many words which are only used in spam, and few which are never used in spam, it is likely to be spam. To weaken Bayesian filters, some spammers, alongside the sales pitch, now include lines of irrelevant, random words, in a technique known as Bayesian poisoning. A variant on this tactic may be borrowed from the Usenet abuser known as "Hipcrime" -- to include passages from books taken from Project Gutenberg, or nonsense sentences generated with "dissociated press" algorithms. Randomly generated phrases can create spamoetry (spam poetry) or spam art.

After these nonsense subject lines were recognized as spam, the next trend in spam subjects started: Biblical passages. A program much like Mark V Shaney is fed Bible passages and chops them up into segments. The reasoning is that this text, very different from the writing style of today, will confuse both humans and spam filters.

Another method used to masquerade spam as legitimate messages is the use of autogenerated sender names in the From: field, ranging from realistic ones such as "Jackie F. Bird" to (either by mistake or intentionally) bizarre attention-grabbing names such as "Sloppiest U. Epiglottis" or "Attentively E. Behavioral". Return addresses are also routinely auto-generated, often using unsuspecting domain owners' legitimate domain names, leading some users to blame the innocent domain owners. Blocking lists use ip addresses rather than sender domain names, as these are more accurate. A mail purporting to be from example.com can be seen to be faked by looking for the originating ip address in the mails header, and Sender Policy Framework for example helps by stating that example.com will only send email from xx.xx.xx.xx ip.



Copyright © 2007 SpamAlert.org, All rights reserved.