Tuesday, April 22, 2008

Interesting spam patterns and Tibet

I work for a telecommunications company specializing in email-to-SMS text message services. A large part of my job is helping to deal with the non-stop flood of spam that arrives at our doorstep, hoping to snare the naive mobile customer into taking Viagra three times a day at THE BEST PRICES AVAILABLE!!!!!!11!!!!1!!!111!!!.

Lately, we've been seeing a lot of spam offering low-interest loans from China. I imagine that these are aimed more at the U.S. market than at ours here in Canada... but I suspect that most sub-prime lenders in China don't really see the distinction between the Canadian and U.S. desperate-for-money markets. However, that's not what this post is really about.

One of the major techniques used to keep spam from landing in your inbox is Bayesian filtering. Many products, both free and commercial, use it, and it is fairly effective at eliminating spam. A Bayesian filter learns, from messages already marked as spam or as legitimate, how strongly each word is associated with spam, and then combines the scores of the words in a new message. In an effort to defeat these filters, spammers have taken to including large blocks of text that aren't relevant to their message but are likely to score low in the filter, in the hope that this lets the message through. To give you an idea: while penis, p3nis, p3n1s, etc. are all terms that score high in the filter, by including a lot of spurious text (darkly dreaming surrealism) they hope that the overall message scores low. Eventually these messages are caught, and the filters get trained on the new terms so that they become high scorers. I'm sure you've all seen these: the messages with a block of text that makes no sense whatsoever.
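To make the padding trick concrete, here is a minimal sketch of a naive Bayes word scorer in Python. It is not the filter we run in production or any particular product's implementation; the function names, the four training messages, and the 0.5 cut-off are all made up for illustration.

```python
import math
from collections import Counter

def train(messages):
    """Count word occurrences per class from (text, is_spam) pairs."""
    counts = {True: Counter(), False: Counter()}
    docs = {True: 0, False: 0}
    for text, is_spam in messages:
        counts[is_spam].update(text.lower().split())
        docs[is_spam] += 1
    return counts, docs

def spam_probability(text, counts, docs):
    """Naive Bayes estimate of P(spam | text) with add-one smoothing."""
    vocab_size = len(set(counts[True]) | set(counts[False]))
    spam_total = sum(counts[True].values())
    ham_total = sum(counts[False].values())
    # Start from the prior odds, then add each word's evidence in log space.
    log_odds = math.log(docs[True] / docs[False])
    for word in text.lower().split():
        p_word_spam = (counts[True][word] + 1) / (spam_total + vocab_size)
        p_word_ham = (counts[False][word] + 1) / (ham_total + vocab_size)
        log_odds += math.log(p_word_spam / p_word_ham)
    return 1 / (1 + math.exp(-log_odds))

# Hypothetical toy corpus: True marks spam, False marks legitimate mail.
TRAINING = [
    ("cheap viagra best prices", True),
    ("viagra pills best prices online", True),
    ("meeting moved to tuesday afternoon", False),
    ("darkly dreaming surrealism lecture on tuesday", False),
]

counts, docs = train(TRAINING)

plain = "cheap viagra best prices"
# Same pitch, padded with words the filter has only seen in legitimate mail.
padded = plain + " darkly dreaming surrealism lecture tuesday meeting afternoon"

plain_score = spam_probability(plain, counts, docs)
padded_score = spam_probability(padded, counts, docs)
print(f"plain:  {plain_score:.3f}")
print(f"padded: {padded_score:.3f}")
```

With this toy corpus the plain pitch scores well above 0.5 and the padded one falls below it, which is exactly what the spammer is hoping for. Real filters use far larger corpora and smarter tokenizing, so the numbers here mean nothing beyond the direction of the effect.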

Lately, the large number of Chinese "borrow money now" spams have included text that is politically relevant, if not relevant to usury. They include text from wire news reports about the Dalai Lama and the protests in Tibet... and I have, as I must, been training it into the new spam filters. However, doing this has a fairly predictable side effect: legitimate messages about Buddhism, the Dalai Lama, Tibetan Buddhism, and the current protests in Tibet and around the world following the Olympic torch can end up caught in spam filters. As an information warfare technique, it is certainly interesting, and it's also interesting that these messages are being promulgated by parties that don't appear to be state actors... though I have to admit I'm skeptical that the use of this new technique isn't motivated by state actors.
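The side effect is easy to reproduce on a toy scale. The sketch below is a hypothetical, stripped-down Bayesian filter in Python (the class name, the sample messages, and the 0.5 cut-off are my own inventions, not anything from our production system). It scores a legitimate message about Tibet, trains three padded loan spams in as spam, and scores the same message again.

```python
import math
from collections import Counter

class TinyBayesFilter:
    """Toy naive Bayes filter; an illustration, not a real product."""

    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.docs = {"spam": 0, "ham": 0}

    def learn(self, text, label):
        self.counts[label].update(text.lower().split())
        self.docs[label] += 1

    def spam_probability(self, text):
        vocab = set(self.counts["spam"]) | set(self.counts["ham"])
        spam_total = sum(self.counts["spam"].values())
        ham_total = sum(self.counts["ham"].values())
        log_odds = math.log(self.docs["spam"] / self.docs["ham"])
        for word in text.lower().split():
            if word not in vocab:
                continue  # words never seen in training carry no evidence
            p_spam = (self.counts["spam"][word] + 1) / (spam_total + len(vocab))
            p_ham = (self.counts["ham"][word] + 1) / (ham_total + len(vocab))
            log_odds += math.log(p_spam / p_ham)
        return 1 / (1 + math.exp(-log_odds))

spam_filter = TinyBayesFilter()
spam_filter.learn("lunch meeting moved to friday", "ham")
spam_filter.learn("dalai lama speaks on tibet protests", "ham")
spam_filter.learn("borrow money now low interest loans", "spam")
spam_filter.learn("low interest loans apply now", "spam")

legit = "news about the dalai lama and tibet protests"
before = spam_filter.spam_probability(legit)

# Train in three loan spams padded with wire-report vocabulary.
for _ in range(3):
    spam_filter.learn(
        "borrow money now dalai lama tibet protests olympic torch", "spam"
    )

after = spam_filter.spam_probability(legit)
print(f"before retraining: {before:.3f}")
print(f"after retraining:  {after:.3f}")
```

In this toy run the legitimate message starts out well under 0.5 and ends up over it once the padded spams are learned, even though the message itself never changed. How quickly a real filter tips over depends on how much legitimate mail on the same topic it has already been trained on.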

It certainly looks to me as though the Chinese government is attempting to extend the Great Firewall of China to the rest of the world's email system by exploiting an unintended consequence of Bayesian spam filtering.
