Killing Comment Spam the Pete Way



After a year or so of dealing with its ever-growing menace, I seem to have comment spam under control, at least for the time being. I’m still getting hit by the bastards, more and more each week, to the level where a comment is being submitted every five minutes, but none of them are actually reaching the site. That, to me, is victory enough. So, for those of you running Movable Type and suffering, here’s how I do it.

First of all, I’m running Movable Type 3.1x, currently the most up-to-date version. Along with this I’m using the MT Blacklist plugin. What’s interesting, though, is that the blacklist part of this, while important, isn’t the killer feature for me. It’s the Forced Moderation on posts over 21 days old that’s doing it.

What seems to happen is the spammers only hit posts that have already been indexed by Google, and new ones aren’t on their radar yet. For example, this Google search brings up 2,470 pages on my site with a commenting form. There are some from January 2005 in there, but they tend not to have built up much PageRank yet. Also, we can assume that if the spammers are using third-party applications and lists, the data they’re spamming with is going to be a few weeks old. However it works, they ain’t hitting the new posts right now and that’s a good thing.

So, nothing older than three weeks can have a comment posted to it without it going into the moderation pot. I could of course just close comments on posts over three weeks old, but there are a few older posts that have a life of their own and I don’t want to kill them, and since they’re not part of my current blogging life they can cope with waiting a day or two before being updated. The important part, the discussion between my regular readers and myself, is safe and sound. And most importantly the “Recent Comments” list on the main page is accurate, relevant and useful.
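
The age rule is simple enough to sketch. This is a Python illustration, not what Movable Type or MT Blacklist actually runs; the function name and the status strings are made up for the example, and the only number taken from the setup above is the 21 days.

```python
from datetime import datetime, timedelta

# Cut-off taken from the "21 days" setting described above.
MODERATION_AGE = timedelta(days=21)


def comment_status(post_date: datetime, now: datetime) -> str:
    """Return "publish" for comments on recent posts, "moderate" for older ones."""
    if now - post_date > MODERATION_AGE:
        return "moderate"
    return "publish"


# Hypothetical usage: a comment arriving on a two-month-old post is held back.
print(comment_status(datetime(2004, 11, 20), datetime(2005, 1, 21)))
```

Nothing is rejected here: old posts still accept comments, they just wait for approval.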

Of course, you then end up with hundreds of moderated comments which need to be sorted through and deleted, which is a pain, so reducing this would be nice. MT Blacklist’s blacklist hasn’t been updated for ages but it still does a fairly good job. I’m not sure when I last reset the stats, but since then it’s blocked 11,963 comments and moderated 5,422, so about two thirds of the comments never even made it into the database. If and when the updates start again this will improve, but for now blocking two thirds is better than total failure. It’s also debatable whether a blacklist works these days - I’ve noticed spammers using a different URL for each comment, for example, which would be impossible to block. (The theory is they see which URL works best and then use it as a PageRank spawning page for their “clients”.)
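
For anyone checking the sums, here is the arithmetic behind the proportion of comments blocked, using the two counts quoted above. It assumes those two counters cover everything the plugin caught since the stats were reset.

```python
blocked = 11_963    # comments MT Blacklist rejected outright
moderated = 5_422   # comments it held for moderation instead

total = blocked + moderated
blocked_share = blocked / total

print(f"{total} caught in total, {blocked_share:.1%} blocked outright")
```

That comes out at a little over two thirds blocked, with the rest landing in the moderation queue.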

[Important note: MT Blacklist has a weird bug whereby when you delete a weblog, a second weblog is deleted at random. So if you're planning on deleting a weblog, don't. Just delete the files on the server (which you'd have to do anyway) and leave it alone for now.]

The next weapon, which sits in front of MT Blacklist, is a wee plugin called MT-DSBL. To get around IP banning, spammers use open proxy servers which disguise where they’re coming from. Some of these are listed on DSBL.org, so the plugin checks each comment against that list. Since legit folks don’t tend to be using open proxies (and don’t tend to know what the hell they are - I’m still a bit fuzzy on the mechanics), blocking them indiscriminately isn’t a major issue. MT-DSBL’s success fluctuates, depending on all manner of variables I guess, but yesterday it blocked hundreds of comments, so many I can’t count them, approximately one every 5-10 minutes. Since it just sits there needing no attention whatsoever, this is a very good thing.
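
On the mechanics: DNS-based blocklists are normally queried by reversing the octets of the commenter's IP address and looking that up as a hostname under the list's zone. If an address record comes back, the IP is listed. The sketch below shows that general convention in Python. It is an assumption that the plugin works this way, and the zone name and function names are placeholders for illustration, not anything taken from the plugin.

```python
import socket
from typing import Callable


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the lookup name: reversed IPv4 octets followed by the list's zone."""
    octets = ip.split(".")
    if len(octets) != 4 or not all(o.isdigit() and int(o) <= 255 for o in octets):
        raise ValueError(f"not an IPv4 address: {ip!r}")
    return ".".join(reversed(octets)) + "." + zone


def is_listed(
    ip: str,
    zone: str,
    resolve: Callable[[str], str] = socket.gethostbyname,
) -> bool:
    """True if the blocklist zone returns an address record for this IP."""
    try:
        resolve(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        # No record means "not listed". A failed lookup also ends up here,
        # so this sketch lets the comment through when DNS is broken.
        return False


# Hypothetical zone name, for illustration only.
print(dnsbl_query_name("192.0.2.10", "blocklist.example.org"))
```

The resolver is passed in as a parameter so the check can be tried out without touching the network.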

And most importantly, no-one has ever contacted me to say their legitimate comment was blocked. Even if someone had been blocked incorrectly it obviously wasn’t that important.

All that remains is housekeeping. This takes place on the standard Movable Type comments listing page where moderated comments are shaded in brown. Some days these can add up to a couple of hundred comments to be sorted through, but just as you can recognise email spam on sight, the bad ones are pretty easy to spot. Not many legitimate commenters encode their names like this: &#111;nl&#105;n&#101; p&#111;k&#101;r for example, or start their posts with an <h1> tag. And unlike deleting comments in the old MT2.x, which could take a good half hour, the current interface means you can get rid of hundreds in five minutes. One thing I’d like to be able to do is filter out approved comments so I can just see the moderated ones and delete them all in one fell swoop, but that’s a minor quibble.
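
Those two tell-tale signs could in principle be checked by a script instead of by eye. Here is a rough Python sketch of the idea. The function name and the exact rules are invented for the example, and a heuristic this crude would want a human looking over its shoulder.

```python
import html
import re

# Numeric character references such as &#111; or &#x6f;
NUMERIC_ENTITY = re.compile(r"&#(?:\d+|[xX][0-9a-fA-F]+);")


def looks_spammy(author: str, body: str) -> bool:
    """Flag entity-encoded plain letters in the name, or a body opening with <h1>."""
    for match in NUMERIC_ENTITY.finditer(author):
        decoded = html.unescape(match.group())
        # Nobody needs a character reference to type a plain ASCII letter.
        if decoded.isascii() and decoded.isalpha():
            return True
    return body.lstrip().lower().startswith("<h1")


print(looks_spammy("&#111;nl&#105;n&#101; p&#111;k&#101;r", "Nice site"))
```

Accented letters written as entities are deliberately left alone, since a real name might well contain one.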

An interesting side effect is that when you get a legitimate comment on an old post you have to actively approve it. This approval process means you spend a few seconds looking at the comment and deciding whether or not to let it through, and of course the lame-assed comments written in txt-spk by illiterate teenagers don’t stand a chance.

And that’s pretty much it. Comment spam never makes it onto my site and I spend five minutes a day making sure nothing legitimate has been caught in the trap. A few months ago I was seriously thinking about removing all comments from the site. Now I’m thinking about how to develop the community aspects of this site further. I think that’s a result!

16 comments so far

  1. Pete Ashton on January 21st, 2005

    Oh, just for the record, I’m not using TypeKey or any authorisation system, I’m not moderating at all on those posts younger than 21 days and I’m not requiring email addresses from commenters. People can use HTML in their comments as much as they like and no-one will be blocked on those new posts unless they’re in the Blacklist list. There are no barriers to communication in my comments system at all.

    Nice, huh?

  2. groc on January 21st, 2005

    it sez here:

    * CHANGES IN v2.02

    + Fixed critical “delete a weblog, lose a second for free” bug

    as I wade thru the mt-blacklist readme trying to understand how I’m supposed to install the stupid thing.

    think I might go over to blogger or something else at the moment. this ain’t worth the headaches.

  3. Barbara June on January 21st, 2005

    You surely have one of the most interesting sites in net. I believe in your way of doing, so go on this way. It’ surely the right way.

  4. Reinder on January 21st, 2005

    Looks like one comment spam got through right here :)

  5. groc on January 21st, 2005

    Giggle.
    oh the irony.
    oh the hubris.

  6. Pete Ashton on January 21st, 2005

    That’s gotta be a hoax, damn you…

  7. Pete Ashton on January 21st, 2005

    I was going to neuter it but realised that with no-follow I don’t need to anymore. The joke remains.

    And it is a joke - I’ve never seen a comment spam with that pattern to it before, and I know one when I see one. Nice try though.

  8. Reinder on January 22nd, 2005

    I do see a lot of comment spams that look like that (generic compliment plus unrelated URL).
    However, looking at the website linked, it’s, uhm…. odd. It’d be a lot of work to play a prank on one weblog, but I suppose if the prankster wanted to tweak a lot of people, it’d be worth the trouble to make.

  9. Pete Ashton on January 22nd, 2005

    Nah, it’s not one of the ones I get. The basic structure is there but it’s all in the rhythm. It’s like sniffing a fine wine. It might well be a genuine cut-n-paste of a spam, but it wasn’t sent by a robot because it’s all on its own.

    Plus it was on here within hours. I have never, in all the months I’ve been swamped by these bastards, seen one come to a post that quickly.

    I refer you to the third comment here

    Of course, since my defence is pretty much based on them not hitting new posts, now would be the perfect time for them to change that tactic

  10. Reinder on January 24th, 2005

    It turns out that the survival time of a bare MT 3.14 install on my host isn’t long enough to configure MT-Blacklist. *That* is irony.

  11. Pete Ashton on January 25th, 2005

    Okay, I’ve had two pieces of comment spam on my most recent entry in the last 24 hours. Now I’ve had hundreds of attempts to spam me in that time so two isn’t terrible, but it’s worse than none, and it’s my most recent post. That ain’t supposed to happen. Looks like the battle is shifting again. Cunts.

  12. Pete Ashton on January 25th, 2005

    I’m going to use this comment thread to list my comment spam battles rather than clog up the blog itself with such trifles.

    Today I tried the “rename mt-comments.cgi” trick which I hadn’t done before mainly because it seemed too simple.

    Three hours in and not one spam.

    Wow.

    Details a third of the way down this page

    In the long term I’m sure it’ll be compromised (if you view the source of the comments form it’s in there) but it only took a minute to do, which is quicker than deleting spams held for moderation. I’m guessing that a rename every few days should do the trick.

  13. Pete Ashton on January 25th, 2005

    SpamAssassin seems too good to be true, which might explain why I haven’t tried it. If the renaming trick fails (still holding up 5 hours later) I’ll give it a go.

  14. Pete Ashton on January 26th, 2005

    Okay, changing the mt-comments.cgi script’s name works. 12 spam comments in 24 hours. That’s down from a good couple of hundred. Whether it’s still working in a week remains to be seen.

    This is speculation, but I think I’ve eliminated the spammers who just hit the script indiscriminately and am only getting those who are “visiting” the page itself, which means they’re going via google, which means they’re not going to hit the more recent posts (see 3rd paragraph above).

  15. Pete Ashton on January 30th, 2005

    Had an email from someone who was prevented from commenting by DSBL despite coming from a (currently) legit IP, so since the renaming of the comments script is working I’ve removed the MT-DSBL plugin. We’ll see what adverse effect this has.

  16. Pete Ashton on February 2nd, 2005

    First piece of Trackback spam appeared yesterday with another today. I notice on the bloggernet that others are attracting it too. Since I don’t use Trackback much, if at all, I might as well switch it off, but it’s a pisser for those who rely on it.
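
The script-renaming trick from comment 12 can be sketched as a few lines of Python, for anyone who fancies automating the "rename every few days" part. The function name and the naming scheme are made up for the example. It only renames the file: the comment forms (and whatever setting tells Movable Type the script's name, which I believe is a CommentScript line in the config file, but check that against your own install) still have to be pointed at the new name.

```python
import secrets
from pathlib import Path


def rename_comment_script(cgi_dir: Path, current_name: str) -> str:
    """Rename the comment script to a random name and return the new name."""
    # Eight random hex characters; the "c-" prefix is an arbitrary choice.
    new_name = f"c-{secrets.token_hex(4)}.cgi"
    (cgi_dir / current_name).rename(cgi_dir / new_name)
    return new_name
```

As noted in the thread, the new name is visible in the source of the comment form, so this only shakes off bots that post straight to the default script name.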