What has been killing my server?

Today I was at work when Barbelith went down. MySQL errors everywhere, the community in uproar, IMs and e-mails. And it wasn’t like I didn’t have enough to do. So I explore in more depth. First step, see what’s actually happening on the server – so I launch Terminal, ssh in to the Barbelith Superserver over at Pair, find the directory with my logs in and type in tail -f access-log. Immediately, I see each request coming into the server in roughly real-time, scrolling down the page like I’m looking at The Matrix. Unix is not my strong-point, so thanks to Simon for that little trick. It’s moving too fast for me to visually get a grasp on what’s going on, but I start seeing some recurrent patterns after a minute or so – HTTrack, which I do a quick search for and which turns out to be a piece of software that you run on your computer to download complete versions of someone’s website. Given that Barbelith contains nearly six hundred thousand posts across twenty-five thousand threads (each paginating every forty or so posts), this is not a small job. And given that the software is dragging down a bunch of pages each and every second, it’s not really a surprise that the MySQL server was having some trouble.
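When the log scrolls too fast to read by eye, a small script can tally who is making the requests. The following is only a rough sketch: it assumes the log is in Apache's "combined" format (the post does not say which format is in use), and the function name top_clients and the sample lines are made up for illustration.

```python
import re
from collections import Counter

# Assumption: Apache "combined" log format, i.e.
# ip ident user [time] "request" status bytes "referer" "user-agent"
LINE_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "(?P<request>[^"]*)" '
    r'\d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def top_clients(lines, n=5):
    """Return the n most frequent (ip, user-agent) pairs in the given log lines."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match:  # silently skip lines that do not fit the assumed format
            counts[(match.group("ip"), match.group("agent"))] += 1
    return counts.most_common(n)


# Hypothetical sample lines, not taken from the real server log.
SAMPLE = [
    '203.0.113.5 - - [01/Jan/2000:10:00:00 +0000] "GET /topic/1 HTTP/1.1" 200 2326 "-" "HTTrack/3.0"',
    '203.0.113.5 - - [01/Jan/2000:10:00:00 +0000] "GET /topic/2 HTTP/1.1" 200 1873 "-" "HTTrack/3.0"',
    '198.51.100.7 - - [01/Jan/2000:10:00:01 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
]

if __name__ == "__main__":
    for (ip, agent), hits in top_clients(SAMPLE):
        print(hits, ip, agent)
```

To run it against a real file, pass an open file object instead of SAMPLE, for example top_clients(open("access-log")).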

So I banned the user’s IP for a bit by adding a couple of lines to my .htaccess file and waited for the site to start working again. But no luck. Exploring the database through the PHPMyAdmin interface that Cal set up for me, I note that all the activity has resulted in one table in the database getting corrupted. So I dig around online a little longer, and work out how to login to MySQL directly through the Terminal and run a repair table command and hope for the best. It all seems to work. Everything’s back to normal. Cheers all around. I’m very proud of myself.

Except then half an hour later the site is down again. This time it’s so bad that people can’t even connect to my server at all. Every site that I run off the server is completely inaccessible to the outside world. plasticbag.org and Barbelith stop working obviously, but also other little-known ventures like Everything in Moderation and bought-for-fun-after-seeing-a-Penny Arcade strip-and-maybe-taking-the-joke-a-little-to-excess Cockthirsty.com are out of action. I can’t even ssh in to my server any more. I can’t send urgent support e-mails to my hosts, or receive replies to them. I am, to all intents and purposes, dead in the water.

I ring them up – half a world away – to find out what’s going on. They’re initially mystified – MySQL is running so hot it’s a wonder that the rack-mounts aren’t melting. When they try and login, the server basically falls over completely. A forced restart, and I hold my breath a little. When it comes back, they dig into the logs and it becomes immediately obvious to them what’s going on. Hundreds – thousands – of requests every minute for a file called mt-comments.cgi – the part of Movable Type that deals with incoming comments to my weblog. My entire site has been quite directly, and clearly spammed to death.

So I’ve had to make a short-term choice while I explore my options in more depth, between a site with no comments and no site at all – and I’m afraid the answer is no more comments, at least for the time being. I’d been thinking of looking into Akismet, but there’s simply no point. That still means that MySQL is going to be dealing with all this crap-peddling evil perpetrated by money-grubbing parasites, and that means regular meltdowns. I’ve come to wonder whether the problems I’ve had with MySQL errors on Barbelith over the last couple of years have been more to do with comment spam than anything else, and – while I want to make it clear that in no way do I blame Six Apart or Movable Type or anything and while I’m sure there’s a way out of this situation – it has started to feel like having the mt-comments.cgi script sitting on my server is like having a bullseye painted on my chest. In the meantime, any advice people have on how to deal with this kind of activity would be very much appreciated indeed. Would moving to Typekey authentication only help? Should I be looking into throttling on the server? Can anyone help? The e-mail address (I’m afraid) is tom at the name of this site – or you can write your own post and link to this one and I’ll find you via Technorati.

15 replies on “What has been killing my server?”

Hey Tom, here’s a possible option. When I ran MT on my site I renamed the mt-comments.cgi file to dunc-comments.cgi or suchlike and then updated all references to that file. It seemed to stop my spamming at the time as spammers were relying on the mt-comments.cgi being there.

It might not last forever, but you might get some mileage out of just changing the name of mt-comments.cgi to something random and updating the CommentScript setting in mt.cfg to look for it in the new location.

Many spammers use scripts that assume a lot of defaults. Is it possible that renaming mt-comments.cgi to something else, and updating references in the necessary pages, would throw off some of them?

Do as Mike says: rename the file, and create a new mt-comments.cgi.
The new mt-comments.cgi just logs every. single. request to a file (or database, although that’s not the best option if it caused corruption & a crash last time). Then periodically run a cron job through the request log, pushing the IP addresses into a set of .htaccess deny rules.
Add a rule to your robots.txt to disallow the file, and only direct requests for it will succeed, so you can assume (assumption being the mother of all … godsends) all requests are Badly Behaved Bots.
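The cron step described above might look something like the sketch below. It is not a tested deployment: it assumes the decoy script writes one client IP per line (a made-up log format), the threshold is arbitrary, and it emits the older Apache "Order / Deny from" directives, which would need adapting on servers that use a different access-control syntax.

```python
import ipaddress
from collections import Counter


def offending_ips(logged_ips, threshold=3):
    """Return the sorted IPs that hit the decoy script at least `threshold` times.

    `logged_ips` is any iterable of strings, one IP per entry (assumed format).
    """
    counts = Counter()
    for raw in logged_ips:
        try:
            ip = ipaddress.ip_address(raw.strip())
        except ValueError:
            continue  # ignore blank or malformed lines
        counts[str(ip)] += 1
    return sorted(ip for ip, hits in counts.items() if hits >= threshold)


def deny_rules(ips):
    """Build an .htaccess fragment that blocks each IP (older Apache syntax)."""
    lines = ["Order allow,deny", "Allow from all"]
    lines += [f"Deny from {ip}" for ip in ips]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical log contents for illustration only.
    log = ["198.51.100.7\n"] * 3 + ["203.0.113.9\n"]
    print(deny_rules(offending_ips(log)), end="")
```

Writing the fragment to a separate file and reviewing it before merging it into the live .htaccess would guard against locking out legitimate visitors.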

I quite like Bad Behavio(u)r (link),
which intercepts anything which is sending wonky HTTP and blocks it with a status code error, preventing the comment script from being called in the first place.
I’d second the rename-your-comments script as well. Though honestly, MT is bloody awful at this and SixApart evidently couldn’t care less anymore.

Hey Tom, sorry to hear you’re having these problems. Spammers are certainly a scummy lot of parasites who don’t really care if they kill the host. That’s not an easy problem to solve at the root level.
However, there are definitely a few steps you can take to remove the heat, at least for a while if not forever. The key is, as Mike mentions, to make your site sufficiently different from the rest to sidestep the spam attacks.
First, go ahead and change the name of the mt-comments.cgi script and make that change reflected in your mt-config.cgi using the CommentScript directive. For example:
CommentScript mt-feedback.cgi
After you do that, rebuild all of your individual archives if you’re using static publishing. Of course, that will not stop all spammers because some actually read the form action before submitting, but it will provide a nice 404 error to all of the dumb spam software users.
Secondly, you can employ any number of front line defenses to block spammers before they ever hit Movable Type or at least cause it any load.
TypeKey is our highest recommendation as it’s a proven spam killer. Of course, you can configure it so that you accept both TypeKey authenticated- and non-authenticated comments, but force moderation on the latter. This allows anyone to immediately publish a comment if they wish and dumps the rest (including whatever spam might have made it through) into unpublished status. Making sure that Movable Type isn’t publishing the spam is the most important step to keeping your load down.
Alternatively, in lieu of that, you could employ a number of CAPTCHA-like or obfuscation techniques such as:
* My CommentChallenge plugin
* Mark Carey’s Disguise Comment URL plugin
* Kevin Shay’s Tiny Turing
Lastly, (although I always hate to even espouse this) you could reduce your exposure to spammers by shutting down commenting on older entries that get nothing but spam using something like Conversation Killer plugin or Close Comments.
I’d love to know what happens after these spammers hit your install. Are these comments being junked, moderated or published? If they aren’t being junked, take a look at the junk scoring log at the bottom of the individual comment editing page and let me know why. This is definitely abnormal behavior and I’ve found that in almost every case I’ve investigated, all it took was a few tweaks to get everything running well and unmolested by spammers and CPU load problems.
Feel free to drop me an email if you like.

Comment spam should be a lot easier to deal with than SMTP spam. (I know this — I created SpamAssassin after all 😉 With weblog comments, you control the protocol entirely, whereas with SMTP you’re stuck with an existing protocol and very little “wiggle room”.
On my WordPress weblog — which, admittedly, gets only about 1/4 of the traffic plasticbag.org does — I’ve instituted a very simple check stolen from Jeremy Zawodny. I simply include a form field which asks the comment poster for my first name, and if they fail to supply that, the comment is dropped. In addition, I’ve removed the form fields to post directly, requiring that all comments are previewed; this has the nice bonus of increasing comment quality, too.
Those are the only antispam measures I’m using there, and as a result of those two I get about 1 successful spam posted per week, which is a one-click moderation task in my email. That’s it.
The key is to *not* use the same measures as everyone else — if every weblog has a different set of protocols, with different form fields asking different simple questions, the only spammers that can beat that are the ones that write custom code for your site — or use human operators sitting down to an IE window.
(Oh, and trackback was broken, abuse-wise, right from the start. Turn that off for sure!)
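The first-name check mentioned above is simple enough to sketch in a few lines. This is an illustration, not the actual WordPress code: the field name first_name_check and the expected answer are placeholders, and the form is assumed to arrive as a plain dictionary.

```python
def accept_comment(form, expected_answer="justin"):
    """Accept a comment only if the challenge field holds the expected answer.

    `form` is a dict of submitted fields; "first_name_check" is a
    hypothetical field name chosen for this example.
    """
    answer = form.get("first_name_check", "")
    # Compare case-insensitively and ignore stray whitespace.
    return answer.strip().lower() == expected_answer.lower()


if __name__ == "__main__":
    print(accept_comment({"first_name_check": " Justin ", "body": "Good post"}))
    print(accept_comment({"body": "Nice Site!"}))
```

Because the question and field name are specific to one site, generic spam scripts that fill in only the standard fields would fail the check.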

Hi Tom
I think you should rename your comments script 😉 How about a nice captcha, not the nasty type-out-this-illegible-essay-perfectly-or-else, but a nice one like “click on the little monkey, not the lemur, the monkey”. You click the monkey and he gives you a little smile and a wink then all is good! Every time the page loads there can be a different scene with the monkey, like “Monkey goes to the zoo”, “monkey having a picnic”, “monkey tennis” or “monkey in the mirror” Don’t click the reflection!
Anyway, that’s my idea. xx

Hi Tom,
I had this exact problem on my servers about a year ago. I was hosting around 15 MT blogs and the spammers kept destroying my server by just regularly accessing mt-comments.cgi. Anyway – without making this a Movable Type versus WordPress discussion – I ended up moving all the blogs to WordPress and haven’t had any problems with server load and spammers since. I think what helped was moving from CGI/Perl (which was being executed separately on every pageview) to PHP.
Now the problem of comment spam is still there – but it wasn’t running the server out of memory.
hope this helps..

Spammers are unfortunate – I’ve had a few battles with them myself, from email to website spam.
For a while on one of my websites I had a ‘reply’ box at the bottom, where anybody could leave a comment without registering. Just put their name and message in and hit post. People liked it because it was so quick, but after a Google PR update it started getting hit by auto-posting bots. I had to take it off.
I remember when you didn’t have to watch out for these things on the internet, many years ago now…

Personally, I’ve found that analysing bot behaviour can help a great deal in cutting out large swathes of spam. Recently I’ve been plagued with hundreds and hundreds of “Nice Site!” and “Well Done!” comments – all of which got past my (very rudimentary) spam-words-checking script.
What I noticed though, was that the bots flooded all the non-standard text-inputs in my form with the same text as they used in the ‘name’ input – so the variables that I use to let people use their Flickr and Blogger profiles would both be set, and both be the same value. All I have to do is check to see if that’s the case, and if it is then I dump the comment in my spam-list.
What Justin says is therefore hugely important – because we can alter the commenting protocol (unlike SMTP), we can recode it to trip the bots up. Perhaps including one or two non-standard, redundant fields in the form, and then running them through a modified (and renamed) mt-comments.cgi might help?
(PS. Beware the trappings of tail -f. That way obsession lies)
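The duplicate-field check described in this comment could be sketched as follows. The field names (name, flickr, blogger) are guesses based on the description, and the form is assumed to be a plain dictionary of submitted values.

```python
def looks_like_bot(form, name_field="name", extra_fields=("flickr", "blogger")):
    """Flag a submission when every optional field repeats the name field.

    Field names here are assumptions; adjust them to the real form.
    """
    name = form.get(name_field, "").strip()
    if not name:
        return False  # nothing to compare against
    # A bot that floods every text input leaves all extras equal to the name.
    return all(form.get(field, "").strip() == name for field in extra_fields)


if __name__ == "__main__":
    spam = {"name": "Nice Site!", "flickr": "Nice Site!", "blogger": "Nice Site!"}
    human = {"name": "Tom", "flickr": "tomsphotos", "blogger": ""}
    print(looks_like_bot(spam), looks_like_bot(human))
```

The same idea extends to a deliberately redundant field: any submission that fills it in at all can be treated as suspect.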

If you need any help getting your head around obscure Apache directives, feel free to drop me a line and I may well be able to help.

What do you do when your explore keeps saying can’t find server and you can’t get a connection. I have yahoo IM and it won’t connect.

It is horrible. I am having the same problem.
I’ve combined a number of techniques to reduce the spam comments; however, I am facing a new problem… the file mt-comments.cgi is being accessed FAR too often… at a rate that eats up all server processes.
I am looking for a way to “release” mt-comments.cgi from being called.

Comments are closed.