3 DEC 2008
Browser CAPTCHA
I'm sure everyone has noticed that my blog posting has dramatically fallen off from the rate I was getting articles out. Unfortunately, I've been spending my blog time fighting the endless war against spam. I've made some progress there and thought I would share some details that others might find useful.
As I've covered previously this blog now requires me to approve all comments. I'm super happy with this decision. I approve posts promptly, so there's pretty much no downside for users and this means you have not seen a single spam message on this site since I made the change. This was literally the perfect solution… on the viewer's side of the fence.
What it didn't fix was the hassle on my side. I don't mind approving messages at all, as long as I have a reasonable pile to go through. However, the spammers really ramped up their efforts against me lately and this blog received 11,134 comment posts in the month of November alone. Six of those were legitimate comments. That exceeds my definition of reasonable.
To fight back, I've added a new plugin to this blog I call Browser CAPTCHA.
If you've read this blog closely enough to know how much I hate CAPTCHAs, that name probably surprises you. It's true that I believe CAPTCHAs are pure evil. If you feel the need to control what makes it past the server and you think, "I'll screw up my interface to make a human prove they are a human," then I think you may have a problem with your brain being missing. I swear I always need three shots just to get past a Google CAPTCHA, and that's the "Do No Evil" company. Whatever you do, don't get desperate and hit the hearing impaired CAPTCHA button, because that has to be the only thing worse than a normal CAPTCHA. I'm sure the suicide rates for people with vision impairments must be on the rise in this era of site security.
Browser CAPTCHA doesn't do any of that. Instead, I took a page out of Sun Tzu's The Art of War and got to know my enemy a bit better. Spam bots are not browsers and they do some things differently. If you can detect those differences, you know you are not dealing with a human. Thus my plugin screws up the interface for your browser. If it can pass the test, I trust the post.
What are some differences between browsers and spam bots? Here's a list shared with me by Allan Odgaard:
- Spam bots don't have a JavaScript engine. This is the big deal. It seems universally true, so it's definitely a key to detecting them. Force them into needing JavaScript to pass some test and you've got them.
- Spam bots don't typically pay attention to cookies. This turns out to be a handy performance detail, since you can use mod_rewrite to redirect incoming requests to certain URLs if they are missing a magic cookie, before they even reach your application (there's a sketch of this right after the list).
- Spam bots don't correctly handle redirects for POST requests. You can use this to add another layer of protection.
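To give a feel for that mod_rewrite trick, here's a minimal sketch. The cookie name, its value, and the redirect target are placeholders, not the exact rules this site uses.

    RewriteEngine On
    # Reject comment POSTs that are missing the magic cookie before they
    # ever reach the blog application (all names here are placeholders).
    RewriteCond %{REQUEST_METHOD} ^POST$
    RewriteCond %{REQUEST_URI}    ^/comments
    RewriteCond %{HTTP_COOKIE}    !comment_token=let_me_in
    RewriteRule .* /no_cookie.html [R=302,L]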
The current version of Browser CAPTCHA uses these combined factors to test browsers when they try to post a comment. There are other differences my friends have made me aware of, but I haven't employed them yet.
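To make the combined test a little more concrete, here's a rough Ruby sketch of the general shape of it, written against Sinatra rather than this blog's actual code. The js_token field, the secret, the paths, and the save_comment helper are all made up for illustration.

    # A rough sketch of the browser tests, not the plugin's real code.
    require "sinatra"
    require "digest/md5"

    enable :sessions

    JS_TOKEN = Digest::MD5.hexdigest("made-up secret")  # filled into a hidden field by JavaScript

    helpers do
      # Stand-in for however comments really get queued for approval.
      def save_comment(text)
        File.open("pending_comments.txt", "a") { |f| f.puts text }
      end
    end

    post "/comments" do
      # Test 1:  did JavaScript fill in the hidden js_token field?
      halt 403, "Comment rejected." unless params[:js_token] == JS_TOKEN

      # Test 2:  answer the POST with a redirect (and a session cookie) and
      # only save the comment if the client follows it like a real browser.
      session[:pending_comment] = params[:comment]
      redirect "/comments/confirm"
    end

    get "/comments/confirm" do
      comment = session.delete(:pending_comment)
      halt 400, "Nothing to confirm." unless comment
      save_comment(comment)
      "Thanks, your comment is awaiting moderation."
    end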
How's this working out? I've had seven spam posts since I made the change a little over two full days ago. They all came in together and I could tell it was a human investigating the changes I had made. If that's the worst thing I have to worry about now, it's a huge improvement. We will see how things go, but I definitely recommend similar strategies to others fighting in the war…
Comments (8)
-
James Edward Gray II, December 3rd, 2008
Allan Odgaard raised another great point with me today: it can be worth it to check against DNS blacklists as well. There is a Ruby script for doing that.
You can get some false positives this way, if an IP address changes hands shortly after it was used for spamming. It's a pretty uncommon occurrence, though.
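For the curious, a check like that can be done with nothing but Ruby's standard library. The blacklist zone below is just an example, and this isn't necessarily the script mentioned above:

    require "resolv"

    # Returns true if the address is listed on the given DNS blacklist.
    # The zone is only an example; use whichever blacklist you trust.
    def blacklisted?(ip, zone = "zen.spamhaus.org")
      query = ip.split(".").reverse.join(".") + "." + zone
      Resolv.getaddress(query)  # listed addresses resolve (to a 127.0.0.x answer)
      true
    rescue Resolv::ResolvError
      false                     # no answer means the address isn't listed
    end

    puts blacklisted?("127.0.0.2")  # the standard test entry, normally listed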
-
I'm actually writing this comment as much to see if I make it through the spam filter as for any legitimate reason. However, I do have a question.
I have been told that when writing a web site I should always have a solution available for clients who lack JavaScript capabilities. This makes sense on a lot of levels. If you want to get your site noticed, your most important visitor is the GoogleBot, and it doesn't have JavaScript. Screen readers are another place where JavaScript support is spotty at best. Some strange people still block JavaScript on security grounds.
But here we are talking about posting to a blog, an activity which doesn't exactly merit a huge amount of accessibility consideration. I would be interested to hear how you balanced the JavaScript requirement in your test against these factors in particular.
I'm glad you are back posting again.
-
Does the GoogleBot need to be submitting forms to reach the public portions of your site? I hope not, because it doesn't do that either, never mind the lack of a JavaScript engine.
I seriously hope the current solution isn't hindering screen reading devices. I can see how some anti-spam techniques would, but I'm betting the ones I am using are not. I will take complaints filed against this very seriously.
My site's JavaScript is viewable to any user who cares to read it, so my opinion is that I'm not risking your security. If you disagree with that analysis or just don't care to take the time to double-check the security, I completely understand your choice. And that choice is going to cost you the right to post comments here. I don't really feel that's a radical limitation in this Age of Ajax.
I'm not trying to belittle your concerns. Obviously, I want everyone to be able to read this site. I'm pretty sure they can. I asked Google and he says he can. That's important to me.
I also like allowing comments and I don't want to make you log in just to post them. I can do that for most folks with a little browser interrogation and save myself from checking over 11,000 junk messages a month. That's good enough for me.
If that bothers a lot of readers, I could shut off comments altogether. That treats everyone fairly and still allows me to enjoy posting content here. That's just not my first choice.
So yeah, I agree that you need to make your site accessible. But like everything else it's a balancing act. I also needed my blog to be more maintainable. I had to balance those two factors.
-
-
You could throw in a noscript tag to warn users who don't have JavaScript enabled. I personally like the approach you've taken.
-
I do provide a warning for those who have JavaScript disabled, yes.
-
-
Where can I download the Browser CAPTCHA plugin?
-
I've put a very early and raw version of the plugin on github. This is the code I'm using here, but let's definitely consider it alpha level software at this point.
-
-
Slightly different topic, but I wrote some rack-contrib middleware named 'Deflect' for whitelisting / blacklisting remote addresses, as well as providing some DDoS / spam prevention. Since humans only browse at a certain pace, anything making more than N requests within a given window gets blocked for a set duration. Might be worth checking out.
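The core idea looks roughly like this in plain Rack terms. This is just a sketch of the request-counting approach, not Deflect's actual code or options:

    # Only a sketch of the approach:  track request times per IP and block
    # addresses that exceed a limit inside a sliding window.
    class SimpleThrottle
      def initialize(app, options = {})
        @app     = app
        @limit   = options[:limit]   || 30    # max requests allowed per window
        @window  = options[:window]  || 60    # window length in seconds
        @ban_for = options[:ban_for] || 300   # how long a blocked IP stays blocked
        @hits    = Hash.new { |hash, ip| hash[ip] = [] }
        @banned  = {}
      end

      def call(env)
        ip  = env["REMOTE_ADDR"]
        now = Time.now.to_f

        if @banned[ip] && @banned[ip] > now
          return [403, { "Content-Type" => "text/plain" }, ["Slow down.\n"]]
        end

        @hits[ip] = @hits[ip].reject { |t| t < now - @window } << now
        if @hits[ip].size > @limit
          @banned[ip] = now + @ban_for
          return [403, { "Content-Type" => "text/plain" }, ["Slow down.\n"]]
        end

        @app.call(env)
      end
    end

    # use SimpleThrottle, :limit => 30, :window => 60, :ban_for => 300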