Breaking CAPTCHA

Date February 25, 2008

Out on the news feeds today is a story about spammers signing up GMail accounts by breaking the CAPTCHA test.  This isn’t really a big surprise, as this kind of thing happens all the time, and GMail / Live Mail / Yahoo Mail are all great targets because accounts can be signed up for anonymously and the source is unlikely to be blacklisted (blacklisting GMail, for example, would DoS a lot of legitimate users).  What is interesting is the method, and the success that the spammers are getting.

So, who knew, CAPTCHA cracking is now Software as a Service (SaaS) :)   I hadn’t actually seen this before (a similar method has been used against Live Mail), but it was pretty obvious that it would turn up eventually.  Basically, there appear to be one or more servers (in Russia, it seems) that accept a CAPTCHA image from infected machines and try to solve it (for a fee).  Unfortunately the most interesting part – how the service breaks the CAPTCHA – isn’t analyzed.  To be fair to Websense, it’s impossible for them to know, as the data they are capturing is just a black box: images in, code out.  Because only the image is passed to the service, though, it’s probably some form of the OCR method (see below).  The success rate of the service seems pretty good: a "solution" comes back about 1 time in 5 (it’s not clear if this is a "correct" solution, or just a response that it matched "something"), whereas if the system can’t solve the CAPTCHA it just doesn’t return anything.

With CAPTCHAs, there are really just a few ways of cracking them (putting aside the idea of bypassing the CAPTCHA completely):

  • Matching IDs (via CGI parameters or the image src name) to known solutions
  • Using image processing (e.g. OCR) to resolve it to a solution
  • Using image processing (e.g. some type of hash function) to resolve it to pre-computed solutions
  • Making use of a "weaker" form of the CAPTCHA (e.g. utilizing the "accessibility mode" of sound-based CAPTCHA [warning: PPT link - look at very end], which was shown at DefCon last year)
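As a rough illustration of the "pre-computed solutions" idea in the list above, here is a minimal Python sketch.  It assumes the site re-serves byte-identical images from a finite pool, so a plain SHA-256 of the bytes is enough; a real attack on slightly varying images would need some fuzzier image hash.  The class and the fake image bytes are made up for the example and say nothing about how the service in the news story actually works.

```python
import hashlib


def image_key(image_bytes: bytes) -> str:
    """Exact-match fingerprint of the raw image bytes."""
    return hashlib.sha256(image_bytes).hexdigest()


class SolutionTable:
    """Hypothetical lookup table from image fingerprint to known answer."""

    def __init__(self):
        self._known = {}

    def learn(self, image_bytes: bytes, solution: str) -> None:
        # Record an answer obtained some other way (by hand, OCR, ...).
        self._known[image_key(image_bytes)] = solution

    def solve(self, image_bytes: bytes):
        # Returns None for unseen images, i.e. "no answer" rather than a guess.
        return self._known.get(image_key(image_bytes))


table = SolutionTable()
table.learn(b"fake-image-bytes-1", "xk7pq")
print(table.solve(b"fake-image-bytes-1"))  # xk7pq
print(table.solve(b"never-seen-before"))   # None
```

The defensive takeaway is the flip side: if every challenge image is generated fresh, a table like this never gets a hit.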

There’s also the "by-hand" approach, where CAPTCHAs are passed off to humans to solve, but I’m still reluctant to believe that this scales very well, even allowing for things like the lower cost of labor in developing countries or the number of people accessing Pr0n sites.

In any case, it seems that the traditional "fuzzy letters" CAPTCHA is starting to show wear and tear.  If that’s the case, what can we do about it?

OK, this needs a ton of research, which I’m sure is going on out there somewhere, though I’ve failed to find any with a casual search (answers on a postcard to the usual address, please).  Still, I stand by the two points I made when RSnake was discussing this on his blog a while back (click on through if you are interested, because there’s a ton of good links and even more follow-on discussion).

The first one is to move away from the “copy this” approach to more of a “solve this” approach – humans are much better than machines at solving simple problems (e.g. “My name is Mike – how many letters in my name?”, “the color of the sky is…”). There is a cultural problem (people in China will have different names, places, etc.), so we should be able to tune for cultural/geographic tendencies (and if someone is trying to sign up for, say, a US webmail account but insists on solving Chinese CAPTCHAs, that may be a hint).
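To make the “solve this” idea a bit more concrete, here is a small Python sketch of a question-style challenge with per-locale name pools.  Everything in it (the NAMES table, the function names, the locale codes) is invented for illustration, and a pool this small could obviously be pre-computed by an attacker, so a real system would need far more variety.

```python
import random

# Hypothetical per-locale name pools; a real deployment would need many more.
NAMES = {
    "en-US": ["Mike", "Sarah", "Robert"],
    "zh-CN": ["Wei", "Fang", "Xiuying"],
}


def make_challenge(locale: str, rng=random):
    """Return (question, expected_answer) for the given locale."""
    name = rng.choice(NAMES[locale])
    question = f"My name is {name}. How many letters are in my name?"
    return question, str(len(name))


def check_answer(expected: str, submitted: str) -> bool:
    # Be forgiving about stray whitespace and case; humans type sloppily.
    return submitted.strip().lower() == expected.lower()


def locale_mismatch(site_locale: str, requested_locale: str) -> bool:
    # A weak hint only: e.g. a US webmail signup asking for zh-CN challenges.
    return site_locale != requested_locale


question, answer = make_challenge("en-US", random.Random(0))
print(question)
print(check_answer(answer, f" {answer} "))
print(locale_mismatch("en-US", "zh-CN"))
```

The mismatch check is deliberately just a signal to log and weigh, not a reason to block anyone outright.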

The other thing that it shows is that getting it right first time, every time, just isn’t a possibility – sometimes the attackers will get through.  This means that we have to have logs for later analysis, so that we can spot unusual behavior and take retroactive steps, both in terms of shutting down the accounts that got through and seeing if there’s anything we can do to improve the initial check(s).  It’s just a shame that the vast majority of systems I take a look at aren’t logging nearly enough information to allow for inspection and behavioral monitoring before or after the fact.
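For what it’s worth, the kind of after-the-fact analysis I mean doesn’t have to be fancy.  Here is a minimal Python sketch, assuming each signup attempt is logged with a timestamp, source IP, whether the CAPTCHA was passed, and how long the answer took.  The record fields, the thresholds, and the sample addresses (taken from the documentation-only IP ranges) are all assumptions made up for the example, not recommendations.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class SignupAttempt:
    timestamp: float          # seconds since some epoch
    source_ip: str
    captcha_passed: bool
    seconds_to_answer: float  # time between serving and answering the CAPTCHA


def suspicious_sources(attempts, max_attempts=20, min_seconds=2.0):
    """Flag sources that try too often, or pass faster than a human could."""
    per_source = Counter(a.source_ip for a in attempts)
    too_many = {ip for ip, n in per_source.items() if n > max_attempts}
    too_fast = {
        a.source_ip
        for a in attempts
        if a.captcha_passed and a.seconds_to_answer < min_seconds
    }
    return too_many | too_fast


# Made-up log: one source hammering the form and passing 1 time in 5,
# plus one ordinary-looking signup.
log = [SignupAttempt(float(i), "203.0.113.7", i % 5 == 0, 0.4) for i in range(30)]
log.append(SignupAttempt(31.0, "198.51.100.2", True, 9.5))

print(suspicious_sources(log))  # {'203.0.113.7'}
```

The flagged set can then feed both retroactive steps: shutting down accounts created from those sources, and tuning the initial check.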
