Breaking Weak CAPTCHA in 26 Lines of Code
During one of our recent engagements, we found a weak CAPTCHA implementation in the target Web application. The assessment was performed on-site, and after identifying the vulnerability we talked with the CSO about how easy it would be to break.
The general consensus, of course, was “very easy”. The problem was that we couldn't find any good CAPTCHA-breaking software that the average Joe could download and run on his computer, so I spent a few minutes writing a simple Python script that returns the CAPTCHA solution for this particular implementation.
Before we dig into the script, let's analyze why this CAPTCHA is weak (it might not be obvious to some readers):
- The letters are not rotated
- All letters have the same height
- All letters have the exact same color
- The letters are not deformed in any way
- The background noise color is the same for the whole image
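To make the color points concrete, here is a toy sketch (not taken from the assessed application) that tallies the pixel colors in a tiny hand-made grid. The grid, the gray noise color, and the name `color_histogram` are all made up for illustration. When every letter shares one exact color and the noise shares another, the tally has only a handful of entries, and a single equality test separates the text from everything else.

```python
from collections import Counter

# Hypothetical 4x3 "image": black letters, gray noise, white background.
BLACK = (0, 0, 0, 255)
GRAY = (128, 128, 128, 255)
WHITE = (255, 255, 255, 255)

toy_image = [
    [WHITE, BLACK, GRAY, WHITE],
    [GRAY, BLACK, BLACK, WHITE],
    [WHITE, BLACK, GRAY, GRAY],
]


def color_histogram(rows):
    """Count how many pixels of each exact color appear in the grid."""
    return Counter(pixel for row in rows for pixel in row)


histogram = color_histogram(toy_image)

# One equality test is enough: True marks a letter pixel, False marks
# noise or background.
letter_mask = [[pixel == BLACK for pixel in row] for row in toy_image]
```

A stronger CAPTCHA would vary letter and noise colors so that no single comparison like `pixel == BLACK` isolates the text.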
Now, let's see the code that breaks this CAPTCHA (written for Python 2):
    from PIL import Image

    img = Image.open('input.gif')
    img = img.convert("RGBA")

    pixdata = img.load()

    # Clean the background noise, if color != black, then set to white.
    for y in xrange(img.size[1]):
        for x in xrange(img.size[0]):
            if pixdata[x, y] != (0, 0, 0, 255):
                pixdata[x, y] = (255, 255, 255, 255)

    img.save("input-black.gif", "GIF")

    # Make the image bigger (needed for OCR)
    im_orig = Image.open('input-black.gif')
    big = im_orig.resize((116, 56), Image.NEAREST)

    ext = ".tif"
    big.save("input-NEAREST" + ext)

    # Perform OCR using pytesser library
    from pytesser import *
    image = Image.open('input-NEAREST.tif')
    print image_to_string(image)
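For readers on Python 3, the same three steps (clean, enlarge, OCR) might look roughly like the sketch below. It is not the code used during the engagement and has not been run against the original CAPTCHA images. It assumes Pillow as the PIL replacement and swaps in `pytesseract` for the Python 2-only pytesser; the function names `clean_noise` and `solve_captcha` are made up here, and the intermediate GIF and TIFF files are skipped.

```python
def clean_noise(pixdata, width, height, letter_color=(0, 0, 0, 255)):
    """Set every pixel that is not the letter color to opaque white, in place."""
    white = (255, 255, 255, 255)
    for y in range(height):
        for x in range(width):
            if pixdata[x, y] != letter_color:
                pixdata[x, y] = white


def solve_captcha(path, scaled_size=(116, 56)):
    """Clean, enlarge, and OCR one CAPTCHA image; return the recognized text."""
    # Imported lazily so clean_noise works without these packages installed.
    from PIL import Image
    import pytesseract  # assumption: stands in for the original pytesser

    img = Image.open(path).convert("RGBA")
    clean_noise(img.load(), img.size[0], img.size[1])

    # Same target size and nearest-neighbor scaling as the original script.
    big = img.resize(scaled_size, Image.NEAREST)
    return pytesseract.image_to_string(big).strip()
```

Calling `solve_captcha('input.gif')` would return the guessed text, provided Pillow, pytesseract, and the Tesseract binary are installed. `clean_noise` only needs an object indexable as `pixdata[x, y]`, so it can be exercised without any image library.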
The 26-line script solves about 90% of the CAPTCHA images generated by this specific implementation. Enjoy!
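A success rate like this can be checked against a set of hand-labeled samples. The sketch below is one hypothetical way to do it: `success_rate`, the solver callable, and the sample format are assumptions for illustration, not part of the original script. It counts exact matches after stripping the whitespace that OCR output often carries.

```python
def success_rate(solver, samples):
    """Return the fraction of (image, expected_text) pairs the solver gets right.

    solver  -- any callable that takes an image reference and returns text
    samples -- list of (image reference, hand-labeled solution) pairs
    """
    if not samples:
        raise ValueError("need at least one labeled sample")
    hits = 0
    for image, expected in samples:
        guess = solver(image).strip()  # drop trailing newline/whitespace
        if guess == expected:
            hits += 1
    return hits / len(samples)
```

With a few hundred labeled images and any function that maps an image path to a guess, `success_rate(solver, samples)` returns a value between 0 and 1 that can be compared with the figure quoted above.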