We’ve all seen CAPTCHAs: those distorted words that function as a cut-rate Turing test, separating humans from spambots on any number of websites.
A while ago I was at MSR Summers, and one of the participants was Luis von Ahn, the guy responsible for inventing the CAPTCHA idea. He gave a great one-minute talk, in which he traced his personal feelings about being responsible for something that is so useful, yet so annoying.
CAPTCHA, you will not be surprised to hear, is ubiquitous. Luis figured out that the little buggers are filled out about sixty million times per day by someone on the web. So, as the inventor, he first felt a certain amount of pride at having exerted such a palpable influence on modern life. But after a bit of reflection, and multiplying sixty million by the five seconds it might take to fill in the form, he became depressed at the enormous number of person-hours that were essentially wasted on this task.
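If you want to see just how depressing that multiplication is, here is a quick back-of-the-envelope sketch. The two inputs are the rough figures quoted above (sixty million CAPTCHAs a day, five seconds each); the function and variable names are my own, purely for illustration.

```python
# Back-of-the-envelope estimate of human time spent on CAPTCHAs per day.
# Both inputs are the rough figures from the talk, not measured data.
CAPTCHAS_PER_DAY = 60_000_000
SECONDS_PER_CAPTCHA = 5


def person_hours(count: int, seconds_each: float) -> float:
    """Convert a number of tasks and a per-task duration into person-hours."""
    return count * seconds_each / 3600


hours = person_hours(CAPTCHAS_PER_DAY, SECONDS_PER_CAPTCHA)
years = hours / (24 * 365)  # person-years, counting round-the-clock hours

print(f"{hours:,.0f} person-hours per day")
print(f"about {years:.1f} person-years per day")
```

With those inputs it comes to roughly 83,000 person-hours, or about nine and a half person-years of nonstop typing, every single day.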
Being a clever guy, Luis decided to make lemonade. What we have here is a huge number of people who are recognizing words that a computer can’t make out. Luis realized that there was a separate circumstance in which you would want the computer to recognize the words, even though it wasn’t quite up to the task — optical character recognition, and in particular the problem of digitizing old texts. Apparently, before the advent of the Internet, people would store information by binding together pieces of paper with words printed on them, forming compact volumes known as “books.” In the interest of preserving the products of this outmoded technology, various efforts around the world are attempting to scan in all of those books and store the results digitally. But often the text is not so clear, and the computers don’t do such a great job at translating the images into words.
Thus, reCAPTCHA was born. At this point you should be able to guess what it does: it takes scanned images from actual books, with which optical character recognition software is struggling, and uses them as the source material for CAPTCHAs. The project is up and running, and can be implemented anywhere ordinary CAPTCHAs are used. Now, when you get annoyed at having to make out those squiggly words with lines slashed through them, you can take some solace in knowing that you’re making the world a better place. Or at least saving some books from the trash bin of history.