Recaptcha and Books Digitizing

General Data 13 July 2010

Recaptcha (or reCAPTCHA) is a system that puts the millions of captcha texts deciphered by human internet users to work for the noble cause of digitizing books, newspaper archives and radio shows. reCAPTCHA is the small comfort we are given for all those challenging captcha texts: thanks to the time and effort millions of people spend every day trying to interpret those scribbled and fuzzy words, someday we will be able to browse through every ancient book on our Kindles.

The reCAPTCHA system was developed by one of the original captcha developers, Luis von Ahn of Carnegie Mellon University, who had estimated the time humans spend on captchas at 500,000 hours a day and, as Carsten Cumbrowski suggests, decided to put the time wasted on solving captcha tests to good use for a change. Or in his own, no less self-critical words: “Life is only like 700,000 hours, so it’s almost the equivalent of a life. We thought, is there any way we can use this human effort in a way that’s good for humanity?”

Facebook, Twitter and many other websites have joined the effort by implementing reCAPTCHA. In 2009, reCAPTCHA was acquired by Google for its book and news archive digitization ventures. reCAPTCHA’s most successful project so far is the digitization of the 130-year-old archive of The New York Times, which started at the end of 2008 and is expected to be done by the end of this year. Then what? “There’s no danger of us running out of words,” von Ahn told BBC News at the end of 2007. “There are still about 100 million books to be digitized.”
And probably no fewer spamming bots.
