CAPTCHA: Telling Humans and Computers Apart Automatically
A CAPTCHA is a program that protects websites against bots by generating and grading tests that humans can pass but
current computer programs cannot. For example, humans can read distorted text, but current computer programs can't.
The term CAPTCHA (for Completely Automated Public Turing Test To Tell Computers and Humans Apart) was
coined in 2000 by Luis von Ahn, Manuel Blum,
Nicholas Hopper and John Langford of Carnegie Mellon University.
A free, secure and accessible CAPTCHA implementation is available from the reCAPTCHA project. Easy-to-install plugins and controls are
available for WordPress, MediaWiki, PHP,
Perl, Python, Java, and many other
environments. reCAPTCHA also comes with an audio test to ensure that blind users can freely navigate your site. reCAPTCHA is our officially recommended CAPTCHA implementation.
Test Drive a CAPTCHA
reCAPTCHA. Stop spam and help digitize books at the same time!
The words shown come directly from old books that are being digitized.
SQUIGL-PIX. Our newest CAPTCHA!
ESP-PIX. A CAPTCHA script that's close to our hearts. Instead of typing
letters, you authenticate yourself as a human
by recognizing what object is common in a set of images. This was the first example of a CAPTCHA based on image recognition.
Applications of CAPTCHAs
CAPTCHAs have several applications for practical security, including (but not limited to):
Preventing Comment Spam in Blogs. Most bloggers are familiar with programs that submit bogus comments,
usually for the purpose of
raising the search engine rank of some website (e.g., "buy penny stocks here"). This is called comment spam. By using a CAPTCHA, only humans can enter
comments on a blog. There is no need to make users sign up before they enter a comment, and no legitimate comments are lost.
Protecting Website Registration. Several companies (Yahoo!, Microsoft, etc.) offer free email services.
Up until a few years ago, most of these services suffered from a specific type of attack: "bots" that would sign up for
thousands of email accounts every minute. The solution to this problem was to use CAPTCHAs to ensure that only humans
obtain free accounts. In general, free services should be protected with a CAPTCHA in order to prevent abuse by automated scripts.
Protecting Email Addresses From Scrapers. Spammers crawl the Web in search of email addresses posted in clear text. CAPTCHAs provide an effective mechanism to hide your email address from Web scrapers. The idea is to require users to
solve a CAPTCHA before showing your email address. A free and secure implementation that uses CAPTCHAs to obfuscate an email address is available from reCAPTCHA.
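The idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how reCAPTCHA implements it: the hypothetical `email_for_visitor` function assumes CAPTCHA verification has already happened elsewhere and receives its result as a boolean, and the masking format (first two characters of the local part plus the domain) is an arbitrary choice.

```python
def email_for_visitor(address: str, captcha_solved: bool) -> str:
    """Return the full address only to visitors who passed a CAPTCHA (hypothetical helper).

    CAPTCHA verification itself is assumed to happen elsewhere; its result is
    passed in as `captcha_solved`.
    """
    if captcha_solved:
        return address
    # Scrapers only ever see a masked form, e.g. "jo...@example.com".
    local, _, domain = address.partition("@")
    return f"{local[:2]}...@{domain}"


print(email_for_visitor("john.doe@example.com", captcha_solved=False))  # jo...@example.com
print(email_for_visitor("john.doe@example.com", captcha_solved=True))   # john.doe@example.com
```

For this to help, the full address must never appear in the page source served before the CAPTCHA is solved; otherwise a scraper can read it regardless of what is displayed.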
Online Polls. In November 1999, http://www.slashdot.org released an online poll asking which was
the best graduate school in computer science (a dangerous question to ask over the web!). As is the case with most
online polls, IP addresses of voters were recorded in order to prevent single users from voting more than once.
However, students at Carnegie Mellon found a way to stuff the ballots using programs that voted for CMU thousands of
times. CMU's score started growing rapidly. The next day, students at MIT wrote their own program and the poll became
a contest between voting "bots." MIT finished with 21,156 votes, Carnegie Mellon with 21,032 and every other school
with fewer than 1,000. Can the result of any online poll be trusted? Not unless the poll ensures that only humans can vote.
Preventing Dictionary Attacks. CAPTCHAs can also be used to prevent dictionary attacks in password
systems. The idea is simple: prevent a computer from being able to iterate through the entire space of passwords by
requiring it to solve a CAPTCHA after a certain number of unsuccessful logins. This is better than the classic approach of
locking an account after a sequence of unsuccessful logins, since lockout lets an attacker lock legitimate users out of their accounts at will just by entering wrong passwords.
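A minimal Python sketch of this policy follows. The `LoginGuard` class, the default threshold of three failures, and the plaintext password dictionary are illustrative assumptions (a real system would store password hashes), and the CAPTCHA check itself is abstracted as a boolean argument.

```python
from __future__ import annotations

from collections import defaultdict


class LoginGuard:
    """Demands a CAPTCHA after repeated failed logins instead of locking the account."""

    def __init__(self, passwords: dict[str, str], threshold: int = 3) -> None:
        self._passwords = passwords        # toy store; real systems keep password hashes
        self._failures = defaultdict(int)  # username -> consecutive failed logins
        self._threshold = threshold        # hypothetical value; tune per site

    def needs_captcha(self, user: str) -> bool:
        return self._failures[user] >= self._threshold

    def login(self, user: str, password: str, captcha_solved: bool = False) -> str:
        # Past the threshold, the password is not even checked without a solved
        # CAPTCHA, so a bot cannot keep iterating through the password space.
        if self.needs_captcha(user) and not captcha_solved:
            return "captcha_required"
        if self._passwords.get(user) == password:
            self._failures[user] = 0
            return "ok"
        self._failures[user] += 1
        return "denied"


guard = LoginGuard({"alice": "correct horse"})
for guess in ["123456", "password", "qwerty"]:
    guard.login("alice", guess)  # "denied" each time
print(guard.login("alice", "letmein"))                             # captcha_required
print(guard.login("alice", "correct horse", captcha_solved=True))  # ok
```

Unlike a lockout, the legitimate owner can always get in by solving one CAPTCHA, while a bot guessing passwords is stopped after a few attempts.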
Search Engine Bots. It is sometimes desirable to keep webpages unindexed to prevent others from finding
them easily. There is an HTML tag (the robots meta tag) that asks search engine bots not to index a web page. The tag, however, doesn't
guarantee that bots won't read a web page; it only serves to say "no bots, please." Search engine bots, since they
usually belong to large companies, generally respect web pages that don't want to allow them in. However, in order to truly
guarantee that bots won't enter a web site, CAPTCHAs are needed.
Worms and Spam. CAPTCHAs also offer a plausible solution against email worms and spam: "I will
only accept an email if I know there is a human behind the other computer." A few companies are already marketing this idea.
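A sketch of such a challenge-response mail filter, under stated assumptions: the `ChallengeResponseFilter` class is hypothetical, messages are plain strings, and the step of emailing the sender a link to a CAPTCHA is not modeled; the caller simply reports when a sender has solved one.

```python
from __future__ import annotations


class ChallengeResponseFilter:
    """Delivers mail only from senders who have shown there is a human behind them."""

    def __init__(self) -> None:
        self.verified: set[str] = set()
        self.held: dict[str, list[str]] = {}

    def receive(self, sender: str, message: str) -> str:
        if sender in self.verified:
            return "delivered"
        # Unknown sender: hold the message. A real system would now send the
        # sender a link to a CAPTCHA; that step is not modeled here.
        self.held.setdefault(sender, []).append(message)
        return "held"

    def captcha_solved(self, sender: str) -> list[str]:
        # Remember the sender and release everything held for them.
        self.verified.add(sender)
        return self.held.pop(sender, [])


mail = ChallengeResponseFilter()
print(mail.receive("alice@example.org", "Hi!"))    # held
print(mail.captcha_solved("alice@example.org"))    # ['Hi!']
print(mail.receive("alice@example.org", "Again"))  # delivered
```

This shows only the basic idea: a worm that forges the address of an already-verified sender would still get through.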
If your website needs protection from abuse, it is recommended that you use a CAPTCHA. There are many CAPTCHA implementations, some better than
others. The following guidelines are strongly recommended for any CAPTCHA code:
Accessibility. CAPTCHAs must be accessible. CAPTCHAs based solely on reading text — or other
visual-perception tasks — prevent visually impaired users from accessing the protected resource. Such CAPTCHAs
may make a site incompatible with Section 508 in the United States. Any implementation of a CAPTCHA should allow blind
users to get around the barrier, for example, by permitting users to opt for an audio or sound CAPTCHA.
Image Security. CAPTCHA images of text should be distorted randomly before being presented to the user. Many
implementations of CAPTCHAs use undistorted text, or text with only minor distortions. These implementations are
vulnerable to simple automated attacks.
Script Security. Writing secure CAPTCHA code is not easy. In addition to making the images unreadable by
computers, the system should ensure that there are no easy ways around it at the script level. Common examples of
insecurities in this respect include: (1) Systems that pass the answer to the CAPTCHA in plain text as part of the web
form. (2) Systems where a solution to the same CAPTCHA can be used multiple times (this makes the CAPTCHA vulnerable to
so-called "replay attacks"). Most CAPTCHA scripts found freely on the Web are vulnerable to these types of attacks.
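Both script-level flaws above can be avoided by keeping the answer on the server and deleting each challenge the first time it is checked. The following is a minimal Python sketch, assuming an in-memory store and case-insensitive answers; image generation is omitted, and the names `ChallengeStore`, `issue` and `verify` are hypothetical.

```python
from __future__ import annotations

import hmac
import secrets


class ChallengeStore:
    """Keeps CAPTCHA answers server-side; each challenge can be checked only once."""

    def __init__(self) -> None:
        self._answers: dict[str, str] = {}

    def issue(self, answer: str) -> str:
        # Only this random, unguessable ID is embedded in the web form;
        # the answer itself never leaves the server.
        challenge_id = secrets.token_urlsafe(16)
        self._answers[challenge_id] = answer
        return challenge_id

    def verify(self, challenge_id: str, response: str) -> bool:
        # pop() removes the challenge whether or not the response is correct,
        # so a solved challenge cannot be replayed.
        expected = self._answers.pop(challenge_id, None)
        if expected is None:
            return False  # unknown or already-used challenge
        # Case-insensitive, constant-time comparison of the normalized strings.
        return hmac.compare_digest(
            expected.strip().lower().encode(), response.strip().lower().encode()
        )
```

Deleting the challenge even after a wrong guess also stops a bot from trying several answers against one image. A production system would additionally expire old challenges and share the store across server processes.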
Security Even After Widespread Adoption. There are various "CAPTCHAs" that would be insecure if a
significant number of sites started using them. An example of such a puzzle is asking text-based questions, such as a
mathematical question ("what is 1+1"). Since a bot author could easily write a parser that answers such questions, these
"CAPTCHAs" rely on the fact that few sites use them, and thus that a bot author has no incentive to
program their bot to solve that challenge. True CAPTCHAs should be secure even after a significant number of websites
adopt them.
Should I Make My Own CAPTCHA? In general, making your own CAPTCHA script (e.g., using PHP, Perl or .Net) is a bad idea, as there are many failure
modes. We recommend that you use a
well-tested implementation such as reCAPTCHA.
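To illustrate why text questions such as "what is 1+1" stop working once many sites use them, here is a hypothetical Python sketch of the parser a bot author might write. It assumes questions of the form "what is A op B" with non-negative integers; real question sets vary, but the point is how little code this takes.

```python
from __future__ import annotations

import operator
import re

# Matches e.g. "What is 1+1?" or "what is 12 x 3" (case-insensitive).
_QUESTION = re.compile(r"what\s+is\s+(\d+)\s*([-+*x])\s*(\d+)", re.IGNORECASE)
_OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "x": operator.mul}


def solve_text_question(question: str) -> int | None:
    """Answer a simple arithmetic challenge, or return None if it doesn't parse."""
    match = _QUESTION.search(question)
    if match is None:
        return None
    left, op, right = match.groups()
    # IGNORECASE lets "X" match too, so normalize before the lookup.
    return _OPS[op.lower()](int(left), int(right))


print(solve_text_question("What is 1+1?"))  # 2
```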
The "Pornography Attack" is Not a Concern
It is sometimes rumored that spammers are using pornographic sites to solve CAPTCHAs: the CAPTCHA images are sent to a porn site, and the porn site users are required
to solve the CAPTCHA before being able to see a pornographic image. This is not a security concern for CAPTCHAs. While it might be the case that some
spammers use porn
sites to attack CAPTCHAs, the amount of damage this can inflict is tiny (so tiny that we haven't even noticed a dent!). Whereas it is trivial to write a bot that
abuses an unprotected site millions of times a day, redirecting CAPTCHAs to be solved by humans viewing
pornography would only allow spammers to abuse systems a few thousand times per day. The economics of this attack just don't add up: every time a porn site shows
a CAPTCHA before a porn image, they risk losing a customer to another site that doesn't do this.
Advancing Artificial Intelligence
CAPTCHA tests are based on open problems in artificial intelligence (AI): decoding images of distorted text, for instance, is well beyond the capabilities of modern computers.
Therefore, CAPTCHAs also offer well-defined challenges for the AI community, and they give security
researchers, as well as otherwise malicious programmers, an incentive to work on advancing the field of AI. CAPTCHAs are thus a win-win situation: either a CAPTCHA is not
broken and there is a way to differentiate humans from computers, or the CAPTCHA is broken and an AI problem is solved.
Academic Publications and Presentations
Luis von Ahn, Ben Maurer, Colin McMillen, David Abraham, and Manuel Blum. reCAPTCHA: Human-Based Character Recognition via Web Security Measures.
Jennifer Tam, Jiri Simsa, Sean Hyde, and Luis von Ahn. Breaking Audio CAPTCHAs. In Advances in Neural Information Processing Systems.
Luis von Ahn, Manuel Blum, and John Langford. Telling Humans and Computers Apart Automatically. In Communications of the ACM.
Luis von Ahn, Manuel Blum, Nicholas Hopper, and John Langford. CAPTCHA: Using Hard AI Problems for Security.
Kumar Chellapilla and Patrice Y. Simard. Using Machine Learning to Break Human Interaction Proofs (HIPs). In NIPS. (Explains how to break simple CAPTCHAs for which character segmentation is easy.)
Greg Mori and Jitendra Malik. Recognizing Objects in Adversarial Clutter: Breaking a Visual CAPTCHA. In CVPR. (Explains how to break a simple CAPTCHA.)
Google Tech Talk about CAPTCHAs and Human Computation.
Selected Popular Press Pieces